MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.32k stars 652 forks source link

Have LinearDensity support updating AtomGroups, more intuitive names, save bins #2508

Open lilyminium opened 4 years ago

lilyminium commented 4 years ago

Is your feature request related to a problem? Please describe.

  1. This mailing list issue is a good use case of Linear Density but is foiled by the class not supporting UpdatingAtomGroups.

  2. It is not intuitive to me that LinearDensity.results['x']['pos'] refers to the mass-weighted density in g/cm^3 of the selection. pos makes me think of coordinates. char makes me think of characters.

  3. If the class saves the bins from np.histogram, it would be far easier to take advantage of existing histogram plotting tools. Also, using np.histogramdd seems easier.

  4. The output is just an array of floats and the documentation doesn't say which units the density is in. These are not the normal units of MDAnalysis (amu for mass, Angstrom for length, e for charge) so they need special documentation. Also, the units written in the header of the file output are inaccurate for the charge. See #2507 for more

Describe the solution you'd like

  1. Move the code below to _single_frame() so it updates.

        # Get masses and charges for the selection
        try:  # in case it's not an atom
            self.masses = np.array([elem.total_mass() for elem in group])
            self.charges = np.array([elem.total_charge() for elem in group])
        except AttributeError:  # much much faster for atoms
            self.masses = self._ags[0].masses
            self.charges = self._ags[0].charges
    
        self.totalmass = np.sum(self.masses)
  2. Call them .mass_density and .charge_density in their own NumPy arrays. Have the results as Numpy arrays instead of dictionaries.

  3. Save the bins

  4. Not change the units.

Describe alternatives you've considered A clear and concise description of any alternative solutions or features you've considered.

Additional context Happy to do this at some point, if people are ok with breaking scripts that have coded the original .results['x']['pos'] indexing into them

orbeckst commented 4 years ago

This all sounds extremely sensible to me.

For 1.0 we could break the API but if there's a way to keep the old data structures around (maybe hide them behind managed attributes that create them on the fly from the new real data structures) then that would help users who just want their old stuff to still work with the last py2 release but set us up for a clean removal for 2.0.

With properties for the old attributes we can also easily issue deprecation warnings.

hmacdope commented 4 years ago

Hey all, I would love to work on this if possible?

lilyminium commented 4 years ago

@hmacdope please go ahead! Here is a notebook that shows a bit of the current behaviour of the class. #2507 has some notes about the units of the output.

PicoCentauri commented 2 years ago

I reopen since we still have to discuss point 4. My feeling is that we decided to keep everything in MDAnalysis' standard units and do no conversion at all. Are there other opinions?

IAlibay commented 2 years ago

I'm definitely for having everything in MDA units, however it should be documented in the docstring if it's not already.

BFedder commented 2 years ago

I'll sum up what has been discussed so far here to facilitate the discussion.

Current behaviour as of 2.2.0 is that mass densities are returned in units of g/cm^3, charge densities as (e * mol) / cm^3. This is documented in the docstring. These units are obtained by taking the raw masses in amu and charges in e, dividing by the slice volume in A^3 for initial densities, and applying a factor of Avogadro's number and the volume conversion from A^3 to cm^3 as follows:

def _conclude(self):
    avogadro = constants["N_Avogadro"]  # unit: mol^{-1}
    volume_conversion = 1e-24  # unit: cm^3/A^3        <-- I just noticed the units are flipped in the code currently, made a mistake there, sorry! This here is how it should be. The numbers are correct though.
    # divide result values by avogadro and convert from A3 to cm3
    k = avogadro * volume_conversion
    ...
    for dim in ['x', 'y', 'z']:
        # norming factor, units of mol^-1 cm^3
        norm = k * self.results[dim]['slice_volume']  # slice volume in A^3
        for key in self.keys:
            self.results[dim][key] /= norm

@orbeckst said in https://github.com/MDAnalysis/mdanalysis/issues/2507#issuecomment-583634608 that in his opinion, the current units are sensible for mass densities, but charge densities should be in MDA standard units of e / A^3. This can be achieved here by simply not applying the factor of k for charge densities as such:

for dim in ['x', 'y', 'z']:
    self.results[dim][key] /= self.results[dim]['slice_volume']  # raw densities, amu/A^3 and e/A^3
    for key in ["mass_density", "mass_density_stddev"]:  # convert to g/cm^3
        self.results[dim][key] /= k  

If the mass densities should be in MDA standard units of amu/A^3 as well (as @PicoCentauri, @IAlibay and I believe @lilyminium suggest), then the division by k can just be omitted.

For comparison, MDAnalysis.analysis.density returns densities with default units of A^(-3) but offers methods for converting to other units of length and density.

Once the decision of which units should be returned is made, the change (deprecation?) should be reasonably straightforward

PicoCentauri commented 2 years ago

+1 for offering a conversion method. This could also live in units.py to be applicable in general.

I like to push this also from a downstream perspective. At MAICoS we are in the process of converting all units to MDA's standards. A general conversion method would be really nice since we are used to units like kg/m^3...