MDAnalysis / UserGuide

User Guide for MDAnalysis
https://userguide.mdanalysis.org

Using Universe #2

Closed lilyminium closed 4 years ago

lilyminium commented 5 years ago

Story: As a molecular dynamics scientist who is familiar with packages such as mdtraj and cpptraj, I want to quickly get information such as atom properties and coordinates, so that I can analyse my data.

Acceptance criteria:

richardjgowers commented 5 years ago

Parser vs Reader is an annoyance: you can "parse coordinates" and "read a topology", so the verb doesn't disambiguate. One option is we could rename these (at least in all documentation) to TopologyParser and CoordinateReader so it's extra clear what is getting parsed/read.

The Topology object itself isn't well documented, but this is partly because it's not currently a public part of the API. I don't think there's anything a user ever has to do that directly touches the object; everything is done via AtomGroup. (This is for historical reasons: originally things were actually stored in AtomGroups, which were technically a list of Atom objects, rather than accessed via AtomGroup as they are now.)

So I think you might find that the Topology object doesn't need documenting for users...

see also:

https://github.com/MDAnalysis/mdanalysis/issues/2199

orbeckst commented 5 years ago

Parsers vs Readers gets even more confusing when we use a single file for both topology and coordinate information.

On the other hand, anyone doing MD knows about "topology files", so perhaps the difficulty is more in making clear what our "static" data are (atom identities, bonds, charges, ...) and what our "dynamic" ones are (positions, velocities, forces, box information, ... and auxiliaries for the advanced crowd).
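To make the static/dynamic split concrete, here is a minimal sketch in plain Python. The classes `StaticData` and `Frame` are hypothetical toy containers invented for this illustration, not MDAnalysis classes, and the numbers are illustrative values only. The point is that the static data are defined once, while each frame carries its own positions and box.

```python
from dataclasses import dataclass

# Hypothetical toy classes for illustration only; NOT part of MDAnalysis.


@dataclass(frozen=True)
class StaticData:
    """Per-atom information that stays fixed for the whole simulation."""
    names: tuple
    charges: tuple
    bonds: tuple  # pairs of atom indices


@dataclass
class Frame:
    """Information that can change from one trajectory frame to the next."""
    positions: list  # one (x, y, z) tuple per atom
    box: tuple       # box edge lengths


# One water molecule: the "static" part is defined once...
water = StaticData(
    names=("OW", "HW1", "HW2"),
    charges=(-0.834, 0.417, 0.417),  # illustrative values
    bonds=((0, 1), (0, 2)),
)

# ...while every frame carries its own "dynamic" coordinates and box.
trajectory = [
    Frame(positions=[(0.0, 0.0, 0.0), (0.96, 0.0, 0.0), (-0.24, 0.93, 0.0)],
          box=(30.0, 30.0, 30.0)),
    Frame(positions=[(0.1, 0.0, 0.0), (1.06, 0.0, 0.0), (-0.14, 0.93, 0.0)],
          box=(30.1, 30.1, 30.1)),
]

for frame_index, frame in enumerate(trajectory):
    # The same static names are paired with each frame's positions.
    for name, position in zip(water.names, frame.positions):
        print(frame_index, name, position)
```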

I agree to drop the Topology object for now. More important is how to make use of what the topology enables, namely bonds(), angles() etc – this is woefully underdocumented.

orbeckst commented 5 years ago

hierarchical relationships

In MDAnalysis we talk of a hierarchy of containers: Segment > Residue > Atom and then we have containers that can span different levels: AtomGroup is "just a bunch of Atoms" and Fragment is "a bunch of atoms connected by bonds".

lilyminium commented 5 years ago

I agree that users are unlikely to interact with a Topology.

Parsers vs Readers gets even more confusing when we use a single file for both topology and coordinate information.

One option is we could rename these (at least in all documentation) to TopologyParser and CoordinateReader so it's extra clear what is getting parsed/read.

This seems like a good solution. I think I thought it was important to include this distinction because a trajectory is usually just some kind of Reader object pointing to a frame.

In MDAnalysis we talk of a hierarchy of containers: Segment > Residue > Atom and then we have containers that can span different levels: AtomGroup is "just a bunch of Atoms" and Fragment is "a bunch of atoms connected by bonds".

@orbeckst Are fragments used anywhere but in methods for periodic boundary conditions?

orbeckst commented 5 years ago

@orbeckst Are fragments used anywhere but in methods for periodic boundary conditions?

@jbarnoud used them extensively for various things, IIRC.

You should also be able to group by fragments.

But I don’t think they are part of the selection language. (That’s another area where harmonization or at least documentation would be good: how can I do X with (1) select_atoms(), (2) methods, (3) pandas-style slicing such as ag[ag.masses < 2]?)

jbarnoud commented 5 years ago

I indeed use fragments on a regular basis because segments are very ill-defined. The meaning of a fragment varies depending on the input format, so fragment may be the most reliable way of identifying a molecule.

jbarnoud commented 5 years ago

I realized I mistyped. I meant to say that the meaning of a segment varies from one format to the other.

lilyminium commented 4 years ago

@orbeckst @jbarnoud Thanks for summarising fragments and segments for me. There's a third concept in MDAnalysis: molecules. Am I correct that fragments and molecules are synonymous in MD theory but independent in the Python implementation: segments are defined by segid in the topology, molecules are defined by molnum in the topology, and fragments are defined by connectivity?

I'm unfamiliar with MD segments. In theory, are they subsets of molecules, or can segments overlap different molecules? Is it the same case in MDAnalysis' implementation?

Do the relationships in this diagram make sense? Each monospace greyscale shape is a real class in MDAnalysis, while the orange Helvetica fragment and molecule are just convenient concepts. In this diagram, a molecule is not a collection of segments, but rather a collection of residues.

[Diagram: classes]

These are the methods that use fragments:

Methods that use molecules:

jbarnoud commented 4 years ago

In principle you are right and segment, fragment, and molecule should be synonymous. In practice, however, they are not.

A fragment is, indeed, defined by the connectivity. A molecule is, for now at least, a Gromacs-only concept: it describes what is defined as a molecule in a Gromacs topology. A Gromacs molecule is, in most cases, a connected ensemble of atoms, but it does not have to be. The meaning of a segment is different from one file format to another.

Here is an example where all of these concepts are the same: take a multimeric protein where each monomer is attached to a ligand; you read the topology from a Gromacs TPR file. Here, each monomer and each ligand is a fragment, a molecule, and a segment.

The segments match the definition of the molecules because that is how we read them from TPR files. If we read the segments from a PDB file, then the segments correspond to the chains, so it is very likely that each segment will consist of a monomer and its ligand.

It happens that a multimeric protein is defined in a Gromacs topology as a single molecule. While it is not the default, a user can choose to do so if they need to create specific interactions between the monomers or to make fixing periodic artefacts a little bit easier.

Finally, fragments will be clear-cut in most cases. However, it can happen that some atoms are defined as virtual particles. In such a case, these atoms will not be connected to the rest of the molecule and will appear as their own fragments. This last case can most likely count as a bug, though: https://github.com/MDAnalysis/mdanalysis/issues/1954.
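To illustrate how a connectivity-based fragment can disagree with a molecule label read from the topology, here is a small plain-Python sketch. It is not how MDAnalysis computes fragments; `find_fragments` is a hypothetical helper that takes connected components of the bond graph with a union-find, and the seven-atom system and its `molnums` list are made up for the example.

```python
def find_fragments(n_atoms, bonds):
    """Group atom indices into connected components of the bond graph.

    Hypothetical helper for illustration; not the MDAnalysis implementation.
    """
    parent = list(range(n_atoms))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for a, b in bonds:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two components

    groups = {}
    for atom in range(n_atoms):
        groups.setdefault(find(atom), []).append(atom)
    return sorted(groups.values())


# Made-up system: two bonded monomers (atoms 0-2 and 3-5) plus one
# virtual site (atom 6) that takes part in no bond.
bonds = [(0, 1), (1, 2), (3, 4), (4, 5)]

# The topology declares the whole multimer as a single molecule.
molnums = [0, 0, 0, 0, 0, 0, 0]

fragments = find_fragments(7, bonds)
print("fragments:", fragments)
print("number of molecules:", len(set(molnums)))
```

Here connectivity yields three fragments (each monomer, plus the unbonded virtual site on its own), while the molecule labels describe a single molecule.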

So, yes, in principle, your schema is correct. But...

jbarnoud commented 4 years ago

Also, you can do atoms.groupby('molnums').
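As far as I understand, that call returns a dictionary mapping each unique value to the group of atoms that carry it. The plain-Python sketch below only mimics that shape with atom indices; `group_by` and the `molnums` list are hypothetical stand-ins, not MDAnalysis code.

```python
from collections import defaultdict


def group_by(values):
    """Map each unique value to the list of atom indices that carry it.

    Hypothetical stand-in that mimics the shape of a group-by result.
    """
    groups = defaultdict(list)
    for index, value in enumerate(values):
        groups[value].append(index)
    return dict(groups)


# Made-up molecule numbers for six atoms.
molnums = [0, 0, 0, 1, 1, 2]

by_molecule = group_by(molnums)
for molnum, atom_indices in by_molecule.items():
    print(molnum, atom_indices)
```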

lilyminium commented 4 years ago

Thank you, @jbarnoud . Just to clarify: the difference between

take a multimeric protein where each monomer is attached to a ligand; you read the topology from a Gromacs TPR file. Here, each monomer and each ligand is a fragment, a molecule, and a segment.

and

It happens that a multimeric protein is defined in a Gromacs topology as a single molecule.

is what is included in the moleculetype definition?

richardjgowers commented 4 years ago

Fragments are defined by MDAnalysis based on bonds, so they're something we calculate as a derived quantity.

Molecules are something read from Gromacs, so they're more of a primary source where we’re blindly trusting the topology file.

I think.....

jbarnoud commented 4 years ago

@lilyminium Yes, "molecule" is based on the "moleculetype" section of a Gromacs topology.

@richardjgowers I'd say so, yes.

lilyminium commented 4 years ago

Closed by #14 and #30.