MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

Allow select_atoms to select chain #2875

Closed xiki-tempula closed 3 years ago

xiki-tempula commented 4 years ago

Is your feature request related to a problem?

The PDB standard defines column 22 as the chain ID. The CHARMM convention defines the segment ID as a 4-letter ID starting at column 73.

Currently, MDAnalysis uses the chain ID as the segment ID when the segment ID is absent, and ignores the chain ID when a segment ID is given.
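To make the column layout and the fallback described above concrete, here is a minimal sketch in plain Python. It is not MDAnalysis's actual parser: `make_atom_line`, `parse_ids`, and `effective_segid` are hypothetical names introduced only for illustration, and the atom data is made up.

```python
def make_atom_line(chain_id, seg_id):
    """Build a fixed-width PDB ATOM line (hypothetical helper, made-up atom)."""
    return (
        "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   "
        "{:8.3f}{:8.3f}{:8.3f}{:6.2f}{:6.2f}      {:<4}{:>2}"
    ).format("ATOM", 1, "CA", "", "ALA", chain_id, 1, "",
             0.0, 0.0, 0.0, 1.0, 0.0, seg_id, "C")


def parse_ids(line):
    """Return (chain_id, seg_id) read from the fixed PDB columns."""
    line = line.rstrip("\n").ljust(80)   # pad short lines so slicing is safe
    chain_id = line[21].strip()          # column 22 (1-based)
    seg_id = line[72:76].strip()         # columns 73-76 (1-based)
    return chain_id, seg_id


def effective_segid(line):
    """Mimic the behaviour described in this issue (an assumption, not
    MDAnalysis code): use the segment ID if present, else the chain ID."""
    chain_id, seg_id = parse_ids(line)
    return seg_id if seg_id else chain_id


only_chain = make_atom_line("A", "")
both = make_atom_line("A", "PROA")
print(parse_ids(only_chain), effective_segid(only_chain))
print(parse_ids(both), effective_segid(both))
```

In the second case the chain ID `A` is read from the file but plays no role in the resulting segment ID, which is the information loss this request is about.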

Ideally one could select chain based on chainid.

u.select_atoms('chainid A'), or u.select_atoms('chain A') if we follow the PyMOL convention
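The requested behaviour can be sketched without MDAnalysis at all. The toy below is an assumption-laden illustration, not the library's selection parser: `Atom` and `select_by_chain` are hypothetical names, and it accepts both spellings proposed above.

```python
from typing import NamedTuple


class Atom(NamedTuple):
    """Hypothetical minimal atom record for illustration."""
    name: str
    chain_id: str
    seg_id: str


def select_by_chain(atoms, selection):
    """Filter atoms with a 'chainid X [Y ...]' or 'chain X [Y ...]' string."""
    keyword, _, value = selection.partition(" ")
    if keyword not in ("chainid", "chain"):
        raise ValueError("unsupported selection keyword: " + repr(keyword))
    wanted = set(value.split())
    # Match on the chain ID only; the segment ID is deliberately ignored.
    return [atom for atom in atoms if atom.chain_id in wanted]


atoms = [
    Atom("CA", "A", "PROT"),
    Atom("CB", "A", "PROT"),
    Atom("CA", "B", "PROT"),
]
print(select_by_chain(atoms, "chainid A"))
```

Both chains share the segment ID `PROT` here, so a segment-based selection could not tell them apart, while the chain-based one can.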

Related to #2874

orbeckst commented 4 years ago

Chain vs Segment

chain

A chain is a polymer term, specifically from PDB files (see ATOM chainId, and TER and SEQRES for clarification), and originally means one polymer, as expressed for SEQRES (my emphasis):

SEQRES records contain a listing of the consecutive chemical components covalently linked in a linear fashion to form a polymer. The chemical components included in this listing may be standard or modified amino acid and nucleic acid residues. It may also include other residues that are linked to the standard backbone in the polymer. Chemical components or groups covalently linked to side-chains (in peptides) or sugars and/or bases (in nucleic acid polymers) will not be listed here.

Each SEQRES entry has a corresponding chainId in ATOM records and should be terminated with a TER (although in the wild this is often omitted).

Segment

A segment originates (as far as I know) from PSF files and is generally used to mark up a collection of molecules. This is often used to label single proteins or all lipids or all waters or the whole solvent. The charmmtutorial.org page "CHARMM: The Basics: Molecule Metadata" treats "chain" and "segment" as equivalent:

Residues are further grouped into chains, or segments, which represent major functional units of the protein.

but then shows an example where all water molecules are in a segment with SEGID W.

In practice, segments are used as a convenient container for collections of "residues", where residues can either be building blocks of a polymer or individual molecules such as lipids or waters or bare ions.
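The difference between the two groupings can be shown with a small sketch. The residue records below are made-up illustration data (the segment names `PROA`, `PROB`, and `W` are assumptions in the CHARMM style), and `count_by` is a hypothetical helper.

```python
from collections import defaultdict

# (resname, resid, chain_id, seg_id); hypothetical example system:
# two protein chains plus three water molecules without a chain ID.
residues = [
    ("ALA", 1, "A", "PROA"),
    ("GLY", 2, "A", "PROA"),
    ("ALA", 1, "B", "PROB"),
    ("TIP3", 1, "", "W"),
    ("TIP3", 2, "", "W"),
    ("TIP3", 3, "", "W"),
]


def count_by(records, field_index):
    """Count residues per value of the given tuple field."""
    counts = defaultdict(int)
    for record in records:
        counts[record[field_index]] += 1
    return dict(counts)


print("by chain:  ", count_by(residues, 2))
print("by segment:", count_by(residues, 3))
```

Grouping by chain ID yields one group per polymer (and an unnamed group for the waters), whereas the segment `W` collects three independent molecules into one container.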

selection keyword

A quick survey indicates that chain is probably a good keyword to use.

VMD

VMD's selections have the keywords

CHARMM

MDAnalysis selections were modelled after CHARMM so unsurprisingly (see charmmtutorial.org: Atom Selection and c42b1 select

PyMOL

See pymolwiki.org: Selection_Algebra

Related operators

mdtraj

mdtraj does not seem to store chains/segids; at least based on the mdtraj Atom Selection Reference, it only lets users select the internal chainid:

Feel free to correct me on any of the above.