Managing default selection keywords

GoogleCodeExporter commented 9 years ago

Hi,
Following up on a recent thread about an issue with cholesterol selection, I
think it would be a good idea to have more control over the selection keywords
used by MDAnalysis.
On the one hand, having some built-in default keywords is nice and easy and
will probably suit most users and common operations.

On the other hand, terminology conflicts may arise when trying to cover the
nomenclatures used by different force fields (e.g. CHO can refer to cholesterol
but is also a special CHARMM residue).

To leverage MDAnalysis's powerful selection tools, I'd suggest implementing a
way to load one's own dictionary of selection keywords. Maybe the user could be
alerted if the customised dictionary conflicts with the built-in dictionary and
could then choose whether or not to override it.

This would allow users to tailor the dictionary to the particular force
fields/systems they use.
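A minimal sketch of the "load a custom dictionary, warn on conflicts, optionally override" idea. `DEFAULT_KEYWORDS`, `merge_keywords()` and the residue names in the tables are hypothetical illustrations, not part of MDAnalysis; the sketch only shows one way a user dictionary could be checked against a built-in one.

```python
import warnings

# Hypothetical built-in table; residue lists are shortened placeholders.
DEFAULT_KEYWORDS = {
    "protein": ["ALA", "ARG", "VAL"],
    "cholesterol": ["CHOL"],
}


def merge_keywords(defaults, custom, override=False):
    """Combine built-in and user keyword tables into a new dict.

    A user entry that redefines an existing keyword triggers a warning; the
    built-in definition is kept unless override=True.
    """
    merged = {key: list(names) for key, names in defaults.items()}
    for key, names in custom.items():
        if key in merged and set(merged[key]) != set(names):
            action = "using custom definition" if override else "keeping built-in"
            warnings.warn(f"keyword {key!r} conflicts with the built-in table; {action}")
            if not override:
                continue
        merged[key] = list(names)
    return merged
```

For example, `merge_keywords(DEFAULT_KEYWORDS, {"cholesterol": ["CHO"]}, override=True)` maps `cholesterol` to `["CHO"]` and emits a warning, while the built-in `DEFAULT_KEYWORDS` itself stays unchanged.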

What version of the product are you using? On what operating system?
MDAnalysis-0.7.5.1-py2.7-macosx-10.6-i386

Original issue reported on code.google.com by Jean.He...@gmail.com on 25 Apr 2012 at 10:31

GoogleCodeExporter commented 9 years ago

Points for discussion:

- We could have a module-level dictionary with keywords such as 

  selection_keywords = {'protein': ['ALA', 'ARG', ..., 'VAL'],
                        'nucleic': ['ADE', 'URA', 'CYT', 'GUA', 'THY'],
                        ...}

  and have the selection classes dynamically look up the keywords. 

  This would make it possible for a user to hack (I mean, adapt...) the dictionary. One could probably provide some frontend getter/setter functions such as

  set_keywords('protein', [...])

  which could do some sanity checking. It could work similarly to the way matplotlib manages its rc parameters. 

  One could also use the MDAnalysis.core.Flags registry (which is, I think, somewhat similar to 'traits').

- Maybe we should consider creating an "rc" file for MDAnalysis, such as 

   ~/.MDAnalysisrc

  where defaults for the Flags and selection keywords are set. Then one could easily customize MDAnalysis for one's preferred force field.
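To make the module-level dictionary idea concrete, here is a rough sketch, assuming a plain dict called `selection_keywords` and hypothetical `get_keywords()`/`set_keywords()` helpers (none of these are existing MDAnalysis API). The toy selection class reads the table each time it is applied, so changes made with `set_keywords()` are picked up by selections that already exist.

```python
# Hypothetical module-level table; residue lists are shortened placeholders.
selection_keywords = {
    "protein": ["ALA", "ARG", "VAL"],
    "nucleic": ["ADE", "URA", "CYT", "GUA", "THY"],
}


def get_keywords(keyword):
    """Return a copy of the residue names registered for a keyword."""
    return list(selection_keywords[keyword])


def set_keywords(keyword, resnames):
    """Register residue names for a keyword after basic sanity checks."""
    names = list(resnames)
    if not names or not all(isinstance(n, str) and n for n in names):
        raise ValueError("resnames must be a non-empty list of non-empty strings")
    selection_keywords[keyword] = names


class ResnameKeywordSelection:
    """Toy selection that resolves its keyword when applied, not when built."""

    def __init__(self, keyword):
        self.keyword = keyword

    def apply(self, resnames):
        # Dynamic lookup: later set_keywords() calls affect existing selections.
        allowed = set(selection_keywords[self.keyword])
        return [i for i, name in enumerate(resnames) if name in allowed]
```

The sanity checks here are deliberately minimal; a real setter might also validate keyword names or warn when an existing keyword is redefined.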

One question is how likely one would be to want to change the residue selection
definitions while a script is running: is it important to be able to change
the definitions at run time, or would a static initialization (e.g. purely
through an rc file) suffice?

Original comment by orbeckst on 26 Apr 2012 at 10:35

Added labels: Type-Enhancement
Removed labels: Type-Defect

GoogleCodeExporter commented 9 years ago

I would suggest that an rc file would suffice. The nomenclature one uses may
depend on the system studied or the force field used, but it should be fairly
constant.

Original comment by Jean.He...@gmail.com on 26 Apr 2012 at 10:49

GoogleCodeExporter commented 9 years ago

My ten cents would be against static loading. Atom naming conventions could
cause conflicts within a set of selection keywords. For example, a selection
string that works fine on atoms named according to naming convention/force
field A could be incompatible with the same set of atoms named according to a
different force field/convention.

Secondly, there is no reason why this has to be static, other than it being
simpler to code up (which is more of an excuse than a reason, in my view).

Original comment by jan...@gmail.com on 1 Dec 2012 at 4:59

GoogleCodeExporter commented 9 years ago

We could package default tables and configurations with MDAnalysis but provide
a means to read in user files.

If so, what format should we choose for such files? Ad-hoc parsing (e.g. what
we're currently doing with the hard-coded tables in tables.py), ini-style
(ConfigParser), YAML (does Python come with a default YAML parser?), XML, or
<insert suggestion here>?
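For comparison, ini-style files can be read with the standard library (`configparser` in Python 3, `ConfigParser` in Python 2), whereas YAML support requires a third-party package such as PyYAML. A minimal sketch, assuming a hypothetical `[selection_keywords]` section with whitespace-separated residue names; `read_keyword_table()` is an illustrative helper, not existing code.

```python
import configparser

# Hypothetical ini-style keyword file; section and option names are assumptions.
EXAMPLE_INI = """
[selection_keywords]
protein = ALA ARG VAL
nucleic = ADE URA CYT GUA THY
cholesterol = CHOL
"""


def read_keyword_table(text):
    """Parse whitespace-separated residue names from an ini-style string."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["selection_keywords"]
    return {key: value.split() for key, value in section.items()}
```

Note that `ConfigParser` lowercases option names by default, so keyword names would be case-insensitive unless `optionxform` is overridden.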

We could still have an rc file, but it would really only specify which data
files are to be loaded on startup. Without any entries (or without an rc file!),
MDAnalysis should just behave as before and read its internal defaults.
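A sketch of the rc-file-as-index idea, assuming a hypothetical `[data]` section whose `selection_files` option lists ini-style keyword files. If the rc file is missing or lists nothing, only the built-in defaults are returned. The file layout, option names, `INTERNAL_DEFAULTS` and `load_keywords()` are assumptions for illustration; paths are split on whitespace, so paths containing spaces would need a different separator.

```python
import configparser
import os

# Hypothetical internal defaults used when no rc file (or no entry) exists.
INTERNAL_DEFAULTS = {"protein": ["ALA", "ARG", "VAL"]}


def load_keywords(rc_path=os.path.expanduser("~/.MDAnalysisrc")):
    """Load keyword tables listed in an rc file, falling back to defaults.

    Assumed rc layout:
        [data]
        selection_files = /path/one.ini /path/two.ini
    Each listed file holds a [selection_keywords] section; later files
    override earlier ones.
    """
    keywords = {key: list(names) for key, names in INTERNAL_DEFAULTS.items()}
    rc = configparser.ConfigParser()
    if not rc.read(rc_path):  # read() returns the list of files it parsed
        return keywords
    files = rc.get("data", "selection_files", fallback="").split()
    for path in files:
        table = configparser.ConfigParser()
        table.read(path)  # silently skips files that do not exist
        if table.has_section("selection_keywords"):
            for key, value in table.items("selection_keywords"):
                keywords[key] = value.split()
    return keywords
```

Because the defaults are copied first and user files only add or replace entries, a missing or empty rc file reproduces the previous behaviour exactly.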

Parts of the code that could benefit from moving data into data/configuration 
files:
- topology building (atom names, masses, radii, ...)
- HBond analysis (define donor/acceptor heavy atoms)
- selections (what counts as protein, nucleic acid, lipid, ion, water, ...) 

For anyone interested: in GromacsWrapper
(https://github.com/orbeckst/GromacsWrapper) I'm using ConfigParser and
ini-style files to manage initialization and some data files, although the use
case is not quite the same as for MDAnalysis. Most of the logic is in
gromacs.config
(https://github.com/orbeckst/GromacsWrapper/blob/develop/gromacs/config.py);
see also the docs
(http://orbeckst.github.com/GromacsWrapper/gromacs/core/config.html). Perhaps
some of this could be useful for a configuration module for MDAnalysis,
although I am certainly open to better solutions than what I hacked together
:-).

Original comment by orbeckst on 2 Dec 2012 at 10:57

dimchris / mdanalysis, issue #104