MolarVerse / PQAnalysis

PQAnalysis is a API/CLI python package for the analysis of MD simulations
https://molarverse.github.io/PQAnalysis/
MIT License
4 stars 2 forks source link

Molecular selection for the use of the center of mass as target in the MSD #43

Closed stkroe closed 4 months ago

stkroe commented 4 months ago

Is your feature request related to a problem? Please describe. The MSD routine would need a possibility to choose the center of mass of a molecule as a target.

Describe the solution you'd like The grammar could be adapted for molecular selection.

Additional context If it is possible to select by molecules e.g. H 2 the center of mass can be used as target. The selection is using AtomicSystem which can call the property center of mass.

97gamjak commented 4 months ago

Important feature request.

So I am unsure what would be the best option to choose from. I have to possible solutions in my mind, but first a bit of background.

So until now at the moment the Selection class and the selection.py module only use the Topology of an AtomicSystem instance to determine the parsed selection. But as @stkroe indicated it would we really helpful to have the full AtomicSystem information available to the Selection class and indeed it was also planned for future releases.

By this enhancement a lot of interesting selection features would get available (after implementation) like e.g.

selection = "CA < 2.0"  #selecting everything within 2.0 Angstrom of atomtype CA

selection = "res~5 > 3.0" #selecting everything not within 3.0 Angstrom of (for example - not yet defined) the center of mass of residue with index 5

So all of these considerations would be quite forward to implement, however maybe by reading in between the lines the problem with a center-of-mass selection can already be identified.

The selection module is in a way defined that it depends on Topology and in future also highly likely on AtomicSystem (would also be necessary for something like center-of-mass selection). Due to this dependency the selection module is designed in a way that it only yields the absolute atomic indices of the full Topology (aka. AtomicSystem) with which they can be then indexed. But in order to make a center-of-mass selection it wouldn't be sufficient to return the indexed system, as the AtomicSystem does not contain any information about the center-of-mass positions. In order to implement this the selection would have to yield somehow not only indices but immediately a new AtomicSystem represented by the center-of-mass coordinates. This would lead to a quite severe design problem (circular dependeny):

AtomicSystem -> Selection -> AtomicSystem

So I would have now two solutions I can think of:

Solution 1

We just introduce a new method to the AtomicSystem class, which return a new AtomicSystem object based on the center-of-mass of the residues and an optional key for command line programs and also input file tools use-center-of-mass.

Solution 2

Probably the more cleaner solution from the perspective of a user we could wrap the actual definition of Selection in an other module along with the AtomicSystem definition, which in the end would make it possible use something like:

selection = com(res(H2))

The implementation of this restructuring would not be that straightforward - but probably manageable.

HOWEVER We would have to rethink the grammar rules defined to get a clean and non-ambiguous grammar syntax (EBNF). To be more clear at the moment I am not sure how to combine different selection entries like for example.

selection = com(res~3 and res~7) # would be really straightforward

# so the | operator is nothing else than a without
# by writing 4 it would just match the atom in the system with index 4
selection = com(res(H2)) | 4

How would now the last selection be parsable? Of course in order to be a complete grammar definition we should be able to write something like this:

seletion = selection1 | selection2

# with
selection1 = com(res(H2))
selection2 = 4

But of course for the interpretation of these two results would strongly differ in terms of logical interpretation when just looking at the two selection definitions. It would also be possible two violate that selection = selection1 | selection2, but I would choose this kind of solution only as a sort of last resort - as this HAS to lead to some severe complications and erronous definitions in future extensions of the grammar.

So for the interested readers who made it until this point - If you have any idea how to implement/define Solution 2 please let us here below (or in a discussion section that I will probably open soon).

For me at the moment I would just go with Soltution 1 as I am not sure on how to handle the grammar issues in Solution 2

@galjos @stkroe

galjos commented 4 months ago

Question: Would it make sense to have a Molecule class/selection that would automatically use the COM for the MSD calculation?

97gamjak commented 4 months ago

Question: Would it make sense to have a Molecule class/selection that would automatically use the COM for the MSD calculation?

What do you exactly mean with Molecule class. So of course, choosing the use-center-of-mass could be the default setting.

Then internally one would just have to perform something like

selection = Selection("some selection").select(atomicSystem.topology)

com_system = atomicSystem[selection].com_resiudes()

Of course this is only possible if the user builds the atomicSystem with information about residues... - as most of the selection features do.