MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org

Implement context aware guesser #3704

Open aya9aladdin opened 2 years ago

aya9aladdin commented 2 years ago

Is your feature request related to a problem?

The current Guesser class, whether used directly by users or inside parsers, has the downside of being generic, so it does not fit all topologies and force fields. This results in various errors when guessing different attributes (#2348, #3218, #2331, #2265). That is why we need a guesser class that is aware of the context of the universe it is part of.

Describe the solution you'd like

I was inspired by the proposals of @lilyminium (#2630), @jbarnoud, and @mnmelo (#598), and combined them with my own vision to arrive at the following methodology:
• Guessing is not an automatic process; by default, guessing is off unless the user chooses which properties to guess.
• The guesser should raise warnings/messages when it succeeds or fails in guessing a property (one single message for the whole guessed attribute).
• Guessed properties should be easily modifiable.
• Passing the context to the universe should be possible both at and after initialization.
• Modifying and maintaining guessers should be convenient.

Implementation:

 
# Registry mapping upper-cased context names to guesser classes
_GUESSERS = {}


class GuesserMeta(type):
    # Guesser registration: every class that defines ``context`` is recorded
    def __init__(cls, name, bases, classdict):
        super().__init__(name, bases, classdict)
        try:
            context = classdict["context"]
        except KeyError:
            pass
        else:
            _GUESSERS[context.upper()] = cls


# Basic guesser class
class BaseGuesser(metaclass=GuesserMeta):
    context = "base"

    # Maps each attribute the class can guess to the name of its guessing method
    _guess = {}

    # Check that the class has a guessing method for every requested attribute
    def is_guessed(self, to_guess):
        for a in to_guess:
            if a not in self._guess:
                raise ValueError(f"Invalid property: {a}")
        return True


# Martini-specific class that inherits from the base guesser class
class MartiniGuesser(BaseGuesser):
    context = "Martini"
    # Method names rather than bound methods: ``self`` does not exist in a class body
    _guess = {"mass": "guess_mass", "charge": "guess_charge"}

    def guess_mass(self, atoms):
        raise NotImplementedError  # TODO

    def guess_charge(self, atoms):
        raise NotImplementedError  # TODO
 
 

 
import MDAnalysis as mda

# Proposed API: passing the context when the universe is created
# pass context by name
u = mda.Universe("E://1rds.pdb", context="PDB", to_guess=["mass", "bonds"])

# pass context by object
pdb = mda.guesser.PDBGuesser
u = mda.Universe("E://1rds.pdb", context=pdb, to_guess=["mass", "bonds"])

# Proposed API: passing the context after the universe is created
# pass context by name
u.guess_topology_attr(context="pdb", to_guess=["mass"])
# pass context by object
pdb = mda.guesser.PDBGuesser
u.guess_topology_attr(context=pdb, to_guess=["mass"])
 
# Method used by the universe to check the validity of the passed arguments
def get_guesser(guesser, to_guess):
    if isinstance(guesser, BaseGuesser):
        # Check that the guesser has guessing method(s) for the 'to_guess' list;
        # is_guessed raises ValueError for an unsupported attribute
        guesser.is_guessed(to_guess)
    else:
        # Otherwise treat the argument as a context name; registry keys are upper case
        try:
            guesser = _GUESSERS[guesser.upper()]
        except (KeyError, AttributeError):
            raise ValueError(f"Invalid guesser: {guesser!r}") from None
    return guesser
 
u = mda.Universe("E://1rds.pdb", context="PDB", to_guess=["mass", "bonds"])
# output of successful mass guessing:
# successful guessing: guessed masses 90/90
# output when guessing fails for some atoms:
# guessed masses 87/90
# UserWarning: Failed to guess the mass for the following atom(s):
# id 3 name XX
# id 9 name foo
# id 34 name bla
#output for bonds guessing:
# guessed bonds 100
# new fragments 80
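
The reporting behavior shown above (one summary line per guessed attribute, plus a single warning listing every atom that failed) could be sketched as follows. This is only an illustration under stated assumptions: `guess_masses`, the `MASSES` table, and the `(id, name)` tuples are hypothetical stand-ins, not MDAnalysis API.

```python
import warnings

# Hypothetical, deliberately tiny name-to-mass table used only for this sketch
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def guess_masses(atoms):
    """Guess a mass for each (id, name) pair and report the outcome once.

    Returns a list of masses, with None where no mass could be guessed.
    """
    masses = []
    failed = []
    for atom_id, name in atoms:
        mass = MASSES.get(name.upper())
        masses.append(mass)
        if mass is None:
            failed.append((atom_id, name))

    n_guessed = len(atoms) - len(failed)
    print(f"guessed masses {n_guessed}/{len(atoms)}")
    if failed:
        # One warning covering every failed atom, not one warning per atom
        details = "\n".join(f"id {i} name {n}" for i, n in failed)
        warnings.warn(
            "Failed to guess the mass for the following atom(s):\n" + details,
            UserWarning,
        )
    return masses


atoms = [(1, "C"), (2, "H"), (3, "XX")]
masses = guess_masses(atoms)
```

Collecting the failures first and warning once keeps the output readable for large systems, where a per-atom warning would flood the log.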

[Figures: dependency diagrams for the current default behavior, the PDB guesser, and the Martini guesser (white is given, orange is guessed)]

jbarnoud commented 2 years ago

Thank you for opening this issue. Could you please:

Just to clarify. When you write:

Guessing is not an automatic process; by default, guessing is off unless the user chooses which properties to guess.

This is the ideal scenario but we cannot have that until version 3.0 as it is a breaking change. You do address this, but it is worth repeating.

Because having no default guessing is desirable, it would be good to have a way to request it. It would also be useful to prepare a transition with future versions of MDAnalysis.
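
One possible way to prepare such a transition is sketched below, assuming a sentinel default for `to_guess`: callers who pass nothing keep today's behavior but see a deprecation warning, while callers who pass an explicit value (including an empty one) are left alone. The names `resolve_to_guess`, `_UNSET`, and `LEGACY_DEFAULTS`, and the attributes listed in it, are hypothetical and chosen only for illustration.

```python
import warnings

_UNSET = object()  # sentinel: the caller did not pass to_guess at all

# Attributes assumed, for this sketch only, to be guessed by default today
LEGACY_DEFAULTS = ("types", "masses")


def resolve_to_guess(to_guess=_UNSET):
    """Return the tuple of attributes to guess, warning on implicit defaults."""
    if to_guess is _UNSET:
        warnings.warn(
            "Implicit guessing may be switched off in a future release; "
            "pass to_guess explicitly to keep the current behavior.",
            DeprecationWarning,
            stacklevel=2,
        )
        return LEGACY_DEFAULTS
    # An explicit value, even an empty one, is honored without a warning
    return tuple(to_guess)
```

With this shape, `resolve_to_guess(())` already gives the "no guessing" behavior on request, and only the default value would need to change in a later major version.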

jbarnoud commented 2 years ago

Different guessing methods will throw warnings/error messages depending on the results they got; the output messages should precisely describe the universe updates, with a warning about failed processes.

Where do you plan the output for successful guessing to be? Do we have anything comparable at the moment?

lilyminium commented 2 years ago

The task of deciding how an attribute will be guessed will be carried out by the corresponding attribute guesser method of each class. I think in this way the user doesn't have to bother about how a guesser should work, and in the spirit of implementing a context-specific guesser, we gain abstraction by having aware and smart guessers that know exactly how any attribute should be guessed for a specific environment (for example, guessing mass for PDB is related to the element property, while for Martini it is more related to bead type (atom type in MDAnalysis)).

It might be really cool to have some dependency diagrams for this! Both for documentation, and for clarity in the project. For example, as you point out with PDB guessing, guessing the mass depends on knowing the element. In turn, guessing the element depends largely on knowing the atom name and residue name. Having this clarity gives us room for adding future guessers and features more easily. For example, with the PDB, it could be possible to guess the residue name from elements + bonds (which is likely out of scope for the current project). We could maybe look into decorators to make sure that the attribute dependencies are either already present or guessed first.
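
The decorator idea could look roughly like the sketch below, which guesses any missing dependency before running the wrapped method. It is only an illustration: `requires`, `ToyPDBGuesser`, the plain `attrs` dictionary, and the first-letter element rule are all hypothetical and much cruder than anything a real guesser would use.

```python
import functools


def requires(*dependencies):
    """Make sure the listed attributes exist, guessing them first if needed."""
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, attrs):
            for dep in dependencies:
                if dep not in attrs:
                    guess = getattr(self, f"guess_{dep}", None)
                    if guess is None:
                        raise ValueError(
                            f"{dep!r} is required and cannot be guessed"
                        )
                    # Guessing the dependency may itself trigger further guesses
                    attrs[dep] = guess(attrs)
            return method(self, attrs)
        return wrapper
    return decorator


class ToyPDBGuesser:
    """Hypothetical guesser; ``attrs`` maps attribute names to per-atom lists."""

    MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}

    @requires("names")
    def guess_elements(self, attrs):
        # Crude rule for the sketch: first letter of the atom name
        return [name[0].upper() for name in attrs["names"]]

    @requires("elements")
    def guess_masses(self, attrs):
        return [self.MASSES[element] for element in attrs["elements"]]


attrs = {"names": ["CA", "N", "O"]}
masses = ToyPDBGuesser().guess_masses(attrs)
```

Here asking for masses fills in `attrs["elements"]` as a side effect, so the dependency chain names, elements, masses is encoded once in the decorators instead of being repeated inside each method.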

Edit: and the diagrams would look great in a blog post.

aya9aladdin commented 2 years ago

The diagrams idea is really great. I will work on finding the most descriptive way to present them and then add them to my blog.

ojeda-e commented 2 years ago

@aya9aladdin I use draw.io (now https://app.diagrams.net). It's a very nice tool with lots of cool options for designing custom diagrams. I have used it for general work and also for docs.

RMeli commented 2 years ago

It might be really cool to have some dependency diagrams for this!

100%!

I always found the name "guesser" somewhat confusing, because some attributes are not technically guessed; they are read from a table once other (guessed) attributes are available. A diagram would clear up this confusion by showing what is "guessed" first (i.e. inferred with some predefined rules, which might not be perfect), and what is simply set later on, based on the guessed attributes.

I think it would be great to color-code (or distinguish in other ways) which attributes are guessed and which are set on the diagram.
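
The same distinction could also be carried in code, for example by tagging each attribute with how it was obtained. The sketch below is hypothetical throughout (`Origin`, `annotate`, `MASS_TABLE`, and the element rule are illustrative assumptions, not existing MDAnalysis behavior): elements are inferred from atom names with an imperfect rule, while masses are set by a plain table lookup.

```python
from enum import Enum


class Origin(Enum):
    GIVEN = "given"      # read from the topology file
    GUESSED = "guessed"  # inferred with predefined, imperfect rules
    SET = "set"          # looked up in a table from other attributes


MASS_TABLE = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999}


def annotate(names):
    """Return per-attribute values together with how each one was obtained."""
    # Rule-based inference: drop non-letters, keep the first remaining letter
    elements = [
        "".join(ch for ch in name if ch.isalpha())[:1].upper() for name in names
    ]
    # Deterministic table lookup; no guessing is involved at this step
    masses = [MASS_TABLE[element] for element in elements]
    return {
        "names": (names, Origin.GIVEN),
        "elements": (elements, Origin.GUESSED),
        "masses": (masses, Origin.SET),
    }


result = annotate(["CA", "1HB", "OXT"])
for attr, (values, origin) in result.items():
    print(f"{attr}: {origin.value} {values}")
```

A tag like this could drive the color coding on the diagram directly, since each node's category is then a property of the data and not something documented separately.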

aya9aladdin commented 2 years ago

I have added dependency diagrams to the issue for clarification. You can also see my project updates on my personal blog: https://sites.google.com/pharma.asu.edu.eg/aya-gsoc/home

jbarnoud commented 2 years ago

Thank you @aya9aladdin. Does your diagram reflect what you plan on doing, or what we currently have in the code?

aya9aladdin commented 2 years ago

It's more of a reflection of what I'm going to build in both the PDB and Martini guessers.

jbarnoud commented 2 years ago

In that case, could you make the same diagram for the current state of the code? That is what we had in mind when asking for the diagram. Doing this will require you to find the various places where guessing is done, and you may notice oddities along the way. For instance, some guessing functions take arguments, which you will need to consider when writing your code.