MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.28k stars 646 forks source link

Fraction of native contacts (Q) analysis a'la Best-Hummer #702

Closed jandom closed 8 years ago

jandom commented 8 years ago

It's an extension of the MDAnalysis.analysis.contacts module that I wrote. Will knock up some tests and send a PR.

I think this would be useful to people? It's certainly useful to me :P

hainm commented 8 years ago

Yes I think so. +1

dotsdl commented 8 years ago

+1!

jandom commented 8 years ago

Awesome, 1+1 equals 2 and we all know it takes 2 to tango or implement a feature :P

Anyhow here is a caveat: I'm not a big fan of classes everywhere, this analysis is so simple that it can (and i personally think 'should') be implemented using a function.

Here is what I've got so far, tested&tried on things like villin and Glycohorin dimer

def calculate_contacts(ref, u, selA, selB, radius=4.5, beta=5.0, alpha=1.5):

    # reference groups A and B from selection strings
    grA, grB = ref.select_atoms(selA), ref.select_atoms(selB)

    # reference distances (r0)
    dref = distance_array(grA.coordinates(), grB.coordinates())

    # select reference distances that are less than the cutoff radius
    mask = dref < radius

    #print("ref has {:d} contacts within {:.2f}".format(mask.sum(), radius))

    # group A and B in a trajectory
    grA, grB = u.select_atoms(selA), u.select_atoms(selB)
    results = []
    for ts in u.trajectory:
        d = distance_array(grA.coordinates(), grB.coordinates())
        x = 1/(1 + np.exp(beta*(d[mask] - alpha * dref[mask])))
        results.append(( ts.time, x.sum()/mask.sum() ))
    results = pd.DataFrame(results, columns=["Time (ps)", "Q"])
    return results

Remarks 1) not super efficient, for simplicity i calculate the entire distance matrix for an AtomGroup at every frame, 2) contacts are double counted, again for simplicity (distance matrix element [i,j] is the same as [j,i]) but since we take the mean, it's all good, 3) introduces dependency on pandas, i guess we want that out?

orbeckst commented 8 years ago

Well, sorry to break it to you: become a fan of the new analysis class that @richardjgowers prototyped. Unified interface for analysis is the next frontier here and we really have enough old code to retrofit.

Similarly: provide a test.

We can live with pandas in the analysis module although it does not seem strictly necessary here. In any case, import it at the top level in your PR and people can discuss it then.

jandom commented 8 years ago

Oh, if @richardjgowers has something that'll become standard, I'm more than happy to adopt :) Cheerio for the comments Oli!

kain88-de commented 8 years ago

If I remember the Hummer & Best paper correct they change the hard cut-off to a soft sigmoidal-like function. It would be nice if we could use also other functions. Could you make it so that an arbitrary function can be given as a argument to the contact analysis to determine the cut-offs.

richardjgowers commented 8 years ago

https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/analysis/base.py#L35

We've made this, and then we're work on making it automatically parallel

richardjgowers commented 8 years ago

Ah, and re: not using a class. I usually prefer classes because you can attach most post analysis and saving results in the class object, but that's probably just a matter of taste

jandom commented 8 years ago

Okay, I'm looking at this @richardjgowers https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/analysis/base.py

It actually looks completely sane! Can run return things, or is it supposed to return None and I have to use an instance attribute to share the results with the user?

richardjgowers commented 8 years ago

Hmm, not sure. You could make it return and save the result inside the class too. But if you override run you'll have to rewrite a little of the parallel stuff or do some supering. So maybe?

orbeckst commented 8 years ago

Generally I would save results as an attribute because when I do interactive work I hate it when I suddenly get gb of arrays on my screen just because I forgot to assign to a variable.... But I suppose standard are emerging here. Quickly check how the other analysis classes like RMSF or the polymer ones do it.

Oliver Beckstein email: orbeckst@gmail.com

Am Feb 9, 2016 um 3:31 schrieb Richard Gowers notifications@github.com:

Hmm, not sure. You could make it return and save the result inside the class too. But if you override run you'll have to rewrite a little of the parallel stuff or do some supering. So maybe?

— Reply to this email directly or view it on GitHub.

jandom commented 8 years ago

@orbeckst haha that's so funny - I share the same feeling towards getting some random text file written to disk instead of a numpy array that i clearly asked for :P MDAnalysis way seems to be to write files, so I want to adhere to that.

@kain88-de i like the "pass a callback as argument" idea but i'm afraid my python skill is too low for this, unless you can help me out a bit? Here is the 'best' I came up with....

The best-hummer Q callable for a frame is :

# r - distances at time t
# r0 - reference distances 
# beta - softness
# lambda - tolerance
def foo(r, r0, beta, lambda): return 1/(1 + np.exp(beta*(r - lambda * r0)))

A more general way to put it would be

def foo(r, r0, **kwargs): return 1/(1 + np.exp(kwargs.get('beta')*(r - kwargs.get('lambda') * r0)))

Does this fit your idea of a callback function passed as argument? Where the only necessary arguments are r and r0

richardjgowers commented 8 years ago

https://github.com/MDAnalysis/mdanalysis/blob/develop/package/MDAnalysis/analysis/polymer.py#L42

I think the persistence length thingy I wrote works ok. So run doesn't return anything, but creates attributes. Then I've got some plotting and line fitting and saving as optional methods.

Anyway, make a PR already @jandom ! :P

jandom commented 8 years ago

Getting whipped here :P Yup, was just looking at polymer.py Thanks, working on the flipping tests... ;)

kain88-de commented 8 years ago

@kain88-de i like the "pass a callback as argument" idea

Nice. So it depends how we want to design this exactly. I'm personally like the interfaces from numpy and scipy a lot. I think cdist is a good example here.

So my first try for a possible API is this. (Note I haven't looked at BaseAnalysis for it's API)

class Contact(BaseAnalysis):
    def __init__(self, universe, cut, method='cut-off', *method_args):
        self.uni = universe
        self.method = method
        self.cut = cut
        self.method_args = method_args

    # or the appropriate name that doesn't override the function
    def run():
        if self.method == 'cut-off'
            self._cut_off(self.method_args)  # normal hard cut of
        elif self.method == 'soft-cut'
            self._soft_cut(self.method_args)  # best hummer
        else:
            # what ever the user has given
            self._custom_method(self.method, self.method_args) 
jandom commented 8 years ago

@kain88-de thanks for closing and all your help on this one