debbiemarkslab / EVcouplings

Evolutionary couplings from protein and RNA sequence alignments
http://evcouplings.org

Improve mean-field integration #116

Open thomashopf opened 6 years ago

thomashopf commented 6 years ago

While investigating #104 I found a couple of issues with the mean-field protocol and couplings objects that would be great to address:

1) The mean_field couplings protocol should allow the pseudocount weight to be specified as a parameter (the inference method itself already has this option)

2) There appears to be a sign problem with the mutation effects computed using mean-field couplings models (damaging mutations come out > 0 instead of < 0)

3) Inference of the independent model should be based on the pseudocounted f_i matrix (this is the way I originally implemented it, before we moved over to the l2-based inference that is currently present in the CouplingsModel object). Ideally, we would detect upon loading that the model comes from mean-field inference and return a MeanFieldCouplingsObject that has the appropriate behavior. For now, the fix from #104 is to specify a default lambda_h of 0.01 and apply l2-regularized inference instead.
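
To make point 3 concrete, here is a minimal sketch of what "independent model from the pseudocounted f_i matrix" could look like. It is not the EVcouplings implementation: the function names are hypothetical, the pseudocount is assumed to be a relative weight that mixes the observed frequencies with a uniform distribution, and the fields are assumed to be log-frequencies in a zero-sum gauge. It uses only the standard library so it runs anywhere; real code would operate on NumPy arrays.

```python
import math


def pseudocounted_frequencies(f_i, pseudo_count=0.5):
    """Mix observed single-site frequencies with a uniform distribution.

    f_i: list of rows, one per alignment column, each row summing to 1.
    pseudo_count: assumed relative weight in [0, 1] (hypothetical parameter).
    """
    if not 0.0 <= pseudo_count <= 1.0:
        raise ValueError("pseudo_count must lie in [0, 1]")
    regularized = []
    for row in f_i:
        q = len(row)  # number of symbols at this position
        regularized.append(
            [(1.0 - pseudo_count) * f + pseudo_count / q for f in row]
        )
    return regularized


def independent_fields(f_i_reg):
    """Site-independent fields h_i(a) = log f_i(a), shifted to sum to zero per site.

    Requires strictly positive frequencies, i.e. pseudo_count > 0 whenever
    some symbol is unobserved in a column.
    """
    fields = []
    for row in f_i_reg:
        logs = [math.log(f) for f in row]
        mean_log = sum(logs) / len(logs)
        fields.append([x - mean_log for x in logs])
    return fields


# Toy 2-position, 4-symbol example (made-up numbers, for illustration only)
f_i = [
    [0.9, 0.1, 0.0, 0.0],      # conserved position with unobserved symbols
    [0.25, 0.25, 0.25, 0.25],  # fully variable position
]
f_reg = pseudocounted_frequencies(f_i, pseudo_count=0.5)
h = independent_fields(f_reg)
```

With a pseudocount, unobserved symbols get a finite (strongly negative) field instead of log(0), which is what makes the frequency-based independent model usable without l2 regularization.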

@sophiamersmann if you have any time to look into this, it would be greatly appreciated!

sophiamersmann commented 6 years ago

Hi @thomashopf , yes, I will have a look at it as soon as possible.

thomashopf commented 6 years ago

Great, thanks!

thomashopf commented 6 years ago

Thanks @sophiamersmann for implementing all the changes!

Whenever someone has time, it would be great to verify on one of the beta-lactamase datasets that the mutation effect calculation with mean-field works as intended and that we get a correlation around or above r=0.7 (negative sign -> damaging mutation). Leaving the ticket open until then.

If we added the model files to the testing files and created a test around this, it would serve as an additional comprehensive final check that the CouplingsModel class and the mutate stage are intact (partly solving the catch-22 of how to get orthogonal values for testing these calculations).
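
A rough sketch of what such a check could look like, under stated assumptions: the helper names are hypothetical, the numbers are made-up toy values rather than beta-lactamase data, and the experimental measure is assumed to be scaled so that lower means more damaging (so a correct sign convention gives a positive correlation). A real test would load predicted effects from the model files and compare them with the measured effects.

```python
import math


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


def check_mutation_effects(predicted, measured, min_r=0.7):
    """Return (r, passed); passed requires r >= min_r.

    A flipped sign in the predictions yields -r and therefore fails.
    """
    r = pearson_r(predicted, measured)
    return r, r >= min_r


# Toy values for illustration only (NOT real data):
# predicted effects (negative = damaging) and a measured fitness-like score
predicted = [-6.1, -4.0, -0.5, 0.2, -2.8]
measured = [0.05, 0.30, 0.95, 1.00, 0.45]

r, passed = check_mutation_effects(predicted, measured)
r_flipped, passed_flipped = check_mutation_effects(
    [-p for p in predicted], measured
)
```

Because the threshold is one-sided, the same test catches both a degraded correlation and a sign flip like the one described in point 2 of the original report.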

sacdallago commented 6 years ago

@sophiamersmann would it be too much to ask to integrate the tests in the pipeline? That would really be the cherry on top 🌈 😃

sophiamersmann commented 6 years ago

Sure! Do you want me to use any specific dataset?

b-schubert commented 6 years ago

@sophiamersmann you can use our test data:

http://marks.hms.harvard.edu/evcouplings_test_cases.tar.gz

This contains alignments and runs for a monomer and a complex. Let me know if you need any additional files and I will add them to the archive.

sacdallago commented 6 years ago

sorry for not getting back here 😮 I read it and then things piled up

sophiamersmann commented 6 years ago

Don't worry Christian. Thanks, Benni. I'll get it done as soon as possible. :)

thomashopf commented 4 years ago

Fixed a sign error in b88245011a714950e7c033a4ec0ebb1f442c71b3; mutation effects now have the correct sign. But somehow they still look odd overall, so leaving this open until completely verified.