MDAnalysis / mdanalysis

MDAnalysis is a Python library to analyze molecular dynamics simulations.
https://mdanalysis.org
Other
1.31k stars 650 forks source link

refactoring HydrogenBondAnalysis #2238

Closed orbeckst closed 4 years ago

orbeckst commented 5 years ago

Is your feature request related to a problem? Please describe. The analysis.hbonds.hbond_analysis.HydrogenBondAnalysis class is very useful and widely used but the code is messy and old and thus hard to maintain, to debug (see e.g. #1687), or to extend. It also does _not follow the Analysis API_. Performance is also not stellar (e.g. does not use capped_distances())

Describe the solution you'd like Refactor/rewrite, ideally without breaking the API, which includes

The data structures should be clean, see #2177 for a discussion.

The class should implement the Analysis API ("Bauhaus") from #719.

Describe alternatives you've considered

  1. Leave the old code in place. This has the advantage that all the users that have been using this piece of code heavily won't get their code broken... See @kain88-de 's https://github.com/MDAnalysis/mdanalysis/issues/2177#issuecomment-454917794

    Before you make any changes to the water bridge output please consider [HydrogenBondAnalysis] is a very widely used feature of MDAnalysis. It is the most frequently cited analysis in papers! I don’t have exact numbers but I looked at almost every paper this year citing MDAnalysis.

    Maybe we keep the old time series and add the improved version with a new name.

  2. Keep old version of the code and deprecated it for release 2.0 (!), i.e., keep it around forever in maintenance mode. Add the new module under a different name.

Current work

Currently there are two PRs with different solutions

orbeckst commented 5 years ago

@xiki-tempula you said about the new proposed HydrogenBondAnalysis (PR #2237 by @p-j-smith ) https://github.com/MDAnalysis/mdanalysis/pull/2237#issuecomment-480543288

The code looks really nice, the capped distance is a very neat way of finding the atoms compared with the old neighbour search. I will try to use capped_distance in the water bridge we well. I guess the only problem is that the interface and the outputs are very different from the previous one. Personally, I hate the original representation as well.

Do you think there is a way that you can build/refactor WaterBridgeAnalysis on the new class?

I think we are starting a discussion about what the API of a new HydrogenBondAnalysis ought to look like so things would likely change. But you would think that you could – in principle – work with the new class? And do you think that would actually have the time to do it if necessary?

xiki-tempula commented 5 years ago

@orbeckst I guess the major obstacle is the underlying data structure. In wba, the data is stored as a network to represent the nature of the water network, so the analysis part acts upon the underlying network data structure. The other though a bit trivial problem is the amount of redundancy removed by combing two classes. If expressed in simple pseudocode, hba works as

for frame in frames:
    find_hb(sele1, sele2)

whereas wba works as:

for frame in frames:
    find_hb(sele1, sele2)
    find_hb(sele1, water1)
    for water_no in number_of_water:
        find_hb(water[water_no], sele2)
        find_hb(water[water_no], water[water_no+1])

So the only common part is find_hb(sele1, sele2), which is only 3 lines in the current PR.

orbeckst commented 5 years ago

Update:

  1. We want to go ahead with developing @p-j-smith and @bieniekmateusz 's new implementation PR #2237 as the new version of hydrogen bond analysis.
  2. The new version will live in MDAnalysis.analysis.hydrogenbonds.hbond_analysis so that we can keep the old version in MDAnalysis.analysis.hbonds.hbond_analysis and slowly move it into deprecated and legacy status (by 1.0?).
  3. The new version does not have to be fully API compatible with the old version (because we are keeping the old one).
  4. We will keep the widely used hbonds.hbond_analysis around for a while (with its tests) but won't continue developing it and will strive to replace it with the new version within MDAnalysis (e.g., for WaterBridgeAnalysis, PR #2087).
xiki-tempula commented 5 years ago

The PR #2087 is fully compatible with the old hbond_analysis and has passed all the test. I wonder if it is possible to review the PR, please? Thank you.

orbeckst commented 5 years ago

See my https://github.com/MDAnalysis/mdanalysis/pull/2087#issuecomment-516939512.

lilyminium commented 4 years ago

As part of this waterdynamics.HydrogenBondLifetimes should be refactored to use hydrogenbonds.HydrogenBondAnalysis and/or turned into a subclass. Edit: #2547

orbeckst commented 4 years ago

This issue was fixed by PR #2237.

Anything else can go into their own issues.