MobleyLab / Lomap

Alchemical mutation scoring map
MIT License
37 stars 17 forks source link

Merge Michel lab/Cresset changes into LOMAP #54

Closed davidlmobley closed 2 years ago

davidlmobley commented 3 years ago

After discussion with @jchodera and others, we concluded we wanted to get the Michel lab/BioSimSpace/Cresset modified version synced back up. Huge thanks to lohedges for drawing together all the work represented in this PR. Here's the change list I got from him, below:

A summary of the key changes are given below:

Cresset (Mark Mackey):

The main change we made recently was the ability to provide link scores as an input to LOMAP, rather than have it calculate them. That means that for large data sets we can split the NxN matrix up into smaller submatrices, run parallel LOMAP jobs to get the scores/maps etc for each submatrix, and then run one big LOMAP job at the end to compute the optimal graph. Without this doing a LOMAP run on data sets with >100 molecules was very very painful.

  • Links file (see above)
  • Added support for SDF
  • Improved cyclisation algorithm
  • Improved handling of chirality
  • Added rules:
  • sulfonamide
  • heterocycle
  • methyl-to-ring
  • ring-size-change
  • hybridisation
  • Removed alkyl-to-halogen rule
  • Removed fingerprint functionality

(Just to note that most of the new rules are quite specific to single-topology FEP and in some cases are specific to AMBER/SOMD.)

BioSimSpace (Jenke Scheen / Lester Hedges):

  • Make package relocatable so that it can be bundled.
  • Remove Cresset hydrogen ordering restriction which was required for an RDKit bug that has since been fixed. (If preferable, this could be re-added with an option to disable in order to catch any regressions.)
  • Let RDKit figure out the bondLineWidth when drawing graphs.

@jchodera said he would try and review this or get his folks to do so. Our tentative plan is to get this merged in, then work with OpenEye, the Chodera Lab and others to think through a larger revision of tools in this area (which may or may not still be called LOMAP) which would use a modular, object oriented architecture and a simple API, along with class inheritance/overrides to allow customization.

cc @nividic

Lester, let me know if anyone else should be tagged. I'll be out of the office for a bit so if anyone wants to review without me that would be much appreciated.

davidlmobley commented 3 years ago

Also tagging @JenkeScheen

davidlmobley commented 3 years ago

Ah, and I forgot but @jchodera wanted me to tag @mikemhenry and @ijpulidos who he thought might be able to help get this reviewed and merged because it would relate to Perses work.

jchodera commented 3 years ago

@mikemhenry @ijpulidos @dominicrufa : The context here is that LOMAP is a tool for constructing molecule-to-molecule atom mappings and useful networks of transformations between molecules, but its original repository has been long neglected. Multiple forks have been in constant development for particular applications, and there is now motivation to collaboratively develop an independent package that unifies some of the API but allows others to customize functionality as needed.

However, the API is a disaster. There are hundreds of public methods, and no clear software idioms or design patterns in use. There's no way to collaboratively support a codebase like this without a dedicated maintainer.

Our goal is to

As a reference, my refactor of the atom mapping code for perses may be a useful example: This refactor exposes just a few public methods, utilizes a factory design pattern (where an AtomMapper factory with a customizable score_mapping function generates an AtomMapping object for a desired pair of openff.toolkit.topology.Molecule objects).

davidlmobley commented 3 years ago

Yes. So, for now, we're probably after giving this a minimal/quick review to look for anything which obviously needs to be addressed before merge. However, the code which this PR brings in is functional, whereas the code in the master branch is a couple of years out of date, so it's critical to get this merged in fairly soon to restore functionality. :)

jchodera commented 3 years ago

@davidlmobley : I have a different proposal: Instead of merging this into master, why not

We can then go from there.

jmichel80 commented 3 years ago

@davidlmobley @jchodera both options seem fine from our perspective. We would like to eventually refactor our implementation in BSS to pull in 'canonical' lomap (or whatever it will be called) from conda. If your updates are backwards compatible, or if you keep @lohedges in the loop about API changes that should ensure we are actually able to share the same codebase.

jchodera commented 3 years ago

@jmichel80 : I think the major question is who is willing to maintain this version of "canonical" LOMAP on conda-forge. If you folks want to do that independently, I think that would be fantastic.

As it stands, however, the public API appears to be too extensive and diffuse to be something we could collectively support, so I don't see us being able to contribute to supporting a stable version of the current API in any kind of long-term manner. My objectives are articulated here: https://github.com/MobleyLab/Lomap/pull/54#issuecomment-886122891

Basically, I think it's a useful starting point, I foresee us helping support a new-generation version of LOMAP with an enormously simplified public API that can be tailored for specific applications.

lohedges commented 3 years ago

@davidlmobley : I have a different proposal: Instead of merging this into master, why not

  • Retire master into legacy or lomap-1.0-long-term-maintenance or something to indicate it is the deprecated branch. (The other branches should similarly be retired but retained)
  • Tell GitHub that main is the new default branch, keeping the name main

This is exactly what I had in mind.

jmichel80 commented 3 years ago

@jchodera we would be happy not to maintain LOMAP ! We can work with you to make sure that a new version of the code would still give us the same functionality we rely on at the moment.

lohedges commented 3 years ago

One other thing to note is that the Cresset fork has removed some functionality from the original master branch, e.g.:

Removed alkyl-to-halogen rule Removed fingerprint functionality

I'm no LOMAP expert so don't know whether these options are widely used by other groups. If so, it might be worth discussing the functionality that would be expected from a canonical LOMAP implementation.

Future API changes would be perfectly fine by me, as long as things are clearly documented. We currently only make use of LOMAP within BioSimSpace.Align.generateNetwork and calls to the LOMAP API are entirely hidden from the user. As such, I can just update code within this function to reflect any changes going forward. (Also, our functionality is still experimental, so @JenkeScheen might have more ideas of what might be needed in future.)

What is important to us is having a conda installable version of LOMAP that works within the conda-forge ecosystem using the dependencies required by BioSimSpace . The version on main satisfies this requirement, whereas the master branch does not. (Mostly because of outdated NetworkX functionality.)

davidlmobley commented 3 years ago

OK, so I like the option of retiring master and telling GitHub that this is the new master branch.

@lohedges those functionalities don't seem important to me, or at least, anyone using them certainly hasn't contributed back here and isn't chiming in, so I'm fine with removing those for now.

Plans for long-term maintenance are still developing, but:

There's also been some discussion elsewhere as to whether the successor would be called LOMAP, or something else, and I'm not particular -- the main thing I care about is that there be a tool we can use (and contribute to) for this stuff which is maintainable.

davidlmobley commented 3 years ago

In other words, I think my current plan is:

a) Make "main" the new default branch b) Happily contribute to any efforts to make this LOMAP obsolete by replacing it with something better architected which maintains key functionality c) Deprecate LOMAP once its successor is available

lohedges commented 3 years ago

This sounds good to me. One other thing that I noted is that the README.md on main lists several additional authors. My contribution is certainly very small, so it might be better to divide the list into two categories, e.g. authors and contributors to make clear who did the bulk of the work, and who was responsible for the original concept and implementation.

Just to check, are you planning on building a conda-forge package from main, or is this something that you will only do once the refactoring and whatnot is in place?

davidlmobley commented 3 years ago

Good idea on the contributors/etc., this is woefully out of date.

I'd want to build a conda forge package from main, though I don't think anyone on my end getting involved here has done that before, so we could probably use help and/or tips if you happen to have time and interest.

lohedges commented 3 years ago

Yes, I'd be happy to help with that. Given that it's pure python with standard dependencies it shouldn't be an issue. When I get a chance I'll try to make an initial recipe for you.

davidlmobley commented 3 years ago

One issue I noticed with this is that it makes the examples nonfunctional (removes getMapping).

lohedges commented 3 years ago

Ah, I didn't notice that. Hopefully it's easy to re-add the functionality.

lohedges commented 3 years ago

Okay, I reinstated the getMapping method using the code from the master branch.

mikemhenry commented 3 years ago

@jchodera @davidlmobley This is a pretty big PR :)

How would you like me to review this? In this initial PR are we looking to clean up the API as well? Do we want to switch from travis CI to github actions?

davidlmobley commented 3 years ago

@mikemhenry this may not be worth reviewing; I think the path forward is that, well, the current master is nonfunctional and this branch is functional, so we need to switch to this branch, probably without a detailed review. Then we should begin to think about API issues and switch from Travis CI to GItHub actions, etc.

davidlmobley commented 2 years ago

OK, as per discussion I have updated the default branch from master to main and thus the code in this PR is now "live" I think. I will close this PR, and also update the changelog in a separate PR.