MobleyLab / alchemical-analysis

An open tool implementing some recommended practices for analyzing alchemical free energy calculations
MIT License
113 stars 58 forks source link

General suggestions for code cleanup/improvements #48

Open halx opened 8 years ago

halx commented 8 years ago

There is currently a lot of cruft growing in the code which is basically down to design choices.

Here some suggestions: The parsers should be designed in a "polymorphic" fashion such that the main code does not need to know anything about the individual parser codes. This implies that a clear interface and a specification must be designed to define what data is passed in and what is passed out and in what form. It is the responsibility of the parser code to signal back what it can and cannot do. E.g., as it stands now, dhdlt should be None if MD code can't do TI or u_klt should be None if it *BAR analysis can't be done. Energies should be converted to a "canonical" form or it should be signalled back to the main code what the units are. This will relieve the main code maintainer to do code-specific clean-up operations.

Optparse should be replaced with argparse.

The globals need to go as this is really bad programming style. The problem with this is that all these variables are accessible throughout all of the module which mean that they can be modified anywhere. This can create hard to track down bugs (all you need is a second person messing around there...).

halx commented 8 years ago

One practical thought on this. I think the code should have an understanding of what the various analysis methods "mean". Not sure if there's something in place but e,g,

all_methods = set(['TI','TI-CUBIC','DEXP','IEXP','GINS','GDEL','BAR','UBAR','RBAR','MBAR']) ti_methods = set(['TI','TI-CUBIC']) bar_methods = set(['DEXP','IEXP','GINS','GDEL','BAR','UBAR','RBAR','MBAR'])

other methods if applicable

selected_method = set(P.methods) # or via argparse

if not selected_method & bar_methods: # intersection do_bar = None

if u_klt is None: selected_method = selected_method - bar_methods

etc.

davidlmobley commented 8 years ago

Yes, I agree with all of this, and most especially the issue relating to global variables. Probably we should break up this issue into several separate issues which make specific proposals for code changes that can be handled relatively independently, i.e.:

halx commented 8 years ago

Ok, that's certainly a plan but the practical problem is that this affetcs one single file which in turn means conflicts in parallel development. For the near future this will be just as it is but in the long run it may make sense to split alchemical_analysis.py in smaller but logical chunks. In this context I would also suggest to develop the tool into a proper library with a well-defined API. alchemical_analysis.py will then just be the command line frontend. Doing this will also mean that writing to stdout has to be modified e.g. through a logger that explicitly needs to be switched on for the command line tool but would not output anything by default. There may be other possibilities.

On 18 December 2015 at 19:09, David L. Mobley notifications@github.com wrote:

Yes, I agree with all of this, and most especially the issue relating to global variables. Probably we should break up this issue into several separate issues which make specific proposals for code changes that can be handled relatively independently, i.e.:

  • replace optparse with argparse
  • decide canonical form for energies
  • discuss and determine interface for parsers (specify what data goes in and what comes out so that parsers can be independent/polymorphic) and what units are
  • rewrite parsers to conform to interface specifications and use canonical units
  • eliminate global variables
  • make code know what class different methods are in, and use this to (???) (simplify plotting decisions, ...?)

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-165873053 .

halx commented 8 years ago

Ok, I have created a branch 'refactor'. Done so far.

davidlmobley commented 8 years ago

OK. It may take me a couple of days to get to review this one. What needs to be discussed regarding parsers?

Thanks.

On Mon, Jan 4, 2016 at 4:00 AM, Hannes Loeffler notifications@github.com wrote:

Ok, I have created a branch 'refactor'. Done so far.

  • dynamic loading of parsers: need to discuss the interface
  • replaced optparse with argparse
  • eliminated globals in main()

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-168659004 .

David Mobley dmobley@gmail.com 949-385-2436

halx commented 8 years ago

The interface i.e. inputs and outputs. At the moment P is passed in which is, frankly, a rather "heavy" data structure. It also seems to be (ab?)used to also contain additional data items beyond what arparse puts in there. The current code heavily relies on this data structure to be present globally so it is in essence another global. I would tend to only pass around data that is really needed. Further cleanup of the code and separation into smaller units will make this obvious.

On the output side the code is mostly where it should be. But we will need a mechanism to report back the parsers "capabilities". Maybe a separate data structure that is compared to other data times. e.g. P.methods to determine what analysis can and should be done.

Overall, I would think this is not high priority as it will become clearer during refactoring what needs to be done. The new code will need heavy testing before a merge and release. I tend to only test the AMBER side because I am most familiar with it but I will have a look through the example date as well to see if I haven't broken the code in an obvious way.

BTW, what is the need to take extras steps for the Desmond parser? Looking into uncorrelate() it seems there is effectively only one line of code different. Can't this be done by the parser itself?

halx commented 8 years ago

I think I have an idea how units should be handled. Essentially, the parsers fill up dhdlt and u_klt and report back what units they are in. The parsers should do no further conversion but will need to also report the temperature to compute beta. All conversion are then done in the main code as required and requested by the user

davidlmobley commented 8 years ago

That sounds reasonable. But I should investigate a bit on HOW the units should be reported back. For example, there is a units module used in openmm and some other places which actually can make objects CARRY units in a pretty convenient way. But I need to investigate whether having these units attached to return data would slow things down or not.

Otherwise I agree with you though.

On Thu, Jan 7, 2016 at 8:01 AM, Hannes Loeffler notifications@github.com wrote:

I think I have an idea how units should be handled. Essentially, the parsers fill up dhdlt and u_klt and report back what units they are in. The parsers should do no further conversion but will need to also report the temperature to compute beta. All conversion are then done in the main code as required and requested by the user

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169707154 .

David Mobley dmobley@gmail.com 949-385-2436

halx commented 8 years ago

Following the duscussion on https://github.com/pandegroup/openmm/issues/680 it doesn't look to me that the current maintainer is much interested in making this module available in a sensible way (i.e. standalone). Pint has been suggested in the thread. I'll take a look at that one and see if it is interesting enough.

On 8 January 2016 at 01:29, David L. Mobley notifications@github.com wrote:

That sounds reasonable. But I should investigate a bit on HOW the units should be reported back. For example, there is a units module used in openmm and some other places which actually can make objects CARRY units in a pretty convenient way. But I need to investigate whether having these units attached to return data would slow things down or not.

Otherwise I agree with you though.

On Thu, Jan 7, 2016 at 8:01 AM, Hannes Loeffler notifications@github.com wrote:

I think I have an idea how units should be handled. Essentially, the parsers fill up dhdlt and u_klt and report back what units they are in. The parsers should do no further conversion but will need to also report the temperature to compute beta. All conversion are then done in the main code as required and requested by the user

— Reply to this email directly or view it on GitHub < https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169707154

.

David Mobley dmobley@gmail.com 949-385-2436

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169861131 .

halx commented 8 years ago

I had a quick look into pint and it looks workable. Possible downsides: this approach means replacing all 'raw' values with unit objects. This could mean more memory and also being slower. By how much I don't know but memory was a concern of Pavel's.

On 8 January 2016 at 10:15, Hannes Loeffler hannes.loeffler@gmail.com wrote:

Following the duscussion on https://github.com/pandegroup/openmm/issues/680 it doesn't look to me that the current maintainer is much interested in making this module available in a sensible way (i.e. standalone). Pint has been suggested in the thread. I'll take a look at that one and see if it is interesting enough.

On 8 January 2016 at 01:29, David L. Mobley notifications@github.com wrote:

That sounds reasonable. But I should investigate a bit on HOW the units should be reported back. For example, there is a units module used in openmm and some other places which actually can make objects CARRY units in a pretty convenient way. But I need to investigate whether having these units attached to return data would slow things down or not.

Otherwise I agree with you though.

On Thu, Jan 7, 2016 at 8:01 AM, Hannes Loeffler <notifications@github.com

wrote:

I think I have an idea how units should be handled. Essentially, the parsers fill up dhdlt and u_klt and report back what units they are in. The parsers should do no further conversion but will need to also report the temperature to compute beta. All conversion are then done in the main code as required and requested by the user

— Reply to this email directly or view it on GitHub < https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169707154

.

David Mobley dmobley@gmail.com 949-385-2436

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169861131 .

davidlmobley commented 8 years ago

OK, thanks. I did put a follow-up question in that thread. I've used simtk.unit relatively recently but it sounds like that's a bad idea (I don't want to make this require simtk), so looking into pint probably makes sense.

On Fri, Jan 8, 2016 at 2:15 AM, Hannes Loeffler notifications@github.com wrote:

Following the duscussion on https://github.com/pandegroup/openmm/issues/680 it doesn't look to me that the current maintainer is much interested in making this module available in a sensible way (i.e. standalone). Pint has been suggested in the thread. I'll take a look at that one and see if it is interesting enough.

On 8 January 2016 at 01:29, David L. Mobley notifications@github.com wrote:

That sounds reasonable. But I should investigate a bit on HOW the units should be reported back. For example, there is a units module used in openmm and some other places which actually can make objects CARRY units in a pretty convenient way. But I need to investigate whether having these units attached to return data would slow things down or not.

Otherwise I agree with you though.

On Thu, Jan 7, 2016 at 8:01 AM, Hannes Loeffler < notifications@github.com> wrote:

I think I have an idea how units should be handled. Essentially, the parsers fill up dhdlt and u_klt and report back what units they are in. The parsers should do no further conversion but will need to also report the temperature to compute beta. All conversion are then done in the main code as required and requested by the user

— Reply to this email directly or view it on GitHub <

https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169707154

.

David Mobley dmobley@gmail.com 949-385-2436

— Reply to this email directly or view it on GitHub < https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169861131

.

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-169954135 .

David Mobley dmobley@gmail.com 949-385-2436

halx commented 8 years ago

I have addressed most of these except for th unit problem. The harder part will now be to separate the current functions into more useful chunks (probably by means of analysis method and separte plotting from non-plotting functionality) and a library.

I still not convinced that the tool needs a unit managment via a specialised module. Do you expect future functionality that is more than energy conversion at one single point in the code?

davidlmobley commented 8 years ago

I have addressed most of these except for th unit problem. The harder part will now be to separate the current functions into more useful chunks (probably by means of analysis method and separte plotting from non-plotting functionality) and a library.

Great. Splitting in the way you describe also makes sense, I think.

I still not convinced that the tool needs a unit managment via a specialised module. Do you expect future functionality that is more than energy conversion at one single point in the code?

I am not totally convinced that it does either, though I think in general I'll probably try to be including these from scratch when I write new code that will be dealing with multiple sets of units - simply because having everything carry units makes it SO MUCH easier to make sure you have appropriate units when/where needed.

halx commented 8 years ago

My main worry was that this wouldn't work well with the numpy arrays that we use now. A numpy array can only contain certain data types (a "raw" number) and wrapping an array would cost some time and effort. Also, that would have multiplied memory storage even if it was possible.

I should have taken a closer look though and read http://pint.readthedocs.org/en/0.6/numpy.html . So you can actually wrap a numpy array together with a unit and do all the stuff you can do with with scalars! Ok, maybe you cannot do any transformation of an array you can do with numpy but simple unit conversion is possible. So after all that doesn't look so bad.

On 13 January 2016 at 16:59, David L. Mobley notifications@github.com wrote:

I have addressed most of these except for th unit problem. The harder part will now be to separate the current functions into more useful chunks (probably by means of analysis method and separte plotting from non-plotting functionality) and a library.

Great. Splitting in the way you describe also makes sense, I think.

I still not convinced that the tool needs a unit managment via a specialised module. Do you expect future functionality that is more than energy conversion at one single point in the code?

I am not totally convinced that it does either, though I think in general I'll probably try to be including these from scratch when I write new code that will be dealing with multiple sets of units - simply because having everything carry units makes it SO MUCH easier to make sure you have appropriate units when/where needed.

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-171363987 .

davidlmobley commented 8 years ago

Ah, that's a good point. Pint does indeed look promising.

On Wed, Jan 13, 2016 at 11:32 AM, Hannes Loeffler notifications@github.com wrote:

My main worry was that this wouldn't work well with the numpy arrays that we use now. A numpy array can only contain certain data types (a "raw" number) and wrapping an array would cost some time and effort. Also, that would have multiplied memory storage even if it was possible.

I should have taken a closer look though and read http://pint.readthedocs.org/en/0.6/numpy.html . So you can actually wrap a numpy array together with a unit and do all the stuff you can do with with scalars! Ok, maybe you cannot do any transformation of an array you can do with numpy but simple unit conversion is possible. So after all that doesn't look so bad.

On 13 January 2016 at 16:59, David L. Mobley notifications@github.com wrote:

I have addressed most of these except for th unit problem. The harder part will now be to separate the current functions into more useful chunks (probably by means of analysis method and separte plotting from non-plotting functionality) and a library.

Great. Splitting in the way you describe also makes sense, I think.

I still not convinced that the tool needs a unit managment via a specialised module. Do you expect future functionality that is more than energy conversion at one single point in the code?

I am not totally convinced that it does either, though I think in general I'll probably try to be including these from scratch when I write new code that will be dealing with multiple sets of units - simply because having everything carry units makes it SO MUCH easier to make sure you have appropriate units when/where needed.

— Reply to this email directly or view it on GitHub < https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-171363987

.

— Reply to this email directly or view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-171408097 .

David Mobley dmobley@gmail.com 949-385-2436

dotsdl commented 8 years ago

FWIW, seconding all the the points raised in this discussion. I think it's a huge boon if tools written in Python can actually be used directly from a Python session, but as currently written the core component of this module is only really usable as a command-line script. What's more, the code is extremely hard to follow, with e.g. functions that operate on names defined outside of their scope (alchemical_analysis.uncorrelate is one example).

Given this, I'm currently ripping out the things I need from it manually to get into a form I can actually use. I'll push the result to GitHub and perhaps that can help to move this issue along. I think this library addresses an important void, but could use some heavy refactoring.

nathanmlim commented 8 years ago

Hi @dotsdl, Before you go and refactor the code, there is currently a PR (#68) where @halx has done lots of work in refactoring the code. Nobody has had the chance to review it yet and thus hasn't been merged yet

davidlmobley commented 8 years ago

@dotsdl - I'm very on board with major changes; basically there's a ton that needs to be done along those lines and unfortunately I haven't had anyone in the group who is actively doing organization/improvements/maintenance right now (though @limn1 will probably be getting involved). @halx has done a lot that we didn't get to review/discuss yet. Perhaps a call would be warranted?

Nobody has had the chance to review it yet and thus hasn't been merged yet

Elaborating - I've had a bit of a look but we haven't really tested much yet, and it raises larger architecture issues/refactoring issues that I haven't been able to think through.

Maybe we should do a separate issue to think through the best ways to do this. Or we could schedule a call. Interested in being involved? @halx , are you still thinking about this and interested in being involved?

My group doesn't have enough real "code design" experience, I think, yet, so I need to involve outside expertise. The first version of this that we "released" was way too much done just by one student who didn't have the needed expertise.

@mrshirts has also mentioned that he's interested in being more involved going forward.

davidlmobley commented 8 years ago

@dotsdl - do please give us what you end up with regardless, as even if you're not involved in the larger "where do we go from here" discussion, at least seeing what you do/having your summary of it will help us figure out how to do this best.

Thanks.

On Wed, Jun 15, 2016 at 1:57 PM, Nathan Lim notifications@github.com wrote:

Hi @dotsdl https://github.com/dotsdl, Before you go and refactor the code, there is currently a PR (#68 https://github.com/MobleyLab/alchemical-analysis/pull/68) where @halx https://github.com/halx has done lots of work in refactoring the code. Nobody has had the chance to review it yet and thus hasn't been merged yet

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-226317770, or mute the thread https://github.com/notifications/unsubscribe/AGzUYb2X-5sFETJbX73C9yq49Ped0PlFks5qMGc7gaJpZM4G0Vg5 .

David Mobley dmobley@gmail.com 949-385-2436

mrshirts commented 8 years ago

Hi, all- I'm definitely happy to help review code, especially at the physical correctness and functionality level. I'm not as experienced in larger architecture levels.

I still not convinced that the tool needs a unit managment via a specialised module. Do you expect future functionality that is more than energy conversion at one single point in the code?

I don't think a general unit manager will be needed. There is a very low probability of ever needing anything other than energy units of either kJ/mol, kcal/mol, or kT. Something specialized for this purpose should be fine.

dotsdl commented 8 years ago

@davidlmobley I'm happy to be involved in this process. I think converging on a common codebase for doing low-level things like parsing output files from different MD engines and extracting the correct quantities for FEP/TI is important for progress in the larger field, since these should be entirely solvable problems. For my needs at the moment I only need to extract gromacs outputs, so I'll be focused on that.

I do have more of a software engineering focus, so I'm happy to help in terms of library architecture. As I said, I'll push the library that results from my work to a repo on GitHub and we'll see if a path forward emerges from it. We'll also then have something concrete to iterate on. That work will definitely require review from @davidlmobley and @mrshirts for physical correctness, since I'm less well-versed in caveats/pitfalls of alchemical free energy calculations.

halx commented 8 years ago

Yes, I have started with some light-weight changes in the code and I would think we should use this as a basis for discussion.

davidlmobley commented 8 years ago

@dotsdl , @halx - I've had this on hold for a while because of a super crazy summer (including a new baby, a visitor on sabbatical, etc.) and current grant proposal deadlines. But I'll be able to deal a bit with it again after the first week in October. Could we talk around that time to develop a better plan for the future, perhaps also with @mrshirts ?

dotsdl commented 8 years ago

@davidlmobley absolutely. I'm in the midst of writing my dissertation (defending Nov. 4), so of limited use on development fronts. But been working on a library called tentatively alchemlyb that contains the machinery I used to do data munging in Python from GROMACS output and do MBAR and TI with what appear to be best-practices.

I pushed this library to GitHub today, but the API is far from stable, as it's mostly things I built over the course of the past couple of months. Starting November I and two others in our lab will be employed to get this library in a form that is easy for everyone to use, complete with tests, docs, etc., as we have a need for a central library of solid implementations for the different alchemistry projects we are doing ourselves. We will definitely be interested in both you and @mrshirts input to ensure our implementations are scientifically and statistically sound.

We hope to keep the library simple and straightforward enough that it can be used both interactively and as part of larger postprocessing pipelines. The goal is simplicity and clarity. Should we set up a time to discuss general needs more directly?

davidlmobley commented 8 years ago

@dotsdl - this actually sounds really interesting. I'd been thinking to some extent that what this really needs is to be re-framed into a library format rather than a stand-alone tool which attempts to do everything and keeps having to get extended to fit with different codes. There might then still be a front end which would interact with different tools...

Maybe move this to e-mail so we can schedule, looping in @mrshirts and @halx ?

dotsdl commented 8 years ago

@davidlmobley happy to! You can reach me at dotsdl@gmail.com. Feel free to shoot me a message and we'll get rolling.

halx commented 8 years ago

Yes, let's got with email to arrange a discussion in October.

halx commented 7 years ago

Looks like I haven't paid much attention to this recently... Where are things standing at the moment?

davidlmobley commented 7 years ago

@dotsdl - any updates? Should we talk again? I'm still much more enthusiastic about the library approach you were talking about working on, but I haven't seen much action there, and obviously we'll need to do something fairly soon.

dotsdl commented 7 years ago

@davidlmobley I've been working on a proof-of-concept set of components and structure for alchemlyb this past week, and I'm hoping to be able to put both the API proposal that it demonstrates as well as a gist showing how it works in practice during the coming week. Speaking to @orbeckst, we also wanted to draft a white paper that clearly states the problem, aims, and our solution once consensus on the proposal is reached on the issue tracker.

I'll post the issue today and follow it up with the detailed proposal + gist in the coming days. Apologies for the delay.

mrshirts commented 7 years ago

Hi, all-

I think my todos are to modularize the calculation in alchemical-analysis and separate it from reading the information, as we want to try to reuse things as much as possible (since alchemical-analysis is fairly robust to a lot of data failure modes, if hard to work with). I will try to get to that next week -- grades should be submitted today.

On Wed, Dec 14, 2016 at 10:21 AM, David Dotson notifications@github.com wrote:

@davidlmobley https://github.com/davidlmobley I've been working on a proof-of-concept set of components and structure for alchemlyb https://github.com/alchemistry/alchemlyb this past week, and I'm hoping to be able to put both the API proposal that it demonstrates as well as a gist showing how it works in practice during the coming week. Speaking to @orbeckst https://github.com/orbeckst, we also wanted to draft a white paper that clearly states the problem, aims, and our solution once consensus on the proposal is reached on the issue tracker.

I'll post the issue today and follow it up with the detailed proposal + gist in the coming days. Apologies for the delay.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-267096348, or mute the thread https://github.com/notifications/unsubscribe-auth/AEE31GrJ1t3TVCXCGD4g_fx3u28FNFDRks5rICWVgaJpZM4G0Vg5 .

halx commented 7 years ago

The parsers have already been separated in the https://github.com/MobleyLab/alchemical-analysis/tree/refactor branch . All that needs to be done now is to define the new interface and contract.

On 14 December 2016 at 17:58, Michael Shirts notifications@github.com wrote:

Hi, all-

I think my todos are to modularize the calculation in alchemical-analysis and separate it from reading the information, as we want to try to reuse things as much as possible (since alchemical-analysis is fairly robust to a lot of data failure modes, if hard to work with). I will try to get to that next week -- grades should be submitted today.

On Wed, Dec 14, 2016 at 10:21 AM, David Dotson notifications@github.com wrote:

@davidlmobley https://github.com/davidlmobley I've been working on a proof-of-concept set of components and structure for alchemlyb https://github.com/alchemistry/alchemlyb this past week, and I'm hoping to be able to put both the API proposal that it demonstrates as well as a gist showing how it works in practice during the coming week. Speaking to @orbeckst https://github.com/orbeckst, we also wanted to draft a white paper that clearly states the problem, aims, and our solution once consensus on the proposal is reached on the issue tracker.

I'll post the issue today and follow it up with the detailed proposal + gist in the coming days. Apologies for the delay.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48# issuecomment-267096348, or mute the thread https://github.com/notifications/unsubscribe-auth/AEE31GrJ1t3TVCXCGD4g_ fx3u28FNFDRks5rICWVgaJpZM4G0Vg5

.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/MobleyLab/alchemical-analysis/issues/48#issuecomment-267106101, or mute the thread https://github.com/notifications/unsubscribe-auth/AH3aWgl3Y_DsdIsD6zpfrXsenr8PtC7qks5rIC5FgaJpZM4G0Vg5 .

davidlmobley commented 7 years ago

Sounds good, @dotsdl