bowman-lab / enspara

Modeling molecular ensembles with scalable data structures and parallel computing
https://enspara.readthedocs.io
GNU General Public License v3.0
33 stars 16 forks source link

Getting enspara installable on ARM machines #214

Closed sukritsingh closed 7 months ago

sukritsingh commented 2 years ago

Starting this discussion topic here to track any relevant info in 1 thread:

Problem: Enspara in its current form cannot be installed on ARM machines including Macbooks with M1 chips. This is because Numpy does not work on ARM machines below python 3.8, while Enspara only works up to python<3.8.

What are the updates needed to make enspara python3.8+ compatible? I remember that there are some hard-coded restrictions in setup.py - would removing those simply be sufficient? Prior to that update to setup.py the master branch worked fine for me up to python 3.9.

If it's just a matter of updating setup.py then @jlotthammer or I can submit PRs to fix this. Wanting to make sure there's no other necessary fixes.

lgsmith commented 2 years ago

I am using 3.9; I have been using a branch that didn't have the restrictions pulled into it, which is why I've been able to carry on using it. I was running into issues with executing the test suite, which I think is why the <3.8 restriction was put in place. The parts of enspara I've used seem to work, but it is annoying to not be able to run the tests (especially if you're working on something where you want to add or modify a test). So yeah, in a nutshell there's something actually wrong there. Sorry I can't be more insightful than that.

The issue with running the tests that I was getting was a complaint about not being able to import enspara properly from within the interpreter initiated for the tests because some of the relative imports don't work properly if you both have enspara installed and are within the enspara directory try and try to import it. I don't know how to get the tests to work properly but not change to a directory within the enspara tree, so that's kind of where I left things.

On Thu, Feb 3, 2022 at 4:00 PM Sukrit Singh @.***> wrote:

Starting this discussion topic here to track any relevant info in 1 thread:

Problem: Enspara in its current form cannot be installed on ARM machines including Macbooks with M1 chips. This is because Numpy does not work on ARM machines below python 3.8, while Enspara only works up to python<3.8.

What are the updates needed to make enspara python3.8+ compatible? I remember that there are some hard-coded restrictions in setup.py - would removing those simply be sufficient? Prior to that update to setup.py the master branch worked fine for me up to python 3.9.

If it's just a matter of updating setup.py then @jlotthammer https://github.com/jlotthammer or I can submit PRs to fix this. Wanting to make sure there's no other necessary fixes.

— Reply to this email directly, view it on GitHub https://github.com/bowman-lab/enspara/issues/214, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAZVYKAOODTZQY7DYEU5GTUZL3IRANCNFSM5NQEQATA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you are subscribed to this thread.Message ID: @.***>

sukritsingh commented 2 years ago

Yeah I remember our discussion around this - beyond the tests everything else appears to function on Python 3.9 with Enspara seemed to run (at least as far as the functions I've used that are MSM related...)

Do you have any stack traces from the testing that we could use to figure out where the issues are? Is it just a matter of fixing this relative import issue or do we need to overhaul testing somehow? Deleting the <3.8 line in setup.py seems to work but your description makes me think it's a bandaid solution.

justinrporter commented 2 years ago

Deleting the <3.8 line in setup.py seems to work but your description makes me think it's a bandaid solution.

This is a bit worse that just a bandaid, it may actually quietly break parts of enspara. I don't know what changes were introduced in numpy 3.9, but they may break stuff. IIRC the numpy<3.8 was there because cython didn't support anything higher at the time, but I'm not sure.

The issue with running the tests that I was getting was a complaint about not being able to import enspara properly

Interesting. What were you invoking to run the tests? Did you do nosetests enspara or something else?

Unfortunately, nose has been in maintenance-only for some years and most people have switch to pytest; this is likely the best long-term solution but obviously could be some work. I worry that perhaps there has been a breaking change.

lgsmith commented 2 years ago

I used nosetests enspara. I had also seen something about its deprecation. I have no prior experience with either testing suite. I believe pytest is fairly common, so there may be a guide to follow if we need to translate over to it.

On Thu, Feb 3, 2022 at 8:20 PM Justin R. Porter @.***> wrote:

Deleting the <3.8 line in setup.py seems to work but your description makes me think it's a bandaid solution.

This is a bit worse that just a bandaid, it may actually quietly break parts of enspara. I don't know what changes were introduced in numpy 3.9, but they may break stuff. IIRC the numpy<3.8 was there because cython didn't support anything higher at the time, but I'm not sure.

The issue with running the tests that I was getting was a complaint about not being able to import enspara properly

Interesting. What were you invoking to run the tests? Did you do nosetests enspara or something else?

Unfortunately, nose has been in maintenance-only for some years and most people have switch to pytest; this is likely the best long-term solution but obviously could be some work. I worry that perhaps there has been a breaking change.

— Reply to this email directly, view it on GitHub https://github.com/bowman-lab/enspara/issues/214#issuecomment-1029578894, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAZVYO6QJNU6FPWVXPGXKTUZMZXXANCNFSM5NQEQATA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

sukritsingh commented 2 years ago

I'm also recalling having some issues with loading in .h5 files generated using different versions of enspara (ie enspara on python3.7 could not load assignments.h5 files created by enspara on python3.9) - I believe that this was due to some changes in h5py we should also double check to ensure backward-compatibility with older files.

Ok so far from this discussion looks like there are three (broad) changes:

  1. Migrate enspara from nose to pytest (a large effort)
  2. Double check h5py use to ensure backwards compatibility (vague, admittedly, but involves only checking on one library/functionality of writing/reading files).
  3. Editing setup.py (this would be the final step when we know everything else works)

Anything else we should add to this list?

lgsmith commented 2 years ago

I don't know of other overt bugs or issues, so I think Sukrit's points are probably the right ones for now. Fixing that stuff probably would be good enough for a version number bump, too...

If we're talking about avoiding dependencies to potentially dying codebases I'd ask about MDTraj. I don't know how everyone feels about that, but we use it in places and my impression is that it's kind of on life-support at the moment. That might also be a really significant effort, but incompatibilities between MDTraj and Numpy versions are one of the reasons I upgraded everything to 3.9 consistent versions where it all (appears) to play nice.

If that's of interest I have thought about how we might supplant the requirement, but I recognize that the use of the Trajectory object is fairly ubiquitous so replacing it with a generic trajectory object might be a challenge.

On Fri, Feb 4, 2022 at 3:02 PM Sukrit Singh @.***> wrote:

I'm also recalling having some issues with loading in .h5 files generated using different versions of enspara (ie enspara on python3.7 could not load assignments.h5 files created by enspara on python3.9) - I believe that this was due to some changes in h5py we should also double check to ensure backward-compatibility with older files.

Ok so far from this discussion looks like there are three (broad) changes:

  1. Migrate enspara from nose to pytest (a large effort)
  2. Double check h5py use to ensure backwards compatibility (vague, admittedly, but involves only checking on one library/functionality of writing/reading files).
  3. Editing setup.py (this would be the final step when we know everything else works)

Anything else we should add to this list?

— Reply to this email directly, view it on GitHub https://github.com/bowman-lab/enspara/issues/214#issuecomment-1030346686, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAZVYIZXHTWGTLT75PTGDTUZQ5H5ANCNFSM5NQEQATA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

justinrporter commented 2 years ago

MDTraj

You guys are closer than this to me, but I used MDTraj every day when I was actively working on my thesis. I can't imagine doing it without MDTraj--it its heyday it was a really great library. That said, it has been getting tougher and tougher to use. Is there an alternative?

the use of the Trajectory object is fairly ubiquitous

Yes, it is. I would be inclined to try to supplant this requirement with a different dependency rather than to go totally generic. Perhaps GROMACS' python bindings or perhaps mdanalysis which seems to have support from NumFocus and a more sustainable model for ensuring long-term support (ie money; which as an aside it might be wise to contribute to if we do go this direction).

Double check h5py use to ensure backwards compatibility (vague, admittedly, but involves only checking on one library/functionality of writing/reading files).

I don't remember any h5py problems, mostly cython problems, but if we drop MDTraj as a dependency this obviously goes away on its own.

Migrate enspara from nose to pytest (a large effort)

I'm hopeful this shouldn't be too bad. We didn't make terribly advanced use of nose, so most things should Just Work (TM) with pytest. If someone wants to take a crack at this, I can commit to being available on slack for questions over the next four weeks (since I'm on zoom capstone).

Thoughts?

lgsmith commented 2 years ago

In regard to the most important question: I am probably not the right tool for the job, but would be willing to wade into the testing stuff if there's no one else. Anyone having prior experience with nose would probably be better, because they'll understand 'what nose is doing' better so they can get pytest to mimic it.

For a package to provide a trajectory object I'd point to LOOS. That's my analysis library of choice, although full disclosure it's also what I 'came up' using because I did my PhD in Alan Grossfield's department. From what little I know of MDAnalysis, much more of it looks like numpy ops internally. That can seem like a good thing, but it can be less performant on large datasets than a library that's just written in C++, streams trajectories (for memory efficiency), and uses a lot of shared pointers, which is how LOOS works. On the other hand when you install loos there's a step where you have to compile the library. This is pretty easy, because they have a script that pulls all the dependencies into a conda env and then installs loos's python bindings and binaries there, so it's one line of code and you don't need root access to install the dependencies. My personal experience with hacking things into loos has been really positive, and there is a functionality to dump data (like frame coords) into numpy arrays and work with them the way you would in MDA. Obviously because of the python bindings that all these libraries have, postprocessing features skimmed out of a trajectory with sklearn or numpy/scipy is straightforward, so I don't think any of them have a huge interoperability advantage to other python ecosystem tools.

Money is always a problem for all of these software packages, ours included. I don't know about the particulars of the financial stability of MDAnalysis, but it seems like Oliver Beckstein has a fair amount of job security at this point. Do they have a full time developer? LOOS's situation is that a scientific programmer (Tod Romo) wrote it, supported by Alan. Alan maintains it, and is pretty responsive to things like github issues, but he does have resources to pay for some of Tod's time (the rest is paid for by the university of Rochester's general computing cluster, so he still works out of Alan's lab), so major improvements happen periodically. For example there'll be a release soon where they switch their build system from scons to cmake, which is a fairly significant change that was possible because Tod is around to do it.

On Sat, Feb 5, 2022 at 9:58 AM Justin R. Porter @.***> wrote:

MDTraj

You guys are closer than this to me, but I used MDTraj every day when I was actively working on my thesis. I can't imagine doing it without MDTraj--it its heyday it was a really great library. That said, it has been getting tougher and tougher to use. Is there an alternative?

the use of the Trajectory object is fairly ubiquitous

Yes, it is. I would be inclined to try to supplant this requirement with a different dependency rather than to go totally generic. Perhaps GROMACS' python bindings or perhaps mdanalysis https://github.com/MDAnalysis/mdanalysis which seems to have support from NumFocus and a more sustainable model for ensuring long-term support (ie money; which as an aside it might be wise to contribute to if we do go this direction).

Double check h5py use to ensure backwards compatibility (vague, admittedly, but involves only checking on one library/functionality of writing/reading files).

I don't remember any h5py problems, mostly cython problems, but if we drop MDTraj as a dependency this obviously goes away on its own.

Migrate enspara from nose to pytest (a large effort)

I'm hopeful this shouldn't be too bad. We didn't make terribly advanced use of nose, so most things should Just Work (TM) with pytest. If someone wants to take a crack at this, I can commit to being available on slack for questions over the next four weeks (since I'm on zoom capstone).

Thoughts?

— Reply to this email directly, view it on GitHub https://github.com/bowman-lab/enspara/issues/214#issuecomment-1030649530, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABAZVYONT47KTK7RH2ON7WTUZVCKTANCNFSM5NQEQATA . Triage notifications on the go with GitHub Mobile for iOS https://apps.apple.com/app/apple-store/id1477376905?ct=notification-email&mt=8&pt=524675 or Android https://play.google.com/store/apps/details?id=com.github.android&referrer=utm_campaign%3Dnotification-email%26utm_medium%3Demail%26utm_source%3Dgithub.

You are receiving this because you commented.Message ID: @.***>

justinrporter commented 2 years ago

How sure are we that MDTraj is on life support? PyEMMA is still depending on it, there was a release in November, and the latest commit was 5 days ago. In the golden age, at least, it was a really outstanding software package (in my opinion).

For a package to provide a trajectory object I'd point to LOOS

Since I came up using MDTraj, any experience with other packages will be greater than mine! All I can say is that I liked Beckstein's 'gromacswrapper' package but that's as much as I know about it, to be honest.

I looked at the LOOS documentation and, to be honest, I found it pretty confusing. For example, the intro page has "Although we primarily view LOOS as a development platform – a tool for making tools – it is distributed with a number of prebuilt applications" but has no discussion of library use of LOOS.

It sounds like it ships python bindings, but I can only tell that by noticing that there's something called "PyLOOS." My guess would be that it's C++ wrapped with SWIG or something?

I also don't see a setup.py in their github repo--is this right? Like, you can't pip install loos or pyloos? It sounds like the build is easy, but I wouldn't envy the position of someone maintaining a hard dependency that has a separate build step.

Have they tried to make it pip installable? I have to think that there's some reason they didn't do it, do you know?

That can seem like a good thing, but it can be less performant on large datasets than a library that's just written in C++, streams trajectories (for memory efficiency), and uses a lot of shared pointers, which is how LOOS works

It does sound like there could be some substantial advantages though! I like the sustainability and the performance (streaming can be a big deal).

In the end, a change of this size would need someone pretty gung-ho about doing it, because I worry that it will be pretty hard.

lgsmith commented 2 years ago

I'd been told that MDTraj was on life support by people in our group in part because Robert McGibbon had moved on and wasn't able to lead development anymore. If you think that's wrong, well, to be honest I don't really know either way. The Bowman group's connections to the broader Pande cohort are good, so I'd defer to someone who knows what is really going on with MDTraj internally. If MDTraj isn't actually a problem I don't want to worry about it. I thought it wasn't keeping up with numpy releases very well, which was creating install issues. I agree that it would take real work to change this dependency for enspara.

For loos dev tutorials you'd want to follow the link at the top of the main docs page, under quick links. That's probably a decent place to start, and the link tree has entries about key classes and working with the python bindings. There's also a youtube video, if that sort of thing appeals.

As far as install goes, the workflow they have to install into a particular python environment is through conda. The script would be conda_build.sh. Future plans are to host binaries on conda-forge, which is part of why they're switching to cmake. I don't know how soon that's going to happen, though.

lgsmith commented 2 years ago

Unfortunately, nose has been in maintenance-only for some years and most people have switch to pytest; this is likely the best long-term solution but obviously could be some work. I worry that perhaps there has been a breaking change.

Looking into this more, although it is a different library there is also Nose2. There's a script to convert to pytest from nose. Like I said above, I'm thinking about wading into this but I have no idea which'd be more preferable for the group or the project. If Nose2 isn't very popular, and pytest is very popular, maybe that's better? Looks like nose2 wraps the standard python unittest module, basically making it more convenient to use. Based on my (limited) googling pytest is probably more popular, and with open source there's safety in numbers...

If anyone has opinions about which direction to take here I'd love to hear them.

justinrporter commented 2 years ago

so I'd defer to someone who knows what is really going on with MDTraj internally

This is definitely not me! I just notice that it does seem to be actively maintained. And, as you say, it would be great to not have to change that dependency since it'd be a lot of work. Let's hold off on this for now?

I'm thinking about wading into this but I have no idea which'd be more preferable for the group or the project

I think this would be great! My vote would be for pytest since it's quite popular and has been for some years now. I have used both pytest and nose and thought they both were good (and quite similar). As the author of many of the original tests, and I can't think of any concepts that don't have a 1:1 mapping between pytest and nose.

sukritsingh commented 2 years ago

Let's hold off on this for now?

Agreed with keeping it with MDTraj for now. There's still some active maintaining going on, and a lot of the OpenMM infrastructure is also based on MDTraj, so this dependency will exist anyways for any OpenMM users looking to build MSMs (or doing anything combining the two).

My vote would be for pytest since it's quite popular and has been for some years now

Migrating to Pytest sounds good to me! I've used Pytest through the MolSSI Cookiecutter and found it easy to get up and going so I'd vote we go with the more populare/supported library.