Degiacomi-Lab / molearn

protein conformational spaces meet machine learning
https://degiacomi.org/software/molearn/
GNU General Public License v3.0
40 stars 11 forks source link

[JOSS] Installation #6

Closed RMeli closed 1 year ago

RMeli commented 1 year ago

Installation using mamba on a fresh environment (w/ Python 3.10) fails on macOS (M1) with the following error:

Could not solve for environment specs
The following package could not be installed
└─ molearn   is uninstallable because it requires
   └─ biobox  , which does not exist (perhaps a missing channel).

I think the issue stems from the fact that molearn's conda-forge feedstock is noarch, while biobox's confa-forge feedstock is linux-64 | win-64 | osx-64.

I'd suggest the following:


Note: I'm travelling with only access to macOS (M1). I'll be back to a workstation with a Linux distribution on Monday.

https://github.com/openjournals/joss-reviews/issues/5523

degiacom commented 1 year ago

Your observation that energy evaluation via OpenMM is not supported on Windows is correct, this is now also stated in the README. Thank you also for mentioning a problem with installation on macOS. Overall, I think the best solution would be to restrict the architecture to Linux only (@SCMusson is looking into this).

RMeli commented 1 year ago

Overall, I think the best solution would be to restrict the architecture to Linux only

Sounds very sensible to me. mamba install molearn on Ubuntu 23.04 seems to work as expected. Probably a simple note on the README.md is enough, but if you want to go the extra mile and restrict it on the conda-forge package too I think it would be great. You can probably use preprocessing selectors?

build:
   skip: True  # [win or osx]

Is there any reason why setup.py does not contain the dependencies under install_requires and the optional dependencies under the extra_requires fields? If you decide to restrict the conda-forge package, it might be worth adding platforms = ["Linux"] to setup.py too.

RMeli commented 1 year ago

Under the "Installation" section in the README.md and the documentation, I'd mention that modeller requires an academic license. I'd also mention which dependencies are automatically installed by conda install molearn -c conda-forge (the required ones, plus openmmtorchplugin) and state that most optional dependencies need to be manually installed.


Example of why I think the above is important. When trying to run the minimal_example notebook I got the following error:

Warning: importing 'simtk.openmm' is deprecated.  Import 'openmm' instead.
/home/rmeli/mambaforge/envs/joss-molearn/lib/python3.11/site-packages/molearn/trainers/sinkhorn_trainer.py:15: UserWarning: No module named 'geomloss'. Will not be able to use sinkhorn because geomloss is not installed.
  warnings.warn(f'{e}. Will not be able to use sinkhorn because geomloss is not installed.')
/home/rmeli/mambaforge/envs/joss-molearn/lib/python3.11/site-packages/molearn/scoring/__init__.py:15: UserWarning: No module named 'iotbx'. Will not be able to calculate Ramachandran score.
  warnings.warn(f"{e}. Will not be able to calculate Ramachandran score.")

---------------------------------------------------------------------------
ImportError                               Traceback (most recent call last)
Cell In[1], line 5
      3 from molearn.data import PDBData
      4 from molearn.models.foldingnet import AutoEncoder
----> 5 from molearn.analysis import MolearnAnalysis, MolearnGUI
      6 import torch
      7 from copy import deepcopy

File ~/mambaforge/envs/joss-molearn/lib/python3.11/site-packages/molearn/analysis/__init__.py:17
     14 __version__ = '2.0.1'
     15 __date__ = '$Date: 2023-05-16 $'
---> 17 from .analyser import MolearnAnalysis, as_numpy
     18 from .GUI import MolearnGUI
     19 from .path import oversample, get_path, get_path_aggregate

File ~/mambaforge/envs/joss-molearn/lib/python3.11/site-packages/molearn/analysis/analyser.py:21
     18 from modeller import *
     19 from modeller.scripts import complete_pdb
---> 21 from ..scoring import Parallel_DOPE_Score, Parallel_Ramachandran_Score
     22 from ..data import PDBData
     24 from ..utils import as_numpy

ImportError: cannot import name 'Parallel_Ramachandran_Score' from 'molearn.scoring' (/home/rmeli/mambaforge/envs/joss-molearn/lib/python3.11/site-packages/molearn/scoring/__init__.py)

I was puzzled by the No module named 'iotbx' warning and it took me a while to realise iotbx is a module provided by cctbx, which I forgot to install.

RMeli commented 1 year ago

plotly seems to be a dependency of the (optional) GUI too?

https://github.com/Degiacomi-Lab/molearn/blob/e16c0018e096a3903522941c8a72aa1defb9cead/src/molearn/analysis/GUI.py#L27

degiacom commented 1 year ago

I added a reference to plotly, mentioned that modeller requires academic license, and indicated that currently only Linux is supported. @SCMusson, while we could sort issues with dependencies with extra text in the README, I think it would make sense to update the conda installation, what do you think?

SCMusson commented 1 year ago

I agree, I have removed the noarch from the molearn-feedstock and now skip builds for [not linux]. I'll also look into setuptool dependencies.

RMeli commented 1 year ago

while we could sort issues with dependencies with extra text in the README, I think it would make sense to update the conda installation

I was suggesting the path of least resistance, but providing a full (with/without modeller?) conda/pip installation will definitely make things easier for the end user.

RMeli commented 1 year ago

While trying to install molearn using setup.py to make sure I have the latest version installed, I encountered the following issues:

I changed the to the following in setup.py:

-    install_requires = ["numpy", "pytorch", "biobox", "openmmtorchplugin"],
+    install_requires = ["numpy", "torch>=1.7", "biobox"],
+    extras_require={
+        "gui": ["MDAnalysis", "plotly", "nglview"],
+        "geomloss": ["geomloss"],
+    }

and this works for me

pip install -e ".[gui,geomloss]"

The OpenMM and DOPE plugins can't be supported by this method, but that's OK as long as this is clear in the documentation. This would allow to easily install a development version of molearn (with pip install -e .) and the other plugins can be installed/managed with conda.

SCMusson commented 1 year ago

I've had some issues with installing packages using pip. Primarily from version issues when installing beside packages from conda-forge. Additionally, having dependencies in setup.py means dependencies are specified in two separate places (the other being the conda-forge feedstock).

I'm thinking a nice clean solution would be rely on conda-forge for dependencies completely. We could have it so that if you wish to install from source it is possible to get all the correct dependencies using: mamba install -c conda-forge --only-deps molearn and then install molearn manually with setup.py: python -m pip install . -v

Do you think that would be an acceptable solution to you?

RMeli commented 1 year ago

@SCMusson yes, I think what you suggest is very sensible.

RMeli commented 1 year ago

Thanks for updating the README. LGTM!

65fa95e