GemmaTuron opened this issue 7 months ago
@GemmaTuron, you're right. The ZairaChem installation steps need some cleanup.
Did you get an error from umap like this one?
File "/home/hellenah/anaconda3/envs/zairachem/lib/python3.7/site-packages/umap/__init__.py", line 36, in <module>
from importlib.metadata import version, PackageNotFoundError
ModuleNotFoundError: No module named 'importlib.metadata'
I had actually set up ZairaChem a few months ago, but to reproduce this I installed it afresh.
I realized we need to use at least Python 3.8 instead of 3.7 (in the step of install_linux.sh that creates the zairachem env). With Python 3.8, everything works well.
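For context, `importlib.metadata` only joined the standard library in Python 3.8, which is why the umap import above fails on 3.7. A minimal sketch of a version lookup that falls back to the `importlib_metadata` backport on older interpreters (an assumption: the backport would have to be installed separately on 3.7); it is an illustration, not code from umap or ZairaChem:

```python
import sys

try:
    # Part of the standard library from Python 3.8 onwards
    from importlib.metadata import PackageNotFoundError, version
except ModuleNotFoundError:
    # Python 3.7: fall back to the separately installed importlib_metadata backport
    from importlib_metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("Python", ".".join(map(str, sys.version_info[:3])))
    print("umap-learn:", installed_version("umap-learn"))
```

Running it inside the zairachem env is a quick way to confirm both the interpreter version and which umap-learn release actually got installed.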
Still, we might need to pin the versions of some packages.
I can take this on if you want me to.
Awesome @HellenNamulinda, thanks for checking; that is the error I was having as well. It would be ideal if we could use Python 3.10 directly and pin the versions for 3.10 instead, if feasible. I cannot work on this this week, but I can come back to it next week. Let me know if you make any progress in the meantime.
I will work on this.
With Python 3.10, umap-learn 0.5.5 works well, but autogluon v0.5.2 isn't supported. Changing the autogluon.tabular version from 0.5.2 to 0.7.0 helps. PR: https://github.com/ersilia-os/zaira-chem/pull/38
I get the following error with the installation:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
eosce 0.1.0 requires click==8.1.3, but you have click 8.0.4 which is incompatible.
ersilia 0.1.32 requires click<9.0.0,>=8.1.7, but you have click 8.0.4 which is incompatible.
ersilia 0.1.32 requires pandas<1.4.0,>=1.3.0; python_version >= "3.8", but you have pandas 1.5.3 which is incompatible.
tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.19.5 which is incompatible.
Successfully installed Levenshtein-0.25.0 SQLAlchemy-2.0.29 click-8.0.4 dataclasses-json-0.6.4 exmol
When I try to run it, I get:
ModuleNotFoundError: No module named 'dask'
After pip installing dask, we still hit the click dependency issue:
pkg_resources.DistributionNotFound: The 'click<=8.0.4,>=7.0' distribution was not found and is required by ray
The dependencies currently installed are these: dependencies.txt
Yes, different packages require conflicting versions of pandas and click. For example, eosce 0.1.0 requires exactly one version of click.
Why click 8.0.4 is being installed: FLAML requires ray~=1.13 (which installs click 8.0.4), and FLAML won't work with higher versions of ray[tune].
I didn't expect the click version to have a significant impact, and pinning it would break the installation of major dependencies.
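To see why no single click pin can satisfy everything at once, the constraints quoted in this thread (eosce 0.1.0: ==8.1.3; ersilia 0.1.32: >=8.1.7,<9.0.0; ray: >=7.0,<=8.0.4) can be checked against candidate versions. This is a simplified, stdlib-only sketch that handles plain X.Y.Z versions only; pip itself applies the full rules from `packaging.specifiers`:

```python
import operator

# Comparison operators for a simplified specifier check. pip itself uses
# packaging.specifiers, which also handles pre-releases, wildcards and "~=".
_OPS = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    "<": operator.lt,
}


def parse_version(text):
    """'7.0' -> (7, 0, 0): pad to three numeric parts so versions compare cleanly."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def satisfies(version, spec):
    """True if a plain X.Y.Z version meets every clause of a spec like '>=8.1.7,<9.0.0'."""
    v = parse_version(version)
    for clause in (c.strip() for c in spec.split(",")):
        # Dict order puts the two-character operators first, so '>=' wins over '>'
        for symbol, compare in _OPS.items():
            if clause.startswith(symbol):
                if not compare(v, parse_version(clause[len(symbol):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# click constraints quoted in this thread
CLICK_CONSTRAINTS = {
    "eosce 0.1.0": "==8.1.3",
    "ersilia 0.1.32": ">=8.1.7,<9.0.0",
    "ray": ">=7.0,<=8.0.4",
}

if __name__ == "__main__":
    for candidate in ["8.0.4", "8.1.3", "8.1.7"]:
        unmet = [pkg for pkg, spec in CLICK_CONSTRAINTS.items() if not satisfies(candidate, spec)]
        print(candidate, "breaks:", unmet)
```

Every candidate breaks at least one package: ray caps click at 8.0.4, ersilia needs at least 8.1.7, and eosce wants exactly 8.1.3, so the only options are to relax one of those requirements or drop the package that imposes it.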
In my environment, dask isn't installed (zairachem_requirements.txt). I had removed it from the requirements, and ZairaChem works without it; it is just slow at fitting, where fetching the hub models takes a long time.
Is there a specific command that leads to ModuleNotFoundError: No module named 'dask'?
Let me reinstall and see.
Sorry, pip freeze shows that dask is installed (zairachem_requirements.txt), though conda list doesn't.
I have reinstalled using the same requirements.txt and install_linux.sh files as on GitHub: zairachem_installation.txt
The CLI commands (zairachem) are working.
Hi @HellenNamulinda !
Does it work for you? On my end, if I use the current bash script to install ZairaChem, I am missing Dask. If we want to keep click at v8.0.4, we need to use dask==2023.10.1.
But when doing so, we still run into the following dependency conflicts
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
eosce 0.1.0 requires click==8.1.3, but you have click 8.0.4 which is incompatible.
ersilia 0.1.32 requires click<9.0.0,>=8.1.7, but you have click 8.0.4 which is incompatible.
ersilia 0.1.32 requires pandas<1.4.0,>=1.3.0; python_version >= "3.8", but you have pandas 1.5.3 which is incompatible.
One option is to move the eosce descriptors to the Ersilia Model Hub implementation, as the rest of the descriptors do, since we have it as a model. This would be convenient, but it would still leave us with the conflicts on the ersilia side.
I also get this error when trying to fit a model (zairachem --help does not crash, which might be the command you tried). It seems melloddy is not working properly?
Traceback (most recent call last):
File "/home/gturon/anaconda3/envs/zairachem2/bin/zairachem", line 33, in <module>
sys.exit(load_entry_point('zairachem', 'console_scripts', 'zairachem')())
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/click/core.py", line 1128, in __call__
return self.main(*args, **kwargs)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/click/core.py", line 1053, in main
rv = self.invoke(ctx)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/click/core.py", line 1659, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/click/core.py", line 1395, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/click/core.py", line 754, in invoke
return __callback(*args, **kwargs)
File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/cli/commands/fit.py", line 124, in fit
s.setup()
File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/training.py", line 230, in setup
self._standardize()
File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/training.py", line 156, in _standardize
Standardize(os.path.join(self.output_dir, DATA_SUBFOLDER)).run()
File "/home/gturon/github/ersilia-os/zaira-chem/zairachem/setup/standardize.py", line 28, in run
dfm = pd.read_csv(self.tuner_filename)[
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/util/_decorators.py", line 211, in wrapper
return func(*args, **kwargs)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/util/_decorators.py", line 331, in wrapper
return func(*args, **kwargs)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 950, in read_csv
return _read(filepath_or_buffer, kwds)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 605, in _read
parser = TextFileReader(filepath_or_buffer, **kwds)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1442, in __init__
self._engine = self._make_engine(f, self.engine)
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/io/parsers/readers.py", line 1735, in _make_engine
self.handles = get_handle(
File "/home/gturon/anaconda3/envs/zairachem2/lib/python3.10/site-packages/pandas/io/common.py", line 856, in get_handle
handle = open(
FileNotFoundError: [Errno 2] No such file or directory: '/home/gturon/github/ersilia-os/osm-models/test/data/melloddy/results/results_tmp/standardization/T2_standardized.csv'
Hi @GemmaTuron, yes, I agree with fetching eosce from the hub.
On my end, the split command works (split_zairachem.log), as does the fit command. While fitting, two models (eos6m4j and eos59rr) come up that aren't specified in the Ersilia Default Hub Models variable, so fitting was taking long because I hadn't fetched them beforehand.
The logs I saved for fit are quite large (>250 MB), so I don't think it is practical to share them here. Here is an excerpt:
2024-04-05 11:53:00.710598: I tensorflow/core/platform/cpu_feature_guard.cc:193] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
11:53:03 | DEBUG | Starting CLI
Results will be stored at model
11:53:04 | DEBUG | Values column ['activity']
Read new config file.
Default reference files from data/reference_set.csv loaded.
Read new key file.
Hashing unit test file /home/hellenah/MSc/model/data/melloddy/results/reference_set/T11.csv
Hashing /home/hellenah/zaira-chem/zairachem/tools/melloddy/config/example_parameters.json
Hashing /home/hellenah/zaira-chem/zairachem/tools/melloddy/config/example_key.json
Hashing version: 2.1.3
Done.
No reference hash given. Comparison of generated and reference hash keys will be skipped.
Hashing reference data finished after 0.90217614 seconds.
Start standardizing structures.
Check uniqueness of T2.
Sanity checks took 0.90851784 seconds.
Sanity checks passed.
Standardization took 25.279011 seconds.
Standardization done.
Read new config file.
Read new key file.
...
During fitting, the descriptors were calculated and the models were trained, except for the molmap estimator, whose models hadn't been fetched beforehand.
Unfortunately, I can't reproduce the error with melloddy.
While the pip dependency conflicts (pandas and click) aren't causing any issues on my end, I'll look into them further to resolve and harmonize the versions.
Hmm, thanks @HellenNamulinda. I can't understand why we get different errors with the same install. I am on an Ubuntu 22.04 machine, by the way. When you have time, can you paste the list of dependencies in your env here? I will check whether there is any difference with mine.
Ok, I have worked on the issue and made the following changes:
I hope these changes work on all systems; I am on Ubuntu 22.04. For reference, I've added the packages and their versions here.
Note that we still have some dependency conflicts that do not break the code, so solution #30 would be cleaner:
ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts.
autogluon-common 0.7.0 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-core 0.7.0 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-features 0.7.0 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
autogluon-tabular 0.7.0 requires pandas<1.6,>=1.4.1, but you have pandas 2.1.4 which is incompatible.
ersilia 0.1.32 requires pandas<1.4.0,>=1.3.0; python_version >= "3.8", but you have pandas 2.1.4 which is incompatible.
ray 2.2.0 requires protobuf!=3.19.5,>=3.15.3, but you have protobuf 3.19.5 which is incompatible.
tensorboardx 2.6.2.2 requires protobuf>=3.20, but you have protobuf 3.19.5 which is incompatible.
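These ranges can be checked for overlap directly. Taking the resolver messages above at face value, autogluon 0.7.0 needs pandas >=1.4.1,<1.6 while ersilia 0.1.32 needs >=1.3.0,<1.4.0, so no pandas version satisfies both. For protobuf, ray 2.2.0 (>=3.15.3, !=3.19.5) and tensorboardx 2.6.2.2 (>=3.20) do overlap from 3.20 upwards, although other packages in the environment may add constraints not listed here. A simplified sketch that intersects such specifiers (plain numeric versions only; pre-releases, wildcards, `==` and `~=` are not handled):

```python
def parse_version(text):
    """'1.6' -> (1, 6, 0): pad to three numeric parts so '1.4' equals '1.4.0'."""
    parts = [int(p) for p in text.split(".")]
    return tuple(parts + [0] * (3 - len(parts)))


def feasible_range(*specs):
    """Intersect specifiers built from >=, >, <=, < and != clauses.

    Returns (lower, upper, excluded), with upper=None meaning unbounded,
    or None when the lower bound ends up above the upper bound.
    """
    lower, lower_strict = (0, 0, 0), False
    upper, upper_strict = None, False
    excluded = set()
    for spec in specs:
        for clause in (c.strip() for c in spec.split(",")):
            if clause.startswith("!="):
                excluded.add(parse_version(clause[2:]))
            elif clause.startswith(">"):
                strict = not clause.startswith(">=")
                v = parse_version(clause[1:] if strict else clause[2:])
                # Keep the tightest (highest) lower bound seen so far
                if v > lower or (v == lower and strict):
                    lower, lower_strict = v, strict
            elif clause.startswith("<"):
                strict = not clause.startswith("<=")
                v = parse_version(clause[1:] if strict else clause[2:])
                # Keep the tightest (lowest) upper bound seen so far
                if upper is None or v < upper or (v == upper and strict):
                    upper, upper_strict = v, strict
            else:
                raise ValueError(f"unsupported clause: {clause!r}")
    if upper is not None and (
        lower > upper or (lower == upper and (lower_strict or upper_strict))
    ):
        return None
    return lower, upper, excluded


if __name__ == "__main__":
    # pandas: autogluon 0.7.0 vs ersilia 0.1.32
    print(feasible_range("<1.6,>=1.4.1", "<1.4.0,>=1.3.0"))  # None
    # protobuf: ray 2.2.0 vs tensorboardx 2.6.2.2
    print(feasible_range("!=3.19.5,>=3.15.3", ">=3.20"))
```

The empty pandas range suggests that pinning alone cannot clear that conflict while both autogluon 0.7.0 and ersilia 0.1.32 share one environment.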
@JHlozek @HellenNamulinda
Please have a look and let me know if these changes also work for you. I'd like to ping Success here as well, but I don't have his GitHub handle. Once we confirm this works, and decide whether we want to pin the versions indicated above, I will create the new stable release of ZairaChem.
@GemmaTuron
I get the same set of conflicts as you do above.
However, please note that the Ersilia Compound Embedding Lite version should be pinned to 'v.0.2.0' instead of 'v0.2.0' (or perhaps we should re-tag the compound embedding release so that its tag follows a consistent version number format?).
I am also having an issue with the mordred fingerprints, but that occurred before these changes, and when I disable them, the rest of the code works just fine. I'm going to do another build on my workstation later tonight to test it all again.
Thanks @GemmaTuron. However, the requirements.txt file still seems to be tracked with git-lfs.
I got an installation error for ersilia-compound-embedding.
Collecting git+https://github.com/ersilia-os/compound-embedding-lite.git@v0.2.0
Cloning https://github.com/ersilia-os/compound-embedding-lite.git (to revision v0.2.0) to /tmp/pip-req-build-mjbd_nc3
Running command git clone --filter=blob:none --quiet https://github.com/ersilia-os/compound-embedding-lite.git /tmp/pip-req-build-mjbd_nc3
WARNING: Did not find branch or tag 'v0.2.0', assuming revision or ref.
Running command git checkout -q v0.2.0
error: pathspec 'v0.2.0' did not match any file(s) known to git
error: subprocess-exited-with-error
× git checkout -q v0.2.0 did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× git checkout -q v0.2.0 did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.
I tested by removing the v0.2.0 pin, and it installed the latest eosce version.
Apart from correcting the tag for eosce, the rest (such as the split and fit commands) works.
I am also using Ubuntu 22.04.
Thanks, and sorry, my bad on the eosce version; I just fixed it. Thanks @JHlozek! Mordred might be having issues as reported in #36, but we are fixing them this week.
No worries - I think the new ZairaChem build is good to go then. 👍
Thanks for the info and update on mordred - that is indeed the issue I saw.
I'll wait for Success to confirm that the installation works for him and then close this issue. @HellenNamulinda, I have removed requirements.txt from git-lfs. It was not very straightforward, but we could actually do this across the whole git-lfs history; see #40.
@sucksido please confirm that the installation is working in your system so we can close this issue
Describe the bug
I think we might need to pin umap-learn to version 0.5.3, as version 0.5.5 is now installed automatically and it gives an error. I have not had time to look into it in detail, so I might be wrong, but it is definitely something we should revise.