ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐛 Bug: Error when running prediction from docs #803

Closed jeremycheminf closed 1 month ago

jeremycheminf commented 1 year ago

Describe the bug.

When following the documentation in a fresh environment, running a prediction returns the error TypeError: object of type 'NoneType' has no len(). Fetching the model itself works.

Describe the steps to reproduce the behavior

conda create -n ersilia python=3.10
conda activate ersilia
python -m pip install isaura==0.1
git clone https://github.com/ersilia-os/ersilia.git
cd ersilia
pip install -e .
ersilia fetch retrosynthetic-accessibility
ersilia example retrosynthetic-accessibility -n 5 -f my_molecules.csv
ersilia run -i my_molecules.csv -o my_predictions.csv

Outcome:

Traceback (most recent call last):
  File "/home/jeremy/mambaforge/envs/ersilia/bin/ersilia", line 8, in <module>
    sys.exit(cli())
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/jeremy/ersilia/ersilia/cli/commands/__init__.py", line 22, in wrapper
    return func(*args, **kwargs)
  File "/home/jeremy/ersilia/ersilia/cli/commands/run.py", line 34, in run
    result = mdl.run(input=input, output=output, batch_size=batch_size)
  File "/home/jeremy/ersilia/ersilia/core/model.py", line 144, in _method
    return self.api(api_name, input, output, batch_size)
  File "/home/jeremy/ersilia/ersilia/core/model.py", line 335, in api
    if self._do_cache_splits(input=input, output=output):
  File "/home/jeremy/ersilia/ersilia/core/model.py", line 320, in _do_cache_splits
    self.tfr = TabularFileReader(
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 570, in __init__
    self._standardize()
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 574, in _standardize
    tfss = TabularFileShapeStandardizer(
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 409, in __init__
    self.read_input_columns()
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 321, in read_input_columns
    if len(h) == 1:
TypeError: object of type 'NoneType' has no len()

Expected behavior.

A prediction with a score for each molecule. I managed to run the code on Google Colab using the notebook provided in the documentation, but locally I cannot get the tool to work.

Screenshots.

No response

Operating environment

WSL2 - Ubuntu 22.04.2 LTS

Additional context

No response

GemmaTuron commented 1 year ago

Hi @jeremycheminf

Thanks for this report. Can you show an example of the .csv file you are getting from the example function? And is the path to the file passed to -i correct?

jeremycheminf commented 1 year ago

The file is like this:

CC[C@H](C)[C@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CC(O)=O)NC(=O)CNC(=O)[C@H](CC(N)=O)NC(=O)CNC(=O)CNC(=O)CNC(=O)CNC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H]1CCCN1C(=O)[C@H](N)Cc1ccccc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(C)C)C(O)=O
CC(C)c1cc(nc(N)n1)-c1ccc(F)c2ccccc12
C[C@H](CCC(O)=O)[C@H]1CC[C@H]2[C@H]3[C@H](CC(=O)[C@]12C)[C@@]1(C)CCC(=O)C[C@H]1CC3=O
[H][C@@]1(C[C@@](C)(OC)[C@@H](O)[C@H](C)O1)O[C@H]1[C@H](C)[C@@H](O[C@]2([H])O[C@H](C)C[C@@H]([C@H]2O)N(C)C)[C@](C)(O)C[C@@H](C)N(CCC)C[C@H](C)[C@@H](O)[C@](C)(O)[C@@H](CC)OC(=O)[C@@H]1C
Cc1cn([C@H]2C[C@H](F)[C@@H](CO)O2)c(=O)[nH]c1=O

And yes, the file is in the path. If I try with 'ersilia api run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)N+[O-]"' I also get the same error.

GemmaTuron commented 1 year ago

Hi @jeremycheminf

This looks like the right file, and testing with one molecule is always good practice, thanks. I just noticed you are not serving the model before trying to run predictions; that might be the cause of the issue. Please make sure that after fetch you bring the model alive with ersilia serve <modelname>
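The order suggested here (fetch, then serve, then run) can be sketched as a small Python helper that builds the three CLI calls. This is only an illustration: the helper names `prediction_commands` and `run_all` are hypothetical, and the commands are the ones quoted in this thread. By default nothing is executed; the commands are only printed.

```python
import subprocess
from typing import List


def prediction_commands(model: str, input_csv: str, output_csv: str) -> List[List[str]]:
    """Build the fetch -> serve -> run sequence for one model (hypothetical helper)."""
    return [
        ["ersilia", "fetch", model],
        ["ersilia", "serve", model],  # the step that was missing in the original report
        ["ersilia", "run", "-i", input_csv, "-o", output_csv],
    ]


def run_all(commands: List[List[str]], dry_run: bool = True) -> None:
    """Print each command, or execute it when dry_run is False."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True stops at the first failing step
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all(prediction_commands(
        "retrosynthetic-accessibility", "my_molecules.csv", "my_predictions.csv"
    ))
```

Passing `dry_run=False` would actually invoke the CLI, which assumes ersilia is installed and on the PATH.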

Other things to look at:

jeremycheminf commented 1 year ago

Thank you. I added the serve line and re-fetched from Docker. I attached the log file out.log

GemmaTuron commented 1 year ago

Hi @jeremycheminf

It seems something went amiss when you set up Ersilia locally. Can I refer you to a very similar issue solved in #820 that some new interns are working on? It might provide the answer!

GemmaTuron commented 1 year ago

Hi @jeremycheminf

Can you share with me the packages listed in the ersilia env with conda list? I want to see if there are any dependencies that might be causing the clash. I do not have a WSL system to test on right now to help with debugging, thanks!

jeremycheminf commented 1 year ago

Hi, this is the list. I have yet to try https://github.com/ersilia-os/ersilia/issues/820, which has the same error and the same idea of changing some of the code. So that should work when I try.

# packages in environment at /home/jeremy/mambaforge/envs/ersilia:
#
# Name                     Version     Build               Channel
_libgcc_mutex              0.1         conda_forge         conda-forge
_openmp_mutex              4.5         2_gnu               conda-forge
attrs                      21.4.0      pypi_0              pypi
boto3                      1.28.52     pypi_0              pypi
botocore                   1.31.52     pypi_0              pypi
bzip2                      1.0.8       h7f98852_4          conda-forge
ca-certificates            2023.7.22   hbcca054_0          conda-forge
certifi                    2023.7.22   pypi_0              pypi
charset-normalizer         3.2.0       pypi_0              pypi
chembl-webresource-client  0.10.8      pypi_0              pypi
click                      8.1.7       pypi_0              pypi
docker                     6.1.3       pypi_0              pypi
dockerfile-parse           2.0.1       pypi_0              pypi
easydict                   1.10        pypi_0              pypi
emoji                      2.8.0       pypi_0              pypi
ersilia                    0.1.27      pypi_0              pypi
h5py                       3.7.0       pypi_0              pypi
idna                       3.4         pypi_0              pypi
inputimeout                1.0.4       pypi_0              pypi
isaura                     0.1         pypi_0              pypi
itsdangerous               2.1.2       pypi_0              pypi
jmespath                   1.0.1       pypi_0              pypi
ld_impl_linux-64           2.40        h41732ed_0          conda-forge
libffi                     3.4.2       h7f98852_5          conda-forge
libgcc-ng                  13.2.0      h807b86a_2          conda-forge
libgomp                    13.2.0      h807b86a_2          conda-forge
libnsl                     2.0.0       h7f98852_0          conda-forge
libsqlite                  3.43.0      h2797004_0          conda-forge
libuuid                    2.38.1      h0b41bf4_0          conda-forge
libzlib                    1.2.13      hd590300_5          conda-forge
loguru                     0.6.0       pypi_0              pypi
ncurses                    6.4         hcb278e6_0          conda-forge
numpy                      1.26.0      pypi_0              pypi
openssl                    3.1.3       hd590300_0          conda-forge
packaging                  23.1        pypi_0              pypi
pillow                     10.0.1      pypi_0              pypi
pip                        23.2.1      pyhd8ed1ab_0        conda-forge
pyairtable                 1.5.0       pypi_0              pypi
python                     3.10.12     hd12c33a_0_cpython  conda-forge
python-dateutil            2.8.2       pypi_0              pypi
pyyaml                     6.0.1       pypi_0              pypi
rdkit-pypi                 2022.9.5    pypi_0              pypi
readline                   8.2         h8228510_1          conda-forge
requests                   2.31.0      pypi_0              pypi
requests-cache             0.7.5       pypi_0              pypi
s3transfer                 0.6.2       pypi_0              pypi
setuptools                 68.2.2      pyhd8ed1ab_0        conda-forge
six                        1.16.0      pypi_0              pypi
tk                         8.6.12      h27826a3_0          conda-forge
tqdm                       4.66.1      pypi_0              pypi
tzdata                     2023c       h71feb2d_0          conda-forge
url-normalize              1.4.3       pypi_0              pypi
urllib3                    1.26.16     pypi_0              pypi
validators                 0.21.2      pypi_0              pypi
websocket-client           1.6.3       pypi_0              pypi
wheel                      0.41.2      pyhd8ed1ab_0        conda-forge
xz                         5.2.6       h166bdaf_0          conda-forge

GemmaTuron commented 1 year ago

@jeremycheminf

Yes, related to that, changing the code will bypass the error but then the predictions will return null - so don't go that route. It is an issue with installation on WSL we have not been able to pinpoint but I'm working on this, will let you know!

GemmaTuron commented 1 year ago

Also, in case it is helpful: aside from Google Colab, you can use GitHub Codespaces, as we discussed. Go to the right-hand side of the /ersilia repository page, click <> Code and select the Codespaces option. This will set up a Codespace where ersilia is installed (you can check with the command ersilia --help) and you can fetch, serve and run models. Please note that Codespaces use the individual free tier of GitHub users (60h/month), so make sure to terminate it once done. We haven't yet written extensive documentation for that since we are still trying out its functionalities.

GemmaTuron commented 1 year ago

Hi @jeremycheminf

I think we have identified the source of the error - it is due to compatibility with Isaura, which is our backend for caching predictions (mostly only needed if you use the models intensively) - can you delete the conda environment and try the installation again without installing Isaura? We should remove that from the docs as it is not a requirement, only a nice-to-have. Kudos to @carcablop for identifying the source of error - we are now working on fixing it!

jeremycheminf commented 1 year ago

Hi, this worked, also when using ersilia -v run -i "CCCC". However, the command ersilia -v run -i my_molecules.csv -o my_predictions.csv gave this error:

19:41:18 | DEBUG | Getting session from /home/jeremy/eos/session.json
19:41:18 | DEBUG | Getting session from /home/jeremy/eos/session.json
19:41:18 | WARNING | Lake manager 'isaura' is not installed! We strongly recommend installing it to store calculations persistently
19:41:18 | ERROR | Isaura is not installed! Calculations will be done without storing and reading from the lake, unfortunately.
19:41:19 | DEBUG | Is fetched: True
19:41:19 | DEBUG | Schema available in /home/jeremy/eos/dest/eos2r5a/api_schema.json
19:41:19 | DEBUG | Setting AutoService for eos2r5a
19:41:19 | INFO | Service class provided
19:41:19 | DEBUG | Using port 49345
19:41:19 | DEBUG | Starting Docker Daemon service
19:41:19 | DEBUG | Creating temporary folder /tmp/ersilia-hgf1up2i and mounting as volume in container
19:41:19 | DEBUG | Image ersiliaos/eos2r5a:latest is available locally
19:41:19 | DEBUG | Using port 37459
19:41:19 | DEBUG | Starting Docker Daemon service
19:41:19 | DEBUG | Creating temporary folder /tmp/ersilia-enu0uxmv and mounting as volume in container
19:41:19 | DEBUG | Reading card from eos2r5a
19:41:19 | DEBUG | Trying to get metadata from: /home/jeremy/eos/dest/eos2r5a
19:41:20 | DEBUG | Reading shape from eos2r5a
19:41:20 | DEBUG | Trying to get metadata from: /home/jeremy/eos/dest/eos2r5a
19:41:21 | DEBUG | Input Shape: Single
19:41:21 | DEBUG | Input type is: compound
19:41:21 | DEBUG | Input shape is: Single
19:41:21 | DEBUG | Importing module: .types.compound
19:41:21 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
19:41:21 | DEBUG | InputShapeSingle shape: Single
19:41:21 | DEBUG | Expected number: 1
19:41:21 | DEBUG | Entity is list: False
19:41:21 | DEBUG | Resolving columns
19:41:21 | DEBUG | Number of columns seems to be 1: assuming input is the only column: {'input': [0], 'key': None}
19:41:21 | DEBUG | Candidate header is ['CC[C@H](C)[C@H](NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](CCC(O)=O)NC(=O)[C@H](Cc1ccccc1)NC(=O)[C@H](CC(O)=O)NC(=O)CNC(=O)[C@H](CC(N)=O)NC(=O)CNC(=O)CNC(=O)CNC(=O)CNC(=O)[C@@H]1CCCN1C(=O)[C@H](CCCNC(N)=N)NC(=O)[C@@H]1CCCN1C(=O)[C@H](N)Cc1ccccc1)C(=O)N1CCC[C@H]1C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](CCC(O)=O)C(=O)N[C@@H](Cc1ccc(O)cc1)C(=O)N[C@@H](CC(C)C)C(O)=O']
19:41:21 | DEBUG | Matching for input is [0]
19:41:21 | DEBUG | Has header False
19:41:21 | DEBUG | Schema {'input': [0], 'key': None}
Traceback (most recent call last):
  File "/home/jeremy/mambaforge/envs/ersilia/bin/ersilia", line 8, in <module>
    sys.exit(cli())
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/jeremy/mambaforge/envs/ersilia/lib/python3.10/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/home/jeremy/ersilia/ersilia/cli/commands/__init__.py", line 22, in wrapper
    return func(*args, **kwargs)
  File "/home/jeremy/ersilia/ersilia/cli/commands/run.py", line 34, in run
    result = mdl.run(input=input, output=output, batch_size=batch_size)
  File "/home/jeremy/ersilia/ersilia/core/model.py", line 144, in _method
    return self.api(api_name, input, output, batch_size)
  File "/home/jeremy/ersilia/ersilia/core/model.py", line 335, in api
    if self._do_cache_splits(input=input, output=output):
  File "/home/jeremy/ersilia/ersilia/core/model.py", line 320, in _do_cache_splits
    self.tfr = TabularFileReader(
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 570, in __init__
    self._standardize()
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 574, in _standardize
    tfss = TabularFileShapeStandardizer(
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 409, in __init__
    self.read_input_columns()
  File "/home/jeremy/ersilia/ersilia/io/readers/file.py", line 321, in read_input_columns
    if len(h) == 1:
TypeError: object of type 'NoneType' has no len()

So the header problem comes back again, but I'll try to fix it by putting an actual header in the file.
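The log reports "Has header False" and the traceback ends at `len(h)` with `h` being None. The snippet below is a minimal sketch of how that kind of failure can arise and how a None check avoids it. It is not Ersilia's actual code: `read_header`, `count_input_columns`, and the list of known column names are all hypothetical.

```python
from typing import List, Optional, Sequence


def read_header(
    first_row: List[str],
    known_names: Sequence[str] = ("smiles", "input", "key"),
) -> Optional[List[str]]:
    """Return the row if it looks like a header, else None (hypothetical heuristic)."""
    if any(cell.strip().lower() in known_names for cell in first_row):
        return first_row
    return None


def count_input_columns(first_row: List[str]) -> int:
    """Count columns without assuming that a header exists."""
    h = read_header(first_row)
    if h is None:
        # No header detected: use the width of the first data row
        # instead of calling len(None), which raises TypeError.
        return len(first_row)
    return len(h)


if __name__ == "__main__":
    print(count_input_columns(["CCCC"]))    # headerless single-column file
    print(count_input_columns(["SMILES"]))  # file with a header
```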

jeremycheminf commented 1 year ago

Adding SMILES at the top of the csv file worked, and I got all the predictions. So it looks like on WSL the csv must have a header.
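For anyone hitting the same error, the workaround described here (a SMILES header line at the top of the input file) can be applied with a few lines of Python. This is a sketch under assumptions: the function name `ensure_header` is hypothetical, and the file is assumed to be a single-column text file like the one shown earlier in this thread.

```python
def ensure_header(path: str, header: str = "SMILES") -> bool:
    """Prepend `header` as the first line unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "r", encoding="utf-8") as f:
        lines = f.read().splitlines()

    # Compare case-insensitively so "smiles" and "SMILES" both count as a header.
    if lines and lines[0].strip().lower() == header.lower():
        return False

    with open(path, "w", encoding="utf-8") as f:
        f.write("\n".join([header] + lines) + "\n")
    return True


if __name__ == "__main__":
    import sys

    changed = ensure_header(sys.argv[1])
    print("header added" if changed else "header already present")
```

Calling `ensure_header("my_molecules.csv")` before `ersilia run -i my_molecules.csv -o my_predictions.csv` reproduces the manual fix. A second call leaves the file untouched.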

miquelduranfrigola commented 10 months ago

Hi @GemmaTuron - is the issue resolved?

GemmaTuron commented 10 months ago

It is if you add the header on the .csv file for WSL - we can close this issue but maybe we should make sure this is specified in the documentation if it is not

DhanshreeA commented 8 months ago

@GemmaTuron maybe I need to check this, but shouldn't every csv (whether on WSL, Linux, or Mac) have a header? I'm not sure if it's a WSL-specific issue, or even Ersilia-specific. I'll check this and get back. Whatever the case, we should update the documentation and close this issue.

GemmaTuron commented 8 months ago

Hi @DhanshreeA

They should all have a header, but in case they don't, Ersilia should be able to process them? I don't know, I'd go for making all .csv files have a header

GemmaTuron commented 1 month ago

Hi @DhanshreeA and @miquelduranfrigola

Can we clarify if Ersilia requires the passing of a .csv with a header or not ?

miquelduranfrigola commented 1 month ago

In principle, Ersilia does not require a header. It automatically inspects the input file. Please let me know if we've lost this functionality for some reason.

DhanshreeA commented 1 month ago

I'll check this and report.

GemmaTuron commented 1 month ago

The models work without header and with header and they produce the right output in my Ubuntu system.

I'll close this issue and if it arises again from any user we will revisit it