ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🐛 Bug: eos3ae7 repeatedly fails to fetch #343

Closed Cee-tech21 closed 1 year ago

Cee-tech21 commented 2 years ago

Describe the bug.

Fetching of eos3ae7 repeatedly fails, with the following error message logged:

"Model API eos3ae7:predict did not produce an output"

Describe the steps to reproduce the behavior

Run the following command: ersilia -v fetch eos3ae7 | tee -a eos3ae7_fetch.log 2>&1

Expected behavior.

After running the "fetch" command, the model eos3ae7 should be downloaded from the remote repository to the local computer.

Screenshots.

eos3ae7_fetch.log

Operating environment

Linux Mint 19

Additional context

No response

Zainab-ik commented 2 years ago

@Cee-tech21 try again with a stable internet connection. Sometimes this error can be caused by a dropped connection.

GemmaTuron commented 2 years ago

Hi @Cee-tech21 !

The model is working on my Linux machine. I think the | tee command is not saving the full error log that we need, so I can't see what's going on. Please try again and save the log directly, without | tee -a. I have seen the same thing in #355 and #344.
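For context on why the piped command can lose the error output: in `ersilia -v fetch eos3ae7 | tee -a eos3ae7_fetch.log 2>&1`, the `2>&1` applies to `tee`, not to `ersilia`, so anything ersilia writes to stderr never reaches the file. Below is a minimal Python sketch of capturing both streams in one log. The helper name `run_and_log` and the stand-in `DEMO_CMD` are hypothetical and are not part of the Ersilia CLI.

```python
import subprocess
import sys


def run_and_log(cmd, log_path):
    """Run cmd, append its stdout AND stderr to log_path, return the exit code.

    Hypothetical helper for illustration; it is not part of the Ersilia CLI.
    """
    with open(log_path, "a", encoding="utf-8") as log:
        # stderr=subprocess.STDOUT merges the error stream into stdout,
        # which is what `2>&1` does when it is attached to the command itself.
        result = subprocess.run(cmd, stdout=log, stderr=subprocess.STDOUT)
    return result.returncode


# A stand-in command that writes one line to each stream.
DEMO_CMD = [
    sys.executable,
    "-c",
    "import sys; print('to stdout'); print('to stderr', file=sys.stderr)",
]
```

Under these assumptions, `run_and_log(["ersilia", "-v", "fetch", "eos3ae7"], "eos3ae7.log")` would have the same intent as `ersilia -v fetch eos3ae7 >> eos3ae7.log 2>&1` in the shell (appending rather than overwriting).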

Jona-Bvunza commented 2 years ago

Hi @Cee-tech21! Was the error corrected? I have similar issues with my models.

GemmaTuron commented 2 years ago

Hello @Jona-Bvunza, please check the Slack channel for a more in-depth explanation. The "empty output error" you get might come from a very different issue, so please open your own issue and paste the log file.

GemmaTuron commented 2 years ago

I see that this model also uses the sqlalchemy package, which may be linked to what we are seeing in #338. @Cee-tech21, can you run the same test as @femme-js? (Check the sqlalchemy version in the model's conda environment, and try to run the model in Colab.)

@miquelduranfrigola do you think the problem might be in the sqlalchemy versions?

Cee-tech21 commented 2 years ago

The model has now been run on Google Colab, and the same error reported in this issue occurs there too. chizi_e_cee-tech

Cee-tech21 commented 2 years ago

Since this model fails to fetch both on my local computer and on Colab, I intend to close this issue, on the presumption that there is a problem preventing the model from being fetched.

GemmaTuron commented 2 years ago

@Cee-tech21 I can reproduce the same error, but I need some time to check what the issue might be. Can you please leave the issue open but change the title to "eos3ae7 fails at fetching time"? I will add some tags to help us locate it. Mark the issue in the Excel sheet and move on!

miquelduranfrigola commented 1 year ago

@GemmaTuron what is the current status of this?

Cee-tech21 commented 1 year ago

Hi all! Fetching this model here. Fetch still fails. Will update this post once fetch is successful.

GemmaTuron commented 1 year ago

Hello @miquelduranfrigola

The issues tagged with "help wanted" and "model-bug" are models that consistently encountered problems at fetch time. During the internship period, we will work with the Outreachy interns to make sure they run consistently.

@Cee-tech21 let us know if you are trying again, thanks

Cee-tech21 commented 1 year ago

Hi @GemmaTuron, I have just tried to fetch model "eos3ae7" again. Fetching of eos3ae7 still fails.

miquelduranfrigola commented 1 year ago

Thanks @Cee-tech21 - we are compiling a list of problematic models and we will address them in one batch before Christmas. Will keep you posted.

paulinebanye commented 1 year ago

Hi @GemmaTuron @miquelduranfrigola

This model fails to fetch using the CLI and colab. It returns an EmptyOutputError

System

Windows 10

Conda version

conda 22.9.0

Pip version

pip 22.3.1

Python version

Python 3.7.13

SQLAlchemy version

Version: 1.3.24

Steps to reproduce the behavior

ersilia -v fetch eos3ae7 > eos3ae7.log 2>&1

error on CLI

error log - eos3ae7.log

error on colab

Attempts to resolve the error

Based on similar errors,

paulinebanye commented 1 year ago

Just a quick update regarding the status of this model. I continued working on #343 and started on #369, as they both return the same EmptyOutputError, but I came across issues with the dependencies.

Sqlalchemy sqlalchemy error

Bentoml bentoml error

GemmaTuron commented 1 year ago

Hi @pauline-banye, if you can paste the full error logs here, it would be helpful. I assume other models work fine on your system? Are you on WSL or on an Ubuntu machine? Thanks!

paulinebanye commented 1 year ago

Hi @GemmaTuron, I am on a WSL machine. I have tested 3 of the models with issues: eos3ae7, eos4tcc and eos1579. eos3ae7.log eos4tcc.log

I'm in the process of testing the models on colab as well. So far I have tested eos4tcc on colab and it returns an EmptyOutputError as well.

I will update you once I have tested the other models on Colab.

paulinebanye commented 1 year ago

Update @GemmaTuron @miquelduranfrigola. I was able to resolve the issue with my system not fetching any model.

Steps I took were:

I fetched the model multiple times and encountered dependency errors on different occasions: "no module named pandas", "no module named keras", "no module named tensorflow". These were resolved by running:

The current error returned is ModuleNotFoundError: No module named 'keras.layers.recurrent', which I tried to resolve with pip install keras.layers.recurrent.

keras

eos3ae7.log

miquelduranfrigola commented 1 year ago

Many thanks, @pauline-banye. This is extremely helpful and I really appreciate the great reporting. This looks like an issue related to Isaura, which now uses poetry to manage dependencies. I am testing it today and will keep you updated.

paulinebanye commented 1 year ago

Thank you @miquelduranfrigola 😊. I will be updating the reports on the other two models I tested as well.

GemmaTuron commented 1 year ago

Hi,

Hoping to bring some extra information to this issue. I have installed WSL on my Windows machine to make sure I can reproduce @pauline-banye's setup. I have taken special care to ensure that the Python path is set to the Anaconda Python, so conda environments should be directed to the right place. Just to be clear, there is no Python installed outside conda in the WSL system. That could be a source of problems, though it shouldn't be.

When I run $ echo -e ${PATH//:/\\n} the first lines are:

/home/gturon/anaconda3/condabin
/home/gturon/.vscode-server/bin/5235c6bb189b60b01b1f49062f4ffa42384f8c91/bin/remote-cli
/usr/local/sbin
/usr/local/bin

When fetching the model eos3ae7, I get the following error:

Detailed error: Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/gturon/eos/repository/eos3ae7/20221212224955_5D39E0/eos3ae7/artifacts/framework/code/main.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

So, pandas is not found, but when I do:

$ conda activate eos3ae7
$ conda list

I find pandas installed (version 1.3.5). The package is imported without problems, so it IS in the environment. For eos4tcc, it is basically the same, but the module not found is joblib (which, again, IS in the conda environment, version 1.1.0). This is suspiciously similar to the issue we were encountering in Google Colab when the PYTHONPATH was not properly set, as @carcablop identified.
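A package that shows up in `conda list` but raises ModuleNotFoundError at run time usually means that the process running main.py uses a different interpreter or module search path than the activated environment. A minimal sketch of a check that could be run with each interpreter to compare them follows. The function name `describe_module` is hypothetical, and nothing here is taken from the Ersilia code base.

```python
import importlib.util
import sys


def describe_module(name):
    """Report whether `name` is importable by THIS interpreter, and from where."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        # Raised when a parent package is missing or the name is malformed.
        spec = None
    return {
        "module": name,
        "found": spec is not None,
        "origin": getattr(spec, "origin", None),
        "interpreter": sys.executable,
        "prefix": sys.prefix,
    }


if __name__ == "__main__":
    # Modules reported as missing in this thread.
    for mod in ("pandas", "joblib", "yaml"):
        print(describe_module(mod))
```

If the "interpreter" or "prefix" values differ between the activated eos3ae7 environment and the process that actually runs the model, the search path is the likely culprit rather than the package list.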

Cee-tech21 commented 1 year ago

Hello everyone! Great job!!! Quick update here!!

I tried again to fetch model eos3ae7 using google colab but I'm getting the error message below after the fetch code executes for around 10 minutes:

Detailed error: Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/root/eos/repository/eos3ae7/20221215160804_62CE4B/eos3ae7/artifacts/framework/code/main.py", line 7, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

A pandas-related error message should not appear, as pandas was successfully imported and called before issuing the fetch command. Have a look at the Colab link:

https://colab.research.google.com/drive/1I4pmrDjXS_XXwRRWyTSI-Kf5m76SXPR9?usp=sharing

GemmaTuron commented 1 year ago

I've been checking whether the latest updates to the Python paths (https://github.com/ersilia-os/ersilia/commit/70bcf5469d912b86b469a3db9e2978f34ff7a1fe) would solve this issue, but it seems some packages are still missing. In this latest test (in Colab), the missing package is "yaml".

GemmaTuron commented 1 year ago

And the latest updates we made to the Python paths seem to be breaking the code somewhere else in the CLI (see the attached log file): eos3ae7.txt

samuelmaina commented 1 year ago

I ran the model in WSL2 (using Ubuntu 20.04.5) and I get the same package-not-found error, but in this case the package is "yaml". I have confirmed that 'yaml' is not installed in the eos3ae7 env, but pandas is. I tried to install it manually, but the model still didn't work.

Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230328090007_475C16/eos3ae7/artifacts/framework/code/main.py", line 10, in <module>
    from chemvae.vae_utils import VAEUtils
  File "/home/samuelmayna/eos/repository/eos3ae7/20230328090007_475C16/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 4, in <module>
    import yaml
ModuleNotFoundError: No module named 'yaml'

More error logs can be found in eos3ae7_fetch.log

GemmaTuron commented 1 year ago

Hi @samuelmaina

If you clone the repository to your local system and modify its installation requirements to add the yaml package, does it work? You then need to call the model using the --repo_path <path_to_cloned_repo> flag at the end of the fetch command.

samuelmaina commented 1 year ago

@GemmaTuron Pandas is not detected in the remote repo. I added pandas and pyyaml (also tried with PyYAML) to the Dockerfile so that they are installed. dockerfile_change I still got the pandas not installed error. pandas_error

Pandas was not in the eos3ae7 env, but ruamel-yaml was. I looked at the script.sh generated to run the installation commands, from the line Running bash /tmp/ersilia-1k_bwc4b/script.sh > /tmp/ersilia-_wtlkhjr/command_outputs.log 2>&1. After running all the installation commands, the script downloads code from https://github.com/ersilia-os/bentoml-ersilia. I looked at its setup.py and found that the YAML package listed in "required include" is "ruamel.yaml", which is incompatible with import yaml. It is used as:

    from ruamel.yaml import YAML

    yaml=YAML(typ='safe')   # default, if not specified, is 'rt' (round-trip)
    yaml.load(doc)

as seen here. The required package is pyyaml. My guess is that some automated workflow is uninstalling pandas.
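To make the mismatch concrete: the PyYAML distribution provides the import name `yaml`, while the ruamel.yaml distribution provides `ruamel.yaml`, so installing one does not satisfy an import of the other. The sketch below shows one hypothetical way for code to detect which is present and to tolerate either. The function names are illustrative assumptions, and the thread does not say that this approach was adopted for eos3ae7.

```python
import importlib.util


def _has_module(name):
    """True if `name` can be imported by the current interpreter."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # e.g. "ruamel.yaml" when the parent package "ruamel" is absent
        return False


def available_yaml_backends():
    """List which of the two import names are present: 'yaml', 'ruamel.yaml'."""
    return [name for name in ("yaml", "ruamel.yaml") if _has_module(name)]


def load_yaml_text(text):
    """Parse YAML text with PyYAML if present, otherwise with ruamel.yaml."""
    try:
        import yaml  # import name provided by the PyYAML distribution
    except ImportError:
        from ruamel.yaml import YAML  # provided by the ruamel.yaml distribution
        return YAML(typ="safe").load(text)
    return yaml.safe_load(text)
```

Calling `available_yaml_backends()` inside the eos3ae7 environment would show at a glance whether `import yaml` in vae_utils.py can succeed.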

samuelmaina commented 1 year ago

I have tested the model with a single conda-forge entry (I had two in the Dockerfile in the previous comment) and the results are the same.

GemmaTuron commented 1 year ago

Hi @samuelmaina !

Thanks, that is a very good catch! I'll need to see why we are using ruamel.yaml in bentoml. Maybe it will be easier to change pyyaml to ruamel.yaml in the model itself, since the bentoml package is used by all Ersilia models? What do you think? I need some time to think about it, but your work has been great in pointing us in the right direction, many thanks.

samuelmaina commented 1 year ago

I am really grateful. I think it's a good idea to install pyyaml for the local model; there is no need to break the others. Migrating to pyyaml would be hectic, but you can consult on it.

samuelmaina commented 1 year ago

Hi everyone! @GemmaTuron I tried the authors' recommended versions: tensorflow=1.10.0 and Keras ('Keras>=2.0.0,<=2.0.7'), together with pyyaml and pandas. tensorflow=1.10.0 couldn't be found, but the second command in git_hub_issue installed it. The 1.10.0 version requires a numpy version range below what the other modules need, resulting in an installation error. I tried version 1.15.0 but got this error:

Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos3ae7:predict did not produce an outputUsing TensorFlow backend.
From /home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:439: The name tf.placeholder is deprecated. Please use tf.compat.v1.placeholder instead.

From /home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py:3540: The name tf.random_uniform is deprecated. Please use tf.random.uniform instead.

Traceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406105955_71665A/eos3ae7/artifacts/framework/code/main.py", line 16, in <module>
    vae = VAEUtils()
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406105955_71665A/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 43, in __init__
    self.enc = load_encoder(self.params)
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406105955_71665A/eos3ae7/artifacts/framework/code/chemvae/models.py", line 79, in load_encoder
    return load_model(params['encoder_weights_file'])
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/models.py", line 239, in load_model
    model = model_from_config(model_config, custom_objects=custom_objects)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/models.py", line 313, in model_from_config
    return layer_module.deserialize(config, custom_objects=custom_objects)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/layers/__init__.py", line 54, in deserialize
    printable_module_name='layer')
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/utils/generic_utils.py", line 139, in deserialize_keras_object
    list(custom_objects.items())))
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 2497, in from_config
    process_node(layer, node_data)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 2454, in process_node
    layer(input_tensors[0], **kwargs)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 575, in __call__
    self.build(input_shapes[0])
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/layers/convolutional.py", line 134, in build
    constraint=self.kernel_constraint)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/legacy/interfaces.py", line 87, in wrapper
    return func(*args, **kwargs)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/engine/topology.py", line 399, in add_weight
    constraint=constraint)
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend/tensorflow_backend.py", line 323, in variable
    v.constraint = constraint
AttributeError: can't set attribute

Pandas and yaml were imported and used correctly since they are imported before VAEUtils() is called.

After some research, I found that the error came from using Keras 2.0.7. I then tried to upgrade to the latest version, 2.12.0, but it requires Python 3.8. I used conda install keras to install the best available version, but I got this error:

12:14:45 | DEBUG    | [{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}]
12:14:56 | ERROR    | Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos3ae7:predict did not produce an outputTraceback (most recent call last):
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406121243_219A53/eos3ae7/artifacts/framework/code/main.py", line 10, in <module>
    from chemvae.vae_utils import VAEUtils
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406121243_219A53/eos3ae7/artifacts/framework/code/chemvae/vae_utils.py", line 5, in <module>
    from .models import load_encoder, load_decoder, load_property_predictor
  File "/home/samuelmayna/eos/repository/eos3ae7/20230406121243_219A53/eos3ae7/artifacts/framework/code/chemvae/models.py", line 1, in <module>
    from keras.layers import Input, Lambda
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/__init__.py", line 20, in <module>
    from keras import distribute
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/distribute/__init__.py", line 18, in <module>
    from keras.distribute import sidecar_evaluator
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/distribute/sidecar_evaluator.py", line 22, in <module>
    from keras.optimizers.optimizer_experimental import (
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/optimizers/__init__.py", line 25, in <module>
    from keras import backend
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend.py", line 32, in <module>
    from keras import backend_config
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/keras/backend_config.py", line 33, in <module>
    @tf.__internal__.dispatch.add_dispatch_support
  File "/home/samuelmayna/miniconda3/envs/eos3ae7/lib/python3.7/site-packages/tensorflow_core/python/util/module_wrapper.py", line 193, in __getattr__
    attr = getattr(self._tfmw_wrapped_module, name)
AttributeError: module 'tensorflow._api.v1.compat.v2' has no attribute '__internal__'

I researched a bit and found a solution on Stack Overflow. After setting keras=2.1.6, I got a lot of inner dependency conflict errors, as can be seen in keras_2_1_6_error.txt, and neither pandas nor yaml was installed due to the conflicts. I looked at the original repo, and users are asking for the exact dependencies, as can be seen in chemical_vae_issue. Someone could work out the working versions for the model, but I think it will take a lot of time. I hope this research sheds some more light. PyYAML is used at the higher level of this model. The bentoml-ersilia setup.py is fine.
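When chasing version conflicts like these, it can help to record exactly which versions ended up in the environment after each attempt. A minimal sketch using the standard library follows. Note that `importlib.metadata` ships with Python 3.8 and later, so the Python 3.7 environment seen in the logs would need the `importlib_metadata` backport instead. The helper name and the package list are illustrative assumptions, and no known-good pins are implied.

```python
from importlib import metadata  # standard library from Python 3.8 onward


def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# Distribution names mentioned in this thread; no known-good pins are implied.
THREAD_PACKAGES = ["tensorflow", "Keras", "numpy", "pandas", "PyYAML"]

if __name__ == "__main__":
    for name, version in installed_versions(THREAD_PACKAGES).items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output next to each error log would make it easier to tell which combination of TensorFlow, Keras and numpy produced which traceback.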

miquelduranfrigola commented 1 year ago

Hi @GemmaTuron and @samuelmaina - this one is on hold. What is the current status?

samuelmaina commented 1 year ago

@miquelduranfrigola last time I tested, it was not working due to dependency issues. If @GemmaTuron isn't done with it, I can try to resolve the issue again.

GemmaTuron commented 1 year ago

Hi @miquelduranfrigola and @samuelmaina, let's focus on this model once we get to the family of generative models.

GemmaTuron commented 1 year ago

This has been solved now! Check the repo of this model for more :)