GemmaTuron closed this issue 1 year ago.
Hi @GemmaTuron, I'm currently testing on the CLI. For testing on Colab, I intend to use this Colab notebook.
I tested on the CLI and on Colab, and it worked on neither. However, after talking to Miquel, we concluded that it was probably caused by libraries I had already installed on my laptop interfering with the conda environment, so it is likely just an issue with my local computer. I think it doesn't work on Colab for me because Colab uses my local environment to run the models (only with the new Colab notebook, I think).
Hi @karthikjetty ,
Can you list which libraries you think are causing the issues? Colab does not run locally unless specified, so could you try again? In the top right corner of the Colab notebook page you will see the runtime, which indicates which machine you are working on.
Thanks
Hi @GemmaTuron
[x] I was able to fetch, serve and generate predictions with the model eos935d successfully on the CLI. eos935d_fetch.log
Output produced from eml_canonical.csv: eos935d_predict.csv
Output produced from a single SMILES: eos935d_predict_single.csv
[ ] Model failed to fetch on Colab
eos935d colab
Ersilia exception class: EmptyOutputError
On reviewing the error, I noticed a problem in the run_predict.sh script of the Colab notebook.
Model API eos935d:predict did not produce an output
/root/eos/repository/eos935d/20230115045908_F14D7D/eos935d/artifacts/framework/run_predict.sh: line 1: /usreos935d/bin/python: No such file or directory
This error is similar to the one returned in the eos4q1a Colab notebook.
Hi @pauline-banye, thank you for this. The error is related to the environment variable holding the Python path: as you can see in the error output, that path (/usreos935d/bin/python) does not exist. I must review and correct this carefully, because the environment variable was previously being assigned correctly and now it is failing.
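The malformed path `/usreos935d/bin/python` looks as if a directory prefix and the environment name were concatenated without a path separator; that is only a guess from the error text, not the model's actual code. A minimal sketch with hypothetical helpers (`naive_python_path`, `env_python_path`, `check_interpreter`) showing how plain string concatenation reproduces that shape of error and how joining with `pathlib` avoids it. The `/usr` prefix is used only to mirror the error string, not as a claim about where the environment lives:

```python
from pathlib import Path


def naive_python_path(prefix: str, env_name: str) -> str:
    # Plain string concatenation: if `prefix` lacks a trailing "/",
    # the directory and the env name fuse together (e.g. "/usreos935d").
    return prefix + env_name + "/bin/python"


def env_python_path(envs_root: str, env_name: str) -> Path:
    # Hypothetical helper: pathlib inserts separators between parts,
    # so "/usr" + "eos935d" becomes "/usr/eos935d/bin/python".
    return Path(envs_root) / env_name / "bin" / "python"


def check_interpreter(path: Path) -> None:
    # Fail early with a clear message instead of letting a shell script
    # report "No such file or directory" later on.
    if not path.is_file():
        raise FileNotFoundError(f"Python interpreter not found: {path}")


if __name__ == "__main__":
    print(naive_python_path("/usr", "eos935d"))
    print(env_python_path("/usr", "eos935d"))
```

Checking that the interpreter exists before writing it into a script like run_predict.sh turns a late, confusing shell error into an immediate and explicit one.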
Hi @pauline-banye and @karthikjetty
The issue reported above is solved, so please go ahead with testing the model, thanks!
Hi @GemmaTuron, the model works perfectly.
I retested it with the updated Colab notebook https://colab.research.google.com/drive/1K4-9u9wO3o0MIota4wv2i7gICNu4bqpe?usp=sharing
Hi @GemmaTuron !
I tested the model on my CLI and it failed to fetch. I'm attaching the log file here: eos935d.log
Testing on the Colab notebook:
It works perfectly on Colab.
https://colab.research.google.com/drive/1UR3y-AnT8XhTwJPyFnJM5bUJ8j8ESs-J#scrollTo=CHFQjKJ2cuMD
Hi @Femme-js
Given that the model is working in Colab, this is a good opportunity to understand what is happening in your system. I've identified the source of the error, but I'll let you have a look first.
Hi @Femme-js ,
Now that you have solved the disk space issues, could you confirm that the model is working?
The model doesn't work on my CLI. I've had this issue with other models before, and the main reason is that the conda environment interacts with my other conda environments. I'm not sure how to fix this other than erasing all my past conda environments (or using Docker). eos935d predict error.log
On Colab, the model runs fine (though it takes a while to load). Here is the link:
https://colab.research.google.com/drive/16vvT-utL9z9c19hCZhbJeKngCQcuW6sx#scrollTo=ipckLYxPS3GY
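If packages from other environments are leaking into the model's environment, a quick first check is to print which interpreter and search paths are actually in use. A minimal diagnostic sketch (the function name is hypothetical); run it with the interpreter of the environment you are debugging. It only reports settings and does not change anything:

```python
import os
import site
import sys


def describe_python_environment() -> dict:
    """Collect settings that commonly let packages leak between environments."""
    return {
        # Interpreter actually running this script.
        "executable": sys.executable,
        # Root directory of the environment this interpreter belongs to.
        "prefix": sys.prefix,
        # Set by `conda activate`; None when no conda env is active.
        "conda_prefix": os.environ.get("CONDA_PREFIX"),
        # Extra import directories searched by every environment.
        "pythonpath": os.environ.get("PYTHONPATH"),
        # When enabled, the user site-packages directory (~/.local/...) is
        # shared by all environments, a frequent source of conflicts.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site": site.getusersitepackages(),
    }


if __name__ == "__main__":
    for key, value in describe_python_environment().items():
        print(f"{key}: {value}")
```

If `executable` does not point inside the expected conda environment, or `PYTHONPATH` is set, or the user site directory is enabled and contains packages such as bentoml or urllib3, those are candidates for the interference. Setting the standard `PYTHONNOUSERSITE=1` environment variable is one way to test whether the user site directory is involved.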
Hello @karthikjetty, this error is specific to the model: I implemented a function to obtain the model's Python path, and this is what is causing the problem at the moment (I can see it again in the logs you shared). Since this functionality is now implemented within Ersilia itself, I no longer need it and can remove it from the model code. Thank you for this. I will work on this change today and upload it.
Hi @carcablop and @karthikjetty :
The error seems to be an issue with Karthik's installation, not with the model. Are you sure you are using the latest Ersilia version? (Please pull the repo and start anew.) @carcablop, I don't understand what you have changed in the model that is making Ersilia crash in the PR tests; is the change necessary?
Hello @GemmaTuron. Honestly, the change is not necessary; the model works fine. Initially I was confused by the error shared by @karthikjetty, but that is a common error when the conda environment isn't set up correctly. I tried to make a change, but the PR did not pass the tests, and that is how I realized the change was not correct. Sorry for the confusion.
I agree, thanks @carcablop!
@karthikjetty please do make sure to troubleshoot these issues in your system, have a look and let us know what you find out about where the error might be.
Hi @GemmaTuron !
The model is producing the same error on my CLI as in issue #1. eos935d.log
I will keep testing to see whether the model works correctly on my CLI.
@karthikjetty please do make sure to troubleshoot these issues in your system, have a look and let us know what you find out about where the error might be.
There are lots of libraries that might be causing the error. There were two things that caught my eye in the log file.
bentoml 0.11.0 requires sqlalchemy<1.4.0,>=1.3.0, but you have sqlalchemy 1.4.42 which is incompatible.
bentoml 0.11.0 requires urllib3<=1.25.11, but you have urllib3 1.26.14 which is incompatible.
sentry-sdk 1.14.0 requires urllib3>=1.26.11; python_version >= "3.6", but you have urllib3 1.25.11 which is incompatible.
Successfully installed sqlalchemy-1.3.24 urllib3-1.25.11
These two problems appear in my log, and they seem to contradict each other. After the bentoml conflict, the sqlalchemy and urllib3 libraries are uninstalled and replaced with other versions (sqlalchemy 1.3.24 and urllib3 1.25.11). Later, the log says that sentry-sdk also needs urllib3, but a different version.
I think the issue stems from my having installed sentry-sdk or bentoML in a previous conda environment, which may be causing outdated versions of bentoML or sentry-sdk to be used. This could be why I get the error on my system while other people don't, since the library requirements of newer bentoML or sentry-sdk versions might be compatible.
I could try fixing this, but it might require removing a lot of conda environments from my laptop. Should I instead examine further which of the specific libraries (out of the four listed in the errors) is causing the issue?
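The two urllib3 requirements quoted above cannot both hold: bentoml 0.11.0 wants `urllib3<=1.25.11` and sentry-sdk 1.14.0 wants `urllib3>=1.26.11`, so no single urllib3 release satisfies both, and whichever requirement pip resolves last wins. A small self-contained sketch that checks this. The comparison logic is a deliberately simplified, hand-rolled stand-in for `packaging.specifiers.SpecifierSet`; it handles only plain numeric versions and ignores pre-releases and environment markers such as `python_version`:

```python
from typing import Callable, Dict, List, Tuple

Version = Tuple[int, ...]


def parse(version: str) -> Version:
    # Only handles plain numeric releases like "1.26.14".
    return tuple(int(part) for part in version.split("."))


# Map each comparison operator to a predicate on version tuples.
OPS: Dict[str, Callable[[Version, Version], bool]] = {
    "<": lambda a, b: a < b,
    "<=": lambda a, b: a <= b,
    ">": lambda a, b: a > b,
    ">=": lambda a, b: a >= b,
    "==": lambda a, b: a == b,
}


def satisfies(version: str, spec: str) -> bool:
    """Check a version against a comma-separated spec like "<1.4.0,>=1.3.0"."""
    v = parse(version)
    for clause in spec.split(","):
        clause = clause.strip()
        # Try two-character operators before one-character ones.
        for op in ("<=", ">=", "==", "<", ">"):
            if clause.startswith(op):
                if not OPS[op](v, parse(clause[len(op):])):
                    return False
                break
        else:
            raise ValueError(f"unsupported clause: {clause!r}")
    return True


# Requirements quoted in the pip log above.
BENTOML_URLLIB3 = "<=1.25.11"
SENTRY_URLLIB3 = ">=1.26.11"
BENTOML_SQLALCHEMY = "<1.4.0,>=1.3.0"


def compatible_with_both(candidates: List[str]) -> List[str]:
    # urllib3 versions that would satisfy bentoml 0.11.0 and sentry-sdk 1.14.0 at once.
    return [c for c in candidates
            if satisfies(c, BENTOML_URLLIB3) and satisfies(c, SENTRY_URLLIB3)]


if __name__ == "__main__":
    for v in ["1.25.11", "1.26.14"]:
        print(v, "bentoml ok:", satisfies(v, BENTOML_URLLIB3),
              "sentry-sdk ok:", satisfies(v, SENTRY_URLLIB3))
```

For any candidate list, `compatible_with_both` comes back empty, which is why pip keeps reporting one conflict or the other. Such warnings show that two packages sharing one environment disagree; on their own they do not prove that this conflict is what breaks the model.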
Hi @Femme-js
Again, the issue is that there is not enough space left on your disk:
ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device
Why are you pointing to issue #1? I don't see the connection with the error you are getting; please point me to the relevant lines of the log file.
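`[Errno 28] No space left on device` means the disk filled up while pip was installing the model's dependencies. One quick way to check free space before fetching again is the standard-library `shutil.disk_usage`. In this sketch the helper name and the 10 GiB threshold are hypothetical, chosen only for illustration and not measured requirements of eos935d or Ersilia. It checks the home directory on the assumption that conda environments and model folders live on that filesystem:

```python
import os
import shutil


def has_free_space(path: str, required_gib: float) -> bool:
    """Return True if the filesystem containing `path` has at least `required_gib` free."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (in bytes)
    free_gib = usage.free / 2 ** 30
    return free_gib >= required_gib


if __name__ == "__main__":
    target = os.path.expanduser("~")
    free_gib = shutil.disk_usage(target).free / 2 ** 30
    print(f"Free space at {target!r}: {free_gib:.1f} GiB")
    # The 10 GiB threshold is an arbitrary example value.
    if not has_free_space(target, 10):
        print("Low disk space: pip may fail with [Errno 28] No space left on device")
```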
@karthikjetty
Please do try to clean up your system: otherwise you will run into errors with other models as well, so we should get this set up properly.
The errors you mention do not seem to be the source of the problem; the cause is rather the presence of old metadata files (see lines 64 to 105). I'd suggest removing unused installs and cleaning up.
Also, I don't understand why you are attaching the error log of another model here; did you try eos935d?
13:31:54 | INFO | Removing bento folder first /Users/karthik/bentoml/repository/eos2r5a/20230110132613_261874
@pauline-banye and @carcablop
May I ask you to update Ersilia to the latest version (if you haven't already), reinstall it (it now has a slimmed-down bentoML version) and try the model again? I am getting some errors with the new bentoML install and want to check whether it's a general problem or not. log_error.txt
Hi @GemmaTuron, of course. I updated it last week, but I'll update it again.
@carcablop
I've identified the problem. You might have, as I did, old eos-bentoml-0.11.0... conda environments. If you delete these and fetch again, the issue is solved.
@karthikjetty and @Femme-js
I am awaiting confirmation that you were able to troubleshoot your issues (@karthikjetty, I haven't seen the right log files for this model yet, and @Femme-js, it seemed like a disk space issue).
Yes @GemmaTuron! It seems the same to me. I am cleaning up the disk and everything, and will test it again.
Hi @GemmaTuron, I tested on the CLI but received errors, so I am retesting. It worked without issues on Colab. I will provide a more detailed update once I finish testing on the CLI.
@pauline-banye thanks, please do check what I mentioned to Carolina in the messages above before testing again.
Hello @GemmaTuron, I made the suggested changes. In addition, I cleaned my entire Ubuntu system: I removed the conda environments I wasn't using and the previously tested models, and finally the forks of the already built-in models. The model eos935d fetched successfully on the CLI. log_fetch_eos935d.txt
Thank you :).
Hi @GemmaTuron, I pulled the latest changes from Ersilia, deleted the bentoml environments and tested the model again.
thanks @pauline-banye and @carcablop !
I'll mark this as completed!
Hi Gemma. I removed all my other conda environments, but still received the same error as before.
I notice that the following environments, which I did not remove, are still listed among my conda environments:
eosbase-bentoml-0.11.0-py37   /opt/miniconda3/envs/eosbase-bentoml-0.11.0-py37
eosbase-bentoml-0.11.0-py38   /opt/miniconda3/envs/eosbase-bentoml-0.11.0-py38
I tried removing these using conda remove -n eosbase-bentoml-0.11.0-py37, but it gave me errors (I think I have to remove them as packages...?). I will try troubleshooting some more.
Hello @karthikjetty,
The environments need to be removed as conda environments; which errors did it give you? Let's move this conversation to the internship channel on Slack, since this issue is closed and the model is working; this is definitely an issue specific to your system. Please post your issues with these environments there.
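`conda remove -n NAME` without package names acts on packages, which is likely why the command above errored; removing a whole environment needs `conda remove -n NAME --all` (or `conda env remove -n NAME`). A sketch that finds leftover `eosbase-bentoml-*` environments and builds the removal commands for review, assuming `conda` is on PATH. The helper names are hypothetical, and nothing is deleted automatically:

```python
import json
import os
import subprocess
from typing import List


def list_conda_env_paths() -> List[str]:
    # `conda env list --json` prints {"envs": [<absolute path>, ...]}.
    out = subprocess.run(
        ["conda", "env", "list", "--json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)["envs"]


def select_envs(paths: List[str], prefix: str) -> List[str]:
    # Keep environments whose directory name starts with `prefix`.
    return [p for p in paths if os.path.basename(p).startswith(prefix)]


def removal_commands(paths: List[str]) -> List[List[str]]:
    # Remove each whole environment (not individual packages) by its path.
    return [["conda", "remove", "--prefix", p, "--all", "--yes"] for p in paths]
```

For example, `removal_commands(select_envs(list_conda_env_paths(), "eosbase-bentoml"))` should yield one `conda remove --prefix ... --all --yes` command per leftover environment; print them with `" ".join(cmd)` and review them before running any by hand.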
Hello @GemmaTuron ,
I cleaned up the environments as you mentioned and also reinstalled Ersilia on my system in a new conda environment. I still got the same bentoml error. I manually updated bentoml to version 1.0.13, but now it produces the following error: eos935d.log
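The environment names in this thread (`eos-bentoml-0.11.0...`, `eosbase-bentoml-0.11.0-py37`) and the pip messages point to bentoml 0.11.0, so a manual upgrade to 1.0.13 (a different release line) may introduce new errors rather than fix the old one. That is an inference from the thread, not something confirmed here. A sketch that reports which bentoml version the active interpreter actually sees, using the standard-library `importlib.metadata` (Python 3.8+); the helper names are hypothetical:

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    # Version of the distribution installed for *this* interpreter, or None.
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def same_release_series(installed: str, expected: str, parts: int = 2) -> bool:
    # Compare leading components, e.g. "0.11" (from "0.11.0") vs "1.0" (from "1.0.13").
    return installed.split(".")[:parts] == expected.split(".")[:parts]


if __name__ == "__main__":
    # "0.11.0" is taken from the environment names and pip messages above;
    # it is an assumption, not a documented Ersilia pin.
    found = installed_version("bentoml")
    print(f"bentoml installed: {found}")
    if found is not None and not same_release_series(found, "0.11.0"):
        print("bentoml is outside the 0.11.x series used by the eos*-bentoml-0.11.0 environments")
```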
Test the model using a single SMILES and a .csv file with a few of them to check that it works.
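A sketch of the file side of that check: it writes a small input CSV and confirms that a prediction output is non-empty, with one row per input. The column name `smiles`, the helper names and the one-row-per-input assumption are hypothetical; match the format of the model's own example input (such as eml_canonical.csv) if it differs. The Ersilia commands that run the model are not shown here:

```python
import csv
from pathlib import Path
from typing import List

# Well-known example molecules; any valid SMILES would do.
EXAMPLE_SMILES = [
    "CCO",                    # ethanol
    "c1ccccc1",               # benzene
    "CC(=O)Oc1ccccc1C(=O)O",  # aspirin
]


def write_input_csv(path: Path, smiles: List[str], column: str = "smiles") -> None:
    # Header row followed by one SMILES per line.
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([column])
        writer.writerows([s] for s in smiles)


def count_data_rows(path: Path) -> int:
    # Rows after the header; 0 means the file is empty, as with an EmptyOutputError.
    with path.open(newline="") as handle:
        rows = list(csv.reader(handle))
    return max(len(rows) - 1, 0)


def output_matches_input(input_csv: Path, output_csv: Path) -> bool:
    # Non-empty output with exactly one row per input molecule.
    n_in = count_data_rows(input_csv)
    return n_in > 0 and count_data_rows(output_csv) == n_in
```

Running the model once on a single SMILES and once on this three-row file, then calling `output_matches_input` on the second result, covers both cases the task asks for.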