ersilia-os / eos935d

GNU General Public License v3.0
Test model eos935d in CLI and Colab #2

Closed GemmaTuron closed 1 year ago

GemmaTuron commented 1 year ago

Test the model using a single smiles and a .csv file with a few of them to check that it works.
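
A minimal sketch of preparing the two kinds of input mentioned above: a single SMILES string and a small .csv file with a few of them. The molecules, the file name `test_input.csv` and the single `smiles` header column are assumptions chosen for illustration; none of them come from this issue.

```python
import csv
from pathlib import Path

# Illustrative molecules only (assumed, not taken from this issue).
SINGLE_SMILES = "CCO"  # ethanol
FEW_SMILES = ["CCO", "c1ccccc1", "CC(=O)Oc1ccccc1C(=O)O"]  # ethanol, benzene, aspirin


def write_smiles_csv(path, smiles_list, header="smiles"):
    """Write one SMILES per row under a single header column."""
    path = Path(path)
    with path.open("w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow([header])
        for smi in smiles_list:
            writer.writerow([smi])
    return path


if __name__ == "__main__":
    out = write_smiles_csv("test_input.csv", FEW_SMILES)
    print(f"Single input: {SINGLE_SMILES}")
    print(f"Batch input written to: {out}")
```

The single string can be passed to the model directly, and the generated file can be used as the batch input for the same test.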

paulinebanye commented 1 year ago

Hi @GemmaTuron I'm currently testing on the CLI. For testing on colab, I intend to use this colab notebook.

karthikjetty commented 1 year ago

Tested on CLI and Colab. It worked on neither. However, after talking to Miquel, we think the cause is some libraries already installed on my laptop that are interfering with the Conda environment, so it is probably just an issue with my local computer. It doesn't work on Colab for me either, I think because Colab is using my local environment to run the models (only with the new Colab notebook, I think).

GemmaTuron commented 1 year ago

Hi @karthikjetty ,

Can you list which libraries you think are causing the issues? Colab does not run locally unless specified, so could you try again? In the top right corner of the Colab notebook page you will see the runtime, which indicates which machine you are working on.

Thanks
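
Besides the runtime indicator in the notebook, a quick check can be run in a cell. This is only a heuristic sketch: it assumes that hosted Colab runtimes have the `google.colab` module loaded, while a notebook connected to a local Jupyter runtime normally does not. The function names are hypothetical.

```python
import platform
import sys


def running_in_colab():
    """Return True when the google.colab module is loaded in this interpreter."""
    return "google.colab" in sys.modules


def describe_runtime():
    """Summarise where the current Python process is executing."""
    where = "Google Colab" if running_in_colab() else "local or other runtime"
    return f"{where} | host={platform.node()} | python={sys.executable}"


if __name__ == "__main__":
    print(describe_runtime())
```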

paulinebanye commented 1 year ago

Hi @GemmaTuron

Testing the model on the CLI and Colab.

On reviewing the output, I noticed an error in the run_predict.sh of the Colab notebook:

Model API eos935d:predict did not produce an output
/root/eos/repository/eos935d/20230115045908_F14D7D/eos935d/artifacts/framework/run_predict.sh: line 1: /usreos935d/bin/python: No such file or directory

This error is similar to the one returned in the eos4q1a Colab notebook.

carcablop commented 1 year ago

Hi @pauline-banye. Thank you for this. The error is related to the environment variable holding the Python path: as you can see in the error output, that path (/usreos935d/bin/python) does not exist. I need to review and correct this carefully, because the environment variable was being assigned correctly before and is failing now.
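
The exact cause inside the notebook is not shown in this thread. The sketch below only illustrates one way a path like `/usreos935d/bin/python` can arise (string concatenation with a missing segment) and how explicit path joining plus an existence check fails earlier and more readably. All function names and the `/usr/local/envs` directory are hypothetical.

```python
from pathlib import Path


def naive_python_path(prefix, env_name):
    # Plain string concatenation: a missing separator or path segment
    # silently produces a path that does not exist.
    return prefix + env_name + "/bin/python"


def env_python_path(envs_dir, env_name):
    """Build <envs_dir>/<env_name>/bin/python with explicit path joining."""
    return Path(envs_dir) / env_name / "bin" / "python"


def require_interpreter(path):
    """Fail early, with a readable message, if the interpreter is missing."""
    path = Path(path)
    if not path.is_file():
        raise FileNotFoundError(f"Python interpreter not found: {path}")
    return path


if __name__ == "__main__":
    print(naive_python_path("/usr", "eos935d"))  # /usreos935d/bin/python
    # "/usr/local/envs" is an assumed location, used only for illustration.
    print(env_python_path("/usr/local/envs", "eos935d"))
```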

GemmaTuron commented 1 year ago

Hi @pauline-banye and @karthikjetty

The issue reported above is solved, so please go on with testing the model, thanks!

paulinebanye commented 1 year ago

Hi @GemmaTuron, the model works perfectly.

I retested it with the updated colab notebook https://colab.research.google.com/drive/1K4-9u9wO3o0MIota4wv2i7gICNu4bqpe?usp=sharing

Femme-js commented 1 year ago

Hi @GemmaTuron !

I tested the model on my CLI, and the model failed to fetch. Attaching the log files here. eos935d.log

Testing on colab notebook:

It works perfectly on colab.

https://colab.research.google.com/drive/1UR3y-AnT8XhTwJPyFnJM5bUJ8j8ESs-J#scrollTo=CHFQjKJ2cuMD

GemmaTuron commented 1 year ago

Hi @Femme-js

Given that the model is working in Colab, this is a good opportunity to understand what is happening in your system. I've identified the source of the error, but I'll let you have a look first

GemmaTuron commented 1 year ago

Hi @Femme-js ,

Now that you have solved the disk space issues, could you confirm that the model is working?

karthikjetty commented 1 year ago

The model doesn't work on my CLI. I've had this issue with other models before, and the primary reason is that the conda environment is interacting with my other conda environments. I'm not sure how to fix this other than erasing all my past conda environments (or using Docker). eos935d predict error.log

On Colab, model runs fine (though takes a while to load). Here is the link.

https://colab.research.google.com/drive/16vvT-utL9z9c19hCZhbJeKngCQcuW6sx#scrollTo=ipckLYxPS3GY

carcablop commented 1 year ago

Hello @karthikjetty. This error is specific to the model: I implemented a function to obtain the model's Python path, and that is what is causing the problem at the moment (I can see it again in the logs you shared). Since this functionality is now implemented within Ersilia, this should no longer be a problem, and I can remove the function from the model code. Thank you for this. I will work on this change today and upload it.

GemmaTuron commented 1 year ago

Hi @carcablop and @karthikjetty :

The error seems to be an issue with Karthik's installation, not something caused by the model. Are you sure you are using the latest Ersilia version? (Please pull the repo and start anew.) @carcablop I don't understand what you have changed in the model that is making Ersilia crash in the PR tests. Is the change necessary?

carcablop commented 1 year ago

Hello @GemmaTuron. Honestly, the change is not necessary; the model works fine. Initially I was confused by the error shared by @karthikjetty, but that is a common error when the conda environment isn't set up right. I tried to make a change, but the PR did not pass the tests, and with that I realized the change was not correct. Sorry for the confusion.

GemmaTuron commented 1 year ago

I agree, thanks @carcablop!

@karthikjetty please do make sure to troubleshoot these issues in your system, have a look and let us know what you find out about where the error might be.

Femme-js commented 1 year ago

Hi @GemmaTuron !

The model is producing the same error on my CLI as in issue #1. eos935d.log

I will keep testing to see whether the model works correctly on my CLI.

karthikjetty commented 1 year ago

There are lots of libraries that might be causing the error. There were two things that caught my eye in the log file.

bentoml 0.11.0 requires sqlalchemy<1.4.0,>=1.3.0, but you have sqlalchemy 1.4.42 which is incompatible.
bentoml 0.11.0 requires urllib3<=1.25.11, but you have urllib3 1.26.14 which is incompatible.

sentry-sdk 1.14.0 requires urllib3>=1.26.11; python_version >= "3.6", but you have urllib3 1.25.11 which is incompatible.
Successfully installed sqlalchemy-1.3.24 urllib3-1.25.11

These are two problems that pop up in my log, and they look contradictory. After the bentoml message, the sqlalchemy and urllib3 libraries are uninstalled and other versions are installed. Later, sentry-sdk reports that it needs the same library (urllib3), but in a different version.

I think the issue stems from my having installed sentry-sdk or bentoML in a previous conda environment, which may be causing outdated versions of bentoML or sentry-sdk to be used. This is potentially why I get the error on my system but other people don't get it on theirs, since the library requirements of newer versions of bentoML or sentry-sdk might be compatible.

I could try fixing this, but it might require removing a lot of conda environments from my laptop. Instead, should I examine further which of the specific libraries (out of the 4 listed in the errors) is causing the issue?
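
To illustrate why pip reports both messages, here is a small sketch that checks the urllib3 and sqlalchemy versions quoted in the log against the stated requirements. It is a simplified comparison written for this example (plain numeric versions only, hypothetical helper names), not pip's resolver, and it does not by itself show that these conflicts are what makes the model fail.

```python
import operator
import re

_OPS = {"<=": operator.le, ">=": operator.ge, "<": operator.lt,
        ">": operator.gt, "==": operator.eq}


def parse_version(text):
    """Turn '1.25.11' into (1, 25, 11); only plain numeric versions are handled."""
    return tuple(int(part) for part in text.split("."))


def satisfies(version, spec):
    """Check a version against a comma-separated spec such as '<1.4.0,>=1.3.0'."""
    for clause in spec.split(","):
        match = re.fullmatch(r"\s*(<=|>=|==|<|>)\s*([\d.]+)\s*", clause)
        if match is None:
            raise ValueError(f"Unsupported clause: {clause!r}")
        op, bound = match.groups()
        if not _OPS[op](parse_version(version), parse_version(bound)):
            return False
    return True


# Requirements quoted in the log messages.
BENTOML_URLLIB3 = "<=1.25.11"
SENTRY_URLLIB3 = ">=1.26.11"
BENTOML_SQLALCHEMY = "<1.4.0,>=1.3.0"

if __name__ == "__main__":
    for candidate in ("1.25.11", "1.26.14"):
        print(candidate,
              "bentoml ok:", satisfies(candidate, BENTOML_URLLIB3),
              "sentry-sdk ok:", satisfies(candidate, SENTRY_URLLIB3))
```

Neither urllib3 version in the log satisfies both ranges, since `<=1.25.11` and `>=1.26.11` do not overlap, so one of the two packages will always report an incompatibility while both are installed in the same environment.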

GemmaTuron commented 1 year ago

Hi @Femme-js

Again, the issue is that the space left on your disk is not sufficient:

ERROR: Could not install packages due to an OSError: [Errno 28] No space left on device

Why are you pointing to issue #1? I don't see the link with the error you are getting. Please point me to the right lines of the log file.
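
A minimal sketch of checking free disk space before fetching a model, using the standard library's `shutil.disk_usage`. The 5 GiB threshold and the function name are assumptions for illustration; the space a given model actually needs is not stated in this thread.

```python
import shutil


def has_free_space(path=".", required_gb=5.0):
    """Return True if the filesystem holding `path` has at least `required_gb` GiB free."""
    free_gb = shutil.disk_usage(path).free / 1024 ** 3
    return free_gb >= required_gb


if __name__ == "__main__":
    usage = shutil.disk_usage(".")
    print(f"Free: {usage.free / 1024 ** 3:.1f} GiB of {usage.total / 1024 ** 3:.1f} GiB")
    # The threshold below is an arbitrary, assumed value.
    if not has_free_space(".", required_gb=5.0):
        print("Low disk space: pip may fail with [Errno 28] No space left on device")
```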

GemmaTuron commented 1 year ago

@karthikjetty Please do try to clean up your system, because you will run into errors with other models as you keep working, so we should get this set up properly. The errors you mention do not seem to be the source of the problem; rather, it is the presence of old metadata files (see lines 64 to 105). I'd suggest removing unused installs and cleaning up. Also, I can't understand why you are attaching the error log of another model here; did you try eos935d?

13:31:54 | INFO | Removing bento folder first /Users/karthik/bentoml/repository/eos2r5a/20230110132613_261874

GemmaTuron commented 1 year ago

@pauline-banye and @carcablop

May I ask you to update Ersilia to the latest version (if you haven't), reinstall it (it has a slimmed bentoML version) and try the model again? I am getting some errors with the new bentoML install and want to check whether it is a general problem or not. log_error.txt

carcablop commented 1 year ago

Hi @GemmaTuron. Of course. I updated it last week, but I'll update it again.

carcablop commented 1 year ago

Hi @GemmaTuron

When I fetch the model, I get the same error about bentoML

eos935d_log.txt

GemmaTuron commented 1 year ago

@carcablop I've identified the problem. You might have, as I did, old eos-bentoml-0.11.0... conda environments. If you delete these and fetch again, the issue is solved

GemmaTuron commented 1 year ago

@karthikjetty and @Femme-js

I am awaiting confirmation that you were able to troubleshoot your issues (@karthikjetty I haven't seen the right log files for this model yet, and @Femme-js it seemed like a disk space issue).

Femme-js commented 1 year ago

Yes @GemmaTuron! It seems the same to me too. I am cleaning up the disk and everything, and will test it again.

paulinebanye commented 1 year ago

Hi @GemmaTuron, I tested on the CLI but received errors, so I am retesting. It worked without issues on Colab. I will provide a more detailed update once I conclude testing with the CLI.

GemmaTuron commented 1 year ago

@pauline-banye thanks, please do check what I mentioned to Carolina in the messages above before testing again.

carcablop commented 1 year ago

Hello @GemmaTuron. I made the suggested changes and also cleaned my entire Ubuntu system: I removed the conda environments I wasn't using, the previously tested models, and finally the forks of the already built-in models. The model eos935d now fetches successfully on the CLI. log_fetch_eos935d.txt

Thank you :).

paulinebanye commented 1 year ago

Hi @GemmaTuron , I pulled the latest changes from Ersilia , deleted the bentoml environments and tested the model again.

MODEL TEST FOR EOS935D

CLI

COLAB

GemmaTuron commented 1 year ago

thanks @pauline-banye and @carcablop !

I'll mark this as completed!

karthikjetty commented 1 year ago

screen.log

Hi Gemma. I removed all my other conda environments, but still received the same error as before.

I notice that these environments, which I did not remove, are still listed among my conda environments:

eosbase-bentoml-0.11.0-py37 /opt/miniconda3/envs/eosbase-bentoml-0.11.0-py37
eosbase-bentoml-0.11.0-py38 /opt/miniconda3/envs/eosbase-bentoml-0.11.0-py38

I tried removing these using conda remove -n eosbase-bentoml-0.11.0-py37, but it gave me errors (I think I have to remove them as packages...?). I will try troubleshooting some more.

GemmaTuron commented 1 year ago

Hello @karthikjetty Karthik,

The environments need to be removed as conda environments. Which errors did it give you? Let's move this conversation to the internship channel in Slack, since this issue is closed and the model is working; this is definitely an issue specific to your system. Please post your issues with these environments there.
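
For reference, `conda remove -n <name>` on its own expects package names; a whole environment is removed with `conda env remove -n <name>` (or `conda remove -n <name> --all`). The sketch below only builds those commands for the environments listed earlier, so they can be reviewed before running. The helper names and the extra `/opt/miniconda3` base entry are assumptions added for illustration.

```python
def stale_env_names(env_paths, prefix="eosbase-bentoml-"):
    """Pick environment names (last path component) that start with `prefix`."""
    names = [path.rstrip("/").rsplit("/", 1)[-1] for path in env_paths]
    return [name for name in names if name.startswith(prefix)]


def removal_commands(names):
    """Build one `conda env remove` command per environment name."""
    return [f"conda env remove -n {name}" for name in names]


# The two eosbase paths come from the listing in this thread;
# the base path is an assumed extra entry, added for contrast.
ENV_PATHS = [
    "/opt/miniconda3",
    "/opt/miniconda3/envs/eosbase-bentoml-0.11.0-py37",
    "/opt/miniconda3/envs/eosbase-bentoml-0.11.0-py38",
]

if __name__ == "__main__":
    for command in removal_commands(stale_env_names(ENV_PATHS)):
        print(command)  # review before running anything
```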

Femme-js commented 1 year ago

Hello @GemmaTuron ,

I cleaned up the environments as you mentioned and reinstalled Ersilia on my system in a new conda environment. I still got the same bentoml error. I manually updated bentoml to version 1.0.13, but it now produces the following error. eos935d.log