Hello @GemmaTuron and @samuelmaina, I tested this model using Colab, and it works well: eos7pw8_output.csv
I haven't completed testing with the Ersilia CLI and Docker. The CLI fetches from Docker Hub, and the same image is used to test with Docker Desktop. However, my internet bundle and network bandwidth are not the best, and the images are quite big.
Hi @HellenNamulinda !
Thanks. If the model is taking too long to fetch, let's wait for @pittmanriley's feedback on it and we can move on.
Hi @GemmaTuron,
I'm seeing that I'm unable to download models that are only available for AMD64. My computer needs them to be available for both AMD64 and ARM64, like model eos3b5e (I'm able to fetch that one). I'm working on this issue with @miquelduranfrigola in the meantime.
No worries @pittmanriley I'll leave that on hold
Hi @GemmaTuron and @samuelmaina, the model works well using Colab: eos7pw8_colab_output.csv
On the CLI and Docker, despite the model fetching successfully (eos7pw8_fetch.log), it gives null outputs.
CLI:
(ersilia) hellenah@hellenah-elitebook:~$ ersilia -v fetch eos7pw8 > eos7pw8_fetch.log 2>&1
(ersilia) hellenah@hellenah-elitebook:~$ ersilia serve eos7pw8
Serving model eos7pw8: syba-synthetic-accessibility
URL: http://0.0.0.0:57975
PID: -1
SRV: pulled_docker
To run model:
- run
Information:
- info
(ersilia) hellenah@hellenah-elitebook:~$ ersilia run -i "['CCCOCCC', 'CC(=O)O']"
{
    "input": {
        "key": "POLCUAVZOMRGSN-UHFFFAOYSA-N",
        "input": "CCCOCCC",
        "text": "CCCOCCC"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
{
    "input": {
        "key": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
        "input": "CC(=O)O",
        "text": "CC(=O)O"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
- Docker
hellenah@hellenah-elitebook:~$ docker run -v /home/hellenah:/data ersiliaos/eos7pw8
URL: http://127.0.0.1:3000
PID: 36
SRV: conda
To run model:
Information:
hellenah@hellenah-elitebook:~$ docker ps
CONTAINER ID   IMAGE               COMMAND                  CREATED         STATUS         PORTS    NAMES
a5b68acbf23c   ersiliaos/eos7pw8   "sh /root/docker-ent…"   5 minutes ago   Up 3 minutes   80/tcp   musing_blackwell
hellenah@hellenah-elitebook:~$ docker exec -it a5b68acbf23c /bin/bash
root@a5b68acbf23c:~# ersilia run -i "CC(=O)NC1=CC=C(O)C=C1"
{
    "input": {
        "key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
        "input": "CC(=O)NC1=CC=C(O)C=C1",
        "text": "CC(=O)NC1=CC=C(O)C=C1"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
root@a5b68acbf23c:~# ersilia run -i "CC(=O)O"
{
    "input": {
        "key": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
        "input": "CC(=O)O",
        "text": "CC(=O)O"
    },
    "output": {
        "outcome": [
            null
        ]
    }
}
I will try cloning and fetching this repo locally and give feedback
@emmakodes can you test this model as well so we can close the issue if it works? @pittmanriley is still setting up his system. Thanks!
@GemmaTuron @emmakodes
After getting help from Miquel, I'm able to test this model now. I am getting the same results as @HellenNamulinda. It seems that when I test it on the CLI, the outputs are always null, but I get real outputs when running with Colab. Here is the CLI output:
ersilia run -i compound_list.csv
{
"input": {
"key": "LUHMMHZLDLBAKX-UHFFFAOYSA-N",
"input": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O",
"text": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O"
},
"output": null
}
{
"input": {
"key": "QRXWMOHMRWLFEY-UHFFFAOYSA-N",
"input": "C1=CN=CC=C1C(=O)NN",
"text": "C1=CN=CC=C1C(=O)NN"
},
"output": null
}
{
"input": {
"key": "SGOIRFVFHAKUTI-UHFFFAOYSA-N",
"input": "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O",
"text": "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O"
},
"output": null
}
When I run it in Google Colab, I get real outputs that are never null. Here is the file produced from Google Colab using eml_canonical.csv: eos7pw8_output.csv
Hello @samuelmaina @GemmaTuron
When I test the model on the CLI on my laptop, it takes too long, so I decided to set up Ersilia on Google Colab to access more RAM and disk space. The model fetches successfully and makes predictions successfully: eos7pw8_cli_pred.csv, eos7pw8_cli_log.txt. Maybe because I am using Google Colab, which has more RAM and disk space, I am not getting null outputs for predictions.
Model fetches and makes predictions successfully: eos7pw8_output.csv
Model takes too long on my laptop.
Hi @emmakodes, @pittmanriley and @HellenNamulinda, this model is actually very small, since it uses an already-prepared Python package (syba) and it does not contain the checkpoints inside, which are usually the large part of a repo. How big is the image that is preventing you from testing it?
When you fetch a model through the CLI, it does test a few molecules, which cannot return a null value, so I find it surprising that the molecules input later return null if the model hasn't failed at fetch time. In my case, it works fine on the CLI: input file: test.csv, output file: out.csv
can you please check with the same file and see if you are also getting the same output?
A bit of extra information about this model - Syba is actually "retraining" every time we serve it, which is not ideal. It would be nice to have an already-trained version of Syba. @GemmaTuron is dealing with this. She will keep you posted.
Hi @emmakodes @pittmanriley and @HellenNamulinda
These lines of code in main.py were retraining the model each time; this works, but it is time-consuming and is perhaps the reason you are getting null outputs:
from syba.syba import SybaClassifier
syba = SybaClassifier()
syba.fitDefaultScore()
In my latest commit you will see that I have saved the trained Syba model as a .joblib file and I simply load it (a rough sketch of the pattern is shown below). Hopefully this makes it faster and avoids null results. Could you please check the running actions with the updated code?
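For reference, a minimal sketch of the save-once/load-later pattern described above (the file names and paths are illustrative and may not match the repo's actual layout):

# One-off step, run offline: fit the default SYBA score and persist it
from syba.syba import SybaClassifier
import joblib

syba = SybaClassifier()
syba.fitDefaultScore()                         # the slow, training-like step
joblib.dump(syba, "checkpoints/syba.joblib")   # illustrative path

# At serve time (e.g. in main.py): load the pre-fitted classifier instead of refitting
syba = joblib.load("checkpoints/syba.joblib")
print(syba.predict("CCC"))                     # SMILES in, SYBA score out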
Hi @GemmaTuron
I'm not sure what you meant by checking the running actions, but I went ahead and fetched the model and retested it with the compound_list.csv file as I did before. I'm still getting null outputs, however. Is this what you meant by check the running actions?
@GemmaTuron and @pittmanriley, The Upload model to DockerHub job failed, so fetching still pulls the image without the current changes.
@samuelmaina, I cloned the repo locally, but the syba.joblib file cannot be found. I guess the path needs to be corrected. eos7pw8_repo_fetch.log
FileNotFoundError: [Errno 2] No such file or directory: '/home/hellenah/eos/repository/eos7pw8/20230628081702_306EEE/eos7pw8/artifacts/framework/code/../../checkpoints/syba.joblib'
We have this error message: 'CondaEnvironmentService' object has no attribute 'pid'
I'm not sure of the cause of this.
Thanks all.
I am unsure what is causing this issue. @emmakodes what version of ersilia are you using? ersilia --version
Thanks @HellenNamulinda - let's see what @GemmaTuron says. This looks like an easy fix.
Hello @HellenNamulinda - thanks for reporting this.
This must be an issue with ersilia. I am pushing the latest version of ersilia to dockerhub: https://github.com/ersilia-os/ersilia/actions/runs/5397644593/jobs/9802541077
As soon as the push is done, I think we can try again with eos7pw8 and eos2gth
Hi,
@pittmanriley, by "check the running actions" I meant keep an eye on them to see if the changes work. As @HellenNamulinda points out, the Docker image is not updated (the action failed), hence you need to fetch with --from_github.
@samuelmaina, please revise the code I wrote and make sure the path to the joblib file is correct; I did it quite fast, so there might be a small bug. Thanks.
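For illustration, one way to make that path robust is to resolve it relative to main.py itself rather than the working directory (a sketch only; the number of parent-directory hops depends on where the checkpoints folder actually sits in the repo):

import os
import joblib

# Resolve the checkpoints directory relative to this file, not the current working directory
root = os.path.dirname(os.path.abspath(__file__))
checkpoints_dir = os.path.abspath(os.path.join(root, "..", "..", "checkpoints"))  # adjust the hops to the real layout
syba = joblib.load(os.path.join(checkpoints_dir, "syba.joblib"))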
@HellenNamulinda and @miquelduranfrigola the latest Docker build (16h ago) for the model eos2gth
seems to have worked? https://github.com/ersilia-os/eos2gth/actions
@GemmaTuron , I have made a PR on the issue. Can you have a look please?
@HellenNamulinda and @emmakodes
The model has been updated; can you check that it now works (and is hopefully faster, since the checkpoints are preloaded)?
Okay @GemmaTuron
Hello @GemmaTuron @samuelmaina
The model fetches successfully on the CLI. When I first run the command to make predictions using a file, the model produces an output:
eos7pw8_cli_pred.csv (time to make predictions for the file: 1074.99 seconds; this value varies depending on the network),
but the second time, the model produces this error message:
Traceback (most recent call last):
File "/usr/local/envs/ersilia/bin/ersilia", line 33, in <module>
sys.exit(load_entry_point('ersilia', 'console_scripts', 'ersilia')())
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/click/core.py", line 1130, in __call__
return self.main(*args, **kwargs)
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/click/core.py", line 1055, in main
rv = self.invoke(ctx)
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/click/core.py", line 1657, in invoke
return _process_result(sub_ctx.command.invoke(sub_ctx))
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/click/core.py", line 1404, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/click/core.py", line 760, in invoke
return __callback(*args, **kwargs)
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/bentoml/cli/click_utils.py", line 138, in wrapper
return func(*args, **kwargs)
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/bentoml/cli/click_utils.py", line 115, in wrapper
return_value = func(*args, **kwargs)
File "/usr/local/envs/ersilia/lib/python3.7/site-packages/bentoml/cli/click_utils.py", line 99, in wrapper
return func(*args, **kwargs)
File "/content/ersilia/ersilia/cli/commands/api.py", line 38, in api
api_name=api_name, input=input, output=output, batch_size=batch_size
File "/content/ersilia/ersilia/core/model.py", line 353, in api
api_name=api_name, input=input, output=output, batch_size=batch_size
File "/content/ersilia/ersilia/core/model.py", line 367, in api_task
for r in result:
File "/content/ersilia/ersilia/core/model.py", line 194, in _api_runner_iter
for result in api.post(input=input, output=output, batch_size=batch_size):
File "/content/ersilia/ersilia/serve/api.py", line 330, in post
results, output, model_id=self.model_id, api_name=self.api_name
File "/content/ersilia/ersilia/io/output.py", line 301, in adapt
df = self._to_dataframe(result, model_id)
File "/content/ersilia/ersilia/io/output.py", line 247, in _to_dataframe
output_keys_expanded = self.__expand_output_keys(vals, output_keys)
File "/content/ersilia/ersilia/io/output.py", line 206, in __expand_output_keys
assert len(m) == len(v)
TypeError: object of type 'float' has no len()
Model is functional:
eos7pw8_output.csv
Time taken to make predictions: 858.47 seconds
Hi @emmakodes, you mean the exact same command gives different outputs?
@HellenNamulinda can you confirm what output you are getting?
Yes @GemmaTuron, the same command works the first time and produces output, but when I run it a second time, it produces the above error.
That is, when I run ersilia -v api run -i eml_canonical.csv -o eos7pw8_cli_pred.csv the first time, it successfully produces an output file, but when I run it again it results in TypeError: object of type 'float' has no len().
The model has an output type of float, but I think Ersilia might be treating the output as a list or iterable.
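As a small illustration of that suspicion (this is not Ersilia's actual code; it just shows why calling len() on a bare float raises the error seen in the traceback):

outcome_as_list = [83.17]    # a per-molecule outcome wrapped in a list, which len() handles fine
outcome_as_float = 83.17     # a bare float, as the model apparently returns on the second run

print(len(outcome_as_list))  # 1
try:
    print(len(outcome_as_float))
except TypeError as e:
    print(e)                 # object of type 'float' has no len()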
Hi @emmakodes
The output type is Single. This error is quite surprising... @HellenNamulinda please try this model and let us know if you encounter the same issue as Emma!
Hello @GemmaTuron, I ran the model on Codespaces and I get null outputs:
{
"input": {
"key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
"input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
"text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
},
"output": {
"outcome": [
null
]
}
}
{
"input": {
"key": "HEFNNWSXXWATRW-UHFFFAOYSA-N",
"input": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O",
"text": "CC(C)CC1=CC=C(C=C1)C(C)C(=O)O"
},
"output": {
"outcome": [
null
]
}
}
Hi @GemmaTuron,
The model still fetches successfully (eos7pw8_fetch.log), but it continues to give null outputs, usually with Status code: 504, not the Status code: 500 seen for models that fail on batch prediction but work afterwards.
Output log: eos7pw8_output.log
Serving model eos7pw8: syba-synthetic-accessibility
URL: http://0.0.0.0:40767
PID: -1
SRV: pulled_docker
To run model:
Information:
Hi @emmakodes
Please follow the same steps I indicated to Zakia for eos46ev to elucidate what is going on here and report back. Thanks!
@samuelmaina
Please have a look, since this is the model you refactored. @emmakodes, sorry, I mistakenly pointed you to this model.
@emmakodes, I am working on the issue to see if the above issues persist. I will let you know when I get consistent outputs for both files and single SMILES so that everyone can test on their machines. Thanks.
Hi @GemmaTuron, @HellenNamulinda, @emmakodes, I tried the code in Codespaces and it was giving null outputs. Codespaces has 8 GB of RAM (one has to select the 8 GB machine from the "change machine type" option), which is not enough to hold the model .joblib in memory as NumPy arrays, so the process is terminated by GitHub while execution continues. I have added a try/except to exit in the code (a rough sketch of the idea is shown below). The model was using 8+ GB in Colab, hence it needs a lot of RAM to run.
I am able to run it in Google Colab, but not in Codespaces or on my local computer, even with the modified code.
I was not able to solve the other errors, such as the TypeError: object of type 'float' has no len() raised at https://github.com/ersilia-os/eos7pw8/issues/9#issuecomment-1613199462, but let's see if the modification eliminates the other errors.
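A rough sketch of the kind of loading guard described above (assumed behaviour, not the exact repo code; the path is illustrative):

import sys
import joblib

checkpoint_path = "checkpoints/syba.joblib"  # illustrative path

try:
    syba = joblib.load(checkpoint_path)
except Exception as e:
    # If loading fails (for example because the syba package is missing),
    # exit with an error instead of silently returning null outputs.
    print(f"Error occurred while loading the model: {e}")
    sys.exit(1)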
Testing
I have tested with the first 10 SMILES from the eml_canonical dataset and was able to get results across different fetches and serves: first_eml_10_test_output.csv
I have also tested using Hellen's test.csv from https://github.com/ersilia-os/eos7pw8/issues/9#issuecomment-1617523632 and was able to produce consistent results across different fetches and serves.
I ran with the test SMILES from https://github.com/ersilia-os/eos7pw8/issues/9#issuecomment-1617523632 and was able to predict:
{
"input": {
"key": "MLBNXJTXHVBPEC-UHFFFAOYSA-N",
"input": "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1",
"text": "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1"
},
"output": {
"outcome": [
83.1755432208731
]
}
}
My suggestions:
- Run the model on a machine that has 9+ GB of RAM.
- Reduce the trained joblib, e.g. by removing some non-essential parameters, to accommodate smaller-RAM machines.
@GemmaTuron, I have raised a PR; can you please have a look? Thanks everyone.
@emmakodes and @HellenNamulinda
Please check if the changes work!
@GemmaTuron and @samuelmaina,
The model was working in Colab before and it still works.
{
"input": {
"key": "ATUOYWHBWRKTHZ-UHFFFAOYSA-N",
"input": "CCC",
"text": "CCC"
},
"output": {
"outcome": [
11.292091326780156
]
}
}
{
"input": {
"key": "MLBNXJTXHVBPEC-UHFFFAOYSA-N",
"input": "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1",
"text": "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1"
},
"output": {
"outcome": [
83.1755432208731
]
}
}
My machine has 12 GB RAM and swap (24 GB), but the model still fails to work. I closed all other apps except the CLI before serving.
07:28:22 | DEBUG | Getting session from /home/hellenah/eos/session.json
Serving model eos7pw8: syba-synthetic-accessibility
URL: http://0.0.0.0:47281
PID: -1
SRV: pulled_docker
To run model:
- run
Information:
- info
(ersilia) hellenah@hellenah-elitebook:~$ ersilia run -i "CCC"
{
"input": {
"key": "ATUOYWHBWRKTHZ-UHFFFAOYSA-N",
"input": "CCC",
"text": "CCC"
},
"output": {
"outcome": [
null
]
}
}
Also, fetching from GitHub fails. From the log file (eos7pw8_cli_fetch_github.log), the conda install command (conda install -c rdkit -c lich syba) never succeeds, probably because I use a more recent conda version while the installation requires an older one.
05:57:54 | ERROR | Ersilia exception class:
EmptyOutputError
Detailed error:
Model API eos7pw8:run did not produce an output
Error occurred while loading the model: No module named 'syba'
Thanks @HellenNamulinda, I have looked at the output logs and the starting error is:
# >>>>>>>>>>>>>>>>>>>>>> ERROR REPORT <<<<<<<<<<<<<<<<<<<<<<
Traceback (most recent call last):
File "/home/hellenah/anaconda3/lib/python3.10/site-packages/conda/gateways/repodata/__init__.py", line 187, in conda_http_errors
yield
File "/home/hellenah/anaconda3/lib/python3.10/site-packages/conda/gateways/repodata/__init__.py", line 153, in repodata
response.raise_for_status()
File "/home/hellenah/anaconda3/lib/python3.10/site-packages/requests/models.py", line 1021, in raise_for_status
raise HTTPError(http_error_msg, response=self)
requests.exceptions.HTTPError: 404 Client Error: Not Found for url: https://conda.anaconda.org/rdkit/linux-64/current_repodata.json
During handling of the above exception, another exception occurred:
I got the same error in Colab when I updated conda from the pre-installed one, so that may be the problem. Let me look for a workaround for this error.
@HellenNamulinda I think you were playing with conda for another model so that might be the reason?
@GemmaTuron and @samuelmaina
This happened when I fetched the model with the --from_github flag. The default fetch command uses --from_docker, but sometimes pulling images takes a long time when my network is not good.
The thing is, the Colab notebooks for testing Ersilia models use conda 4.12.0 (%env MINICONDA_INSTALLER_SCRIPT=Miniconda3-py37_4.12.0-Linux-x86_64.sh), and I guess it is the same for the GitHub Actions used when building the Docker images. It's rare for Colab to fail to install conda packages, because it uses that older version.
In my case, I don't use Miniconda; I installed Anaconda3-2023.03-1-Linux-x86_64.sh, which has conda version 23.5.0. So, except for Docker images, when I fetch from GitHub or locally, Ersilia will use my machine's installed conda to install the conda packages.
Most of us reported packages that are outdated (these can still be installed if we downgrade conda). With recent conda versions, installing old packages (from channels that were last updated years ago) just crashes. That's why @samuelmaina encountered the same error in Colab when he updated conda.
@HellenNamulinda maybe we can put a command to downgrade the conda env in the Dockerfile for this model? Can you do that on your end and see if you can fetch locally?
Hello @samuelmaina @GemmaTuron
I was not able to fetch the model on my system. The log doesn't point to a definite cause, but I guess it's because I don't have up to 9 GB of RAM on my system (I have 4 GB), or maybe because of the internet. eos7pw8_fetch_log.txt
The model fetches and makes predictions successfully on Colab: eos7pw8_output.csv
This model and eos526j typically require more RAM to work properly. They work on Colab, but when you try to run them on the CLI without enough RAM, they don't work properly. I suggest someone with 10 GB of RAM or more test these models using the CLI so that we can be sure it's actually low RAM that is causing the null outputs.
@emmakodes, it means that the model was killed, as can be seen from the line 1: 2709 Killed in the error logs. "Killed" means that the OS terminated the process because it was consuming excess resources.
Detailed error:
Model API eos7pw8:run did not produce an output/home/emma/eos/repository/eos7pw8/20230707025351_2A3EFB/eos7pw8/artifacts/framework/run.sh: line 1:
2709 Killed
/home/emma/miniconda3/envs/eos7pw8/bin/python $1/code/main.py $2 $3
Let's see if the conda downgrade works on @HellenNamulinda's machine. I am on the lookout for other solutions for the model.
Thanks all, very good discussion.
It is very annoying that new conda versions are not able to install the syba package. One workaround, in case you have not found a solution, is to directly download the tar.gz file of the package with a wget command (a rough sketch follows).
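For example, the relevant Dockerfile lines could look roughly like this (the package URL is a placeholder that would have to be replaced with the actual syba archive, and its dependencies such as rdkit still need to be installed beforehand):

RUN pip install rdkit==2023.03.1
RUN pip install joblib
# <syba-package-url> is a placeholder for the actual syba conda package archive
RUN wget <syba-package-url> -O /tmp/syba-package.tar.bz2
RUN conda install -y /tmp/syba-package.tar.bz2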
Hi @samuelmaina and all,
The model works on my system, but I agree that the fact we need to downgrade conda for it to be able to install Syba is annoying. I'd go for the download of the tar.gz package as @miquelduranfrigola suggests. What do you think? Can you try it out, Samuel?
Thanks @GemmaTuron, I have not found any other workaround. I will try downloading the .tar.gz.
Hi @samuelmaina, While downgrading conda before installing syba works, I think we should explore the option of downloading the .tar.gz as suggested by @miquelduranfrigola and @GemmaTuron. I haven't tried it yet.
If it fails, then maybe we can go with downgrading.
For downgrading, the Dockerfile RUN commands could look like:
RUN pip install rdkit==2023.03.1
RUN pip install joblib
RUN conda install -n base conda=4.12.0
RUN conda install -c rdkit -c lich syba
RUN conda update -n base -c defaults conda
Locally using --repo_path: Model eos7pw8 fetched successfully! (eos7pw8_fetch_repo.log) And it's no longer giving null outputs.
hellenah@hellenah-elitebook:~$ conda activate ersilia
(ersilia) hellenah@hellenah-elitebook:~$ ersilia -v fetch eos7pw8 --repo_path Outreachy/eos7pw8 > eos7pw8_fetch_repo.log 2>&1
(ersilia) hellenah@hellenah-elitebook:~$ ersilia serve eos7pw8
Serving model eos7pw8: syba-synthetic-accessibility
URL: http://127.0.0.1:48821
PID: 11166
SRV: conda
To run model:
- run
Information:
- info
(ersilia) hellenah@hellenah-elitebook:~$ ersilia run -i "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1"
{
"input": {
"key": "MLBNXJTXHVBPEC-UHFFFAOYSA-N",
"input": "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1",
"text": "FC(F)Oc1ccc(-c2nnc3cncc(Oc4ccc5ccsc5c4)n23)cc1"
},
"output": {
"outcome": [
83.1755432208731
]
}
}
(ersilia) hellenah@hellenah-elitebook:~$ ersilia run -i "CCC"
{
"input": {
"key": "ATUOYWHBWRKTHZ-UHFFFAOYSA-N",
"input": "CCC",
"text": "CCC"
},
"output": {
"outcome": [
11.292091326780156
]
}
}
(ersilia) hellenah@hellenah-elitebook:~$
@HellenNamulinda
The model has been modified to download the syba package from the internet each time. Could you check that it works fine so we can close off this model? Thanks!
@GemmaTuron and @samuelmaina,
The changes work, but the code on GitHub has an error in the Dockerfile: the command COPY ./repo is supposed to have two arguments (COPY . /repo), so it was causing an error; eos7pw8_fetch.log
Dockerfile:18
--------------------
16 | WORKDIR /repo
17 |
18 | >>> COPY ./repo
19 |
20 |
--------------------
ERROR: failed to run Build function: dockerfile parse error on line 18: COPY requires at least two arguments, but only one was provided. Destination could not be determine
I will separate the . /repo. Thanks for that, @HellenNamulinda.
The workflows have created the Docker images for both ARM and AMD without showing any errors. How did you catch this error (the command you ran or the environment you are in), @HellenNamulinda?
This model is ready for testing. If you are assigned to this issue, please try it out using the CLI, Google Colab and DockerHub and let us know if it works!