Closed GemmaTuron closed 1 year ago
Ok, thanks @HellenNamulinda, very helpful explanation. Do you have the log file of the CLI failing so I can have a look and help you?
Hi @GemmaTuron,
Sure, I attached the log file before the update: eos92sw_fetch.log
The model is still failing with the same EmptyOutputError, resulting from ../fpscores.pkl.gz: cannot execute binary file
Since the model env is cloned from the base env, I think there might be some package conflicts. So I'm fetching the model again, but I added --vv (verbose) to the conda commands so I can debug. I will share a new log file, or an update if I figure it out.
Hello @GemmaTuron,
After tracing the conflict in this log file, eos92sw_updated_fetch.log, I realized it originated from installing rdkit from conda-forge with RUN conda install -c conda-forge rdkit.
After changing it to be installed with pip, RUN pip install rdkit-pypi, the model fetched successfully (eos92sw_updated1_fetch.log), and I was also able to run inferences on the model:
{
    "input": {
        "key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
        "input": "CC(=O)NC1=CC=C(O)C=C1",
        "text": "CC(=O)NC1=CC=C(O)C=C1"
    },
    "output": {
        "Tox-score": 0.7498964851421736,
        "SAscore": 0.6654453269695645
    }
}
Plus the file: eml_etoxpred_eos92sw.csv
Now that the model works well, I will go ahead and refactor it.
But I'm going to change the model code a bit so that run.sh won't need three parameters, as we see in service.py (python etoxpred_predict.py --datafile {0} --modelfile {1} --outputfile {2}).
I'm going to make the model file static and remove it from the arguments needed in etoxpred_predict.py:
def myargs():
    parser = argparse.ArgumentParser()
    parser.add_argument("--datafile", required=True, help="training data filename")
    parser.add_argument("--modelfile", required=True, help="path to the model to load")
    parser.add_argument(
        "--outputfile",
        required=False,
        default="./results.csv",
        help="output file to save the result",
    )
    args = parser.parse_args()
    return args
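A minimal sketch of what the planned change could look like, assuming the model file sits next to the script. The file name in MODEL_FILENAME and the helper model_path are hypothetical placeholders, not taken from the repository; only --datafile and --outputfile remain as arguments.

```python
import argparse
import os

# Hypothetical placeholder: the real model file name is not shown in this thread.
MODEL_FILENAME = "model_file_placeholder"


def model_path(script_file):
    """Return the model path next to script_file, independent of the working directory."""
    script_dir = os.path.dirname(os.path.abspath(script_file))
    return os.path.join(script_dir, MODEL_FILENAME)


def myargs(argv=None):
    """Parse only the data and output arguments; the model file is no longer an option."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--datafile", required=True, help="input data filename")
    parser.add_argument(
        "--outputfile",
        required=False,
        default="./results.csv",
        help="output file to save the result",
    )
    return parser.parse_args(argv)
```

Inside etoxpred_predict.py, one would then call model_path(__file__) instead of reading --modelfile, so run.sh only has to pass the input and output files.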
Hi @GemmaTuron, I made the necessary changes and I created a pull request here
After the changes, I tested the model locally and it worked well. 👍 Model eos92sw fetched successfully! eos92sw_fetch1.log
The model was able to predict when given string inputs
10:21:19 | DEBUG | Done with unique posting
{
    "input": {
        "key": "RZVAJINKPMORJF-UHFFFAOYSA-N",
        "input": "CC(=O)NC1=CC=C(O)C=C1",
        "text": "CC(=O)NC1=CC=C(O)C=C1"
    },
    "output": {
        "Tox-score": 0.7498964851421736,
        "SAscore": 0.6654453269695645
    }
}
And also for file inputs, I used the eml dataset.
eml_etoxpred_eos92sw.csv
eos92sw_predict_file.log
Thanks @HellenNamulinda - I have merged the PR, let's see if it works.
In any case, since we have made changes to the workflows, I will update them once the new workflows are validated.
It seems that we cannot open the pickle file.
I suggest trying to load it directly (from the CLI, call pickle.load from within the same repo) and see if it is a problem with the versions of the packages we have in the conda environment.
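A direct load test along these lines could be sketched as follows. The helper name try_load_pickle is hypothetical; it opens the gzipped pickle and reports either the loaded object or the exception, so a version or path problem shows up without going through the full fetch.

```python
import gzip
import pickle


def try_load_pickle(path):
    """Try to load a gzip-compressed pickle.

    Returns (True, loaded_object) on success or (False, exception) on failure.
    """
    try:
        with gzip.open(path, "rb") as handle:
            return True, pickle.load(handle)
    except Exception as exc:  # report any failure instead of raising
        return False, exc
```

Running try_load_pickle("fpscores.pkl.gz") from model/framework inside the model's conda environment would show whether the file itself loads; a FileNotFoundError points at the path rather than at package versions.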
@HellenNamulinda
Did you manage to test the pickle load?
Hello @GemmaTuron, I tested and was able to load the pickle file, just like before.
I cloned this repo and tested it locally using the commands below (I cloned it inside a folder, not the root folder). It fetched successfully: eos92sw_repo_fetch.log 👍 Model eos92sw fetched successfully!
hellenah@hellenah-elitebook:~$ cd Outreachy
hellenah@hellenah-elitebook:~/Outreachy$ conda activate ersilia
(ersilia) hellenah@hellenah-elitebook:~/Outreachy$ git clone https://github.com/ersilia-os/eos92sw.git
Cloning into 'eos92sw'...
remote: Enumerating objects: 98, done.
remote: Counting objects: 100% (98/98), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 98 (delta 36), reused 65 (delta 19), pack-reused 0
Receiving objects: 100% (98/98), 308.07 KiB | 215.00 KiB/s, done.
Resolving deltas: 100% (36/36), done.
Filtering content: 100% (2/2), 57.35 MiB | 1.26 MiB/s, done.
(ersilia) hellenah@hellenah-elitebook:~/Outreachy$ ersilia -v fetch eos92sw --repo_path eos92sw > eos92sw_repo_fetch.log 2>&1
When I cloned the repo into the root folder, the model failed to fetch, giving the error we saw: FileNotFoundError: [Errno 2] No such file or directory: './fpscores.pkl.gz' (eos92sw_repo_fetch_root.log).
I cloned to the root folder and fetched as follows:
hellenah@hellenah-elitebook:~$ conda activate ersilia
(ersilia) hellenah@hellenah-elitebook:~$ git clone https://github.com/ersilia-os/eos92sw.git
Cloning into 'eos92sw'...
remote: Enumerating objects: 98, done.
remote: Counting objects: 100% (98/98), done.
remote: Compressing objects: 100% (75/75), done.
remote: Total 98 (delta 36), reused 65 (delta 19), pack-reused 0
Receiving objects: 100% (98/98), 308.07 KiB | 418.00 KiB/s, done.
Resolving deltas: 100% (36/36), done.
Filtering content: 100% (2/2), 57.35 MiB | 1.28 MiB/s, done.
(ersilia) hellenah@hellenah-elitebook:~$ ersilia -v fetch eos92sw --repo_path eos92sw > eos92sw_repo_fetch_root.log 2>&1
I did test the repo in the root directory by running the run.sh file, which raised the same error.
(eos92sw) hellenah@hellenah-elitebook:~/eos92sw$ bash model/framework/run.sh model/framework ~/input_15.csv output.csv
...loading models
...starts prediction
Traceback (most recent call last):
  File "model/framework/etoxpred_predict.py", line 86, in <module>
    outputs = predict(smiles_list)
  File "model/framework/etoxpred_predict.py", line 68, in predict
    sa_score = reg(smiles_list[i])
  File "/home/hellenah/eos92sw/model/framework/sascore.py", line 29, in __call__
    self.readFragmentScores()
  File "/home/hellenah/eos92sw/model/framework/sascore.py", line 112, in readFragmentScores
    _fscores = pickle.load(gzip.open("%s.pkl.gz" % name))
  File "/home/hellenah/anaconda3/envs/eos92sw/lib/python3.8/gzip.py", line 58, in open
    binary_file = GzipFile(filename, gz_mode, compresslevel)
  File "/home/hellenah/anaconda3/envs/eos92sw/lib/python3.8/gzip.py", line 173, in __init__
    fileobj = self.myfileobj = builtins.open(filename, mode or 'rb')
FileNotFoundError: [Errno 2] No such file or directory: './fpscores.pkl.gz'
(eos92sw) hellenah@hellenah-elitebook:~/eos92sw$
I suspected the issue was caused by using a relative path instead of an absolute path. So, in the sascore.py file under the model/framework subfolder, this code
global _fscores
# generate the full path filename:
name = os.path.join("./", name)
_fscores = pickle.load(gzip.open("%s.pkl.gz" % name))
I changed it to
global _fscores
# generate the full path filename:
file_path = os.path.abspath(os.path.join(__file__,".."))
name = os.path.join(file_path, name)
_fscores = pickle.load(gzip.open("%s.pkl.gz" % name))
And run.sh worked well, output.csv
(eos92sw) hellenah@hellenah-elitebook:~/eos92sw$ bash model/framework/run.sh model/framework ~/input_15.csv output.csv
...loading models
...starts prediction
...prediction done!
returning output
saving output
saving output done
(eos92sw) hellenah@hellenah-elitebook:~/eos92sw$
Likewise, fetching this model from its repo in the root directory also worked successfully. eos92sw_repo_fetch_root_updated.log
01:13:58 | DEBUG | Repo path specified: eos92sw
01:13:58 | DEBUG | Absolute path: /home/hellenah/eos92sw
⬇️ Fetching model eos92sw: etoxpred
.
.
.
01:18:32 | INFO | Fetching eos92sw done successfully: 0:04:24.263940
Checking setup: 1.290s
Preparing model: 18.917778253555298s
Getting model: 5.072348117828369s
Packing model: 205.6379256248474s
Checking if model needs to be integrated to a tool: 0.0014548301696777344s
Getting model card: 1.755021572113037s
[]
Checking that autoservice works: 6.63561749458313s
Sniffing model: 20.31470251083374s
👍 Model eos92sw fetched successfully!
The served model runs inferences successfully:
01:23:29 | DEBUG | Done with unique posting
{
    "input": {
        "key": "QTBSBXVTEAMEQO-UHFFFAOYSA-N",
        "input": "CC(=O)O",
        "text": "CC(=O)O"
    },
    "output": {
        "Tox-score": 0.8160144342015792,
        "SAscore": 0.597541979674131
    }
}
In conclusion, paths need to be provided as absolute and not relative. This ensures that the file path is consistent regardless of the current working directory. I made a pull request for these changes here
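As an illustration of that conclusion, here is a minimal sketch of a loader that builds the file path from an explicit base directory instead of "./". The function name load_fragment_scores and the base_dir parameter are hypothetical and not part of sascore.py; in the real module, base_dir could be os.path.dirname(os.path.abspath(__file__)).

```python
import gzip
import os
import pickle


def load_fragment_scores(base_dir, name="fpscores"):
    """Load <base_dir>/<name>.pkl.gz.

    Passing an absolute base_dir makes the result independent of the
    current working directory, unlike a path that starts with "./".
    """
    path = os.path.join(os.path.abspath(base_dir), name + ".pkl.gz")
    # The context manager closes the gzip handle even if unpickling fails.
    with gzip.open(path, "rb") as handle:
        return pickle.load(handle)
```

With this shape, the same call works whether run.sh is started from the repository folder, the home directory, or anywhere else, as long as base_dir is an absolute path.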
@GemmaTuron, the Model Test on the PR failed when installing ersilia, with an error related to metadata generation and BentoML:
AttributeError: 'BentoMLRequirement' object has no attribute '_is_installed'. Did you mean: 'is_installed'?
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: metadata-generation-failed
Although metadata-generation failed, no change was made to the metadata file.
This is an issue with ersilia and not this model repository code.
@HellenNamulinda
You are right, this was a temporary bug in the code which should now be fixed. I am rerunning the Model Test action.
Hi @GemmaTuron, the build succeeded for both amd64, at 20306
#9 [linux/amd64 2/2] RUN ersilia -v fetch eos92sw --from_github
...
#9 192.1 👍 Model eos92sw fetched successfully!
#9 DONE 193.6s
and arm64 at 22993
#8 [linux/arm64 2/2] RUN ersilia -v fetch eos92sw --from_github
...
#8 8206.1 👍 Model eos92sw fetched successfully!
#8 DONE 8207.6s
I've tested the model on Colab and it works well
Hello @GemmaTuron, this model did not work from the start. I tested it on Colab, as seen in this notebook, and also from the CLI using the cloned repo. Both raised an EmptyOutput error. eos92sw_fetch.log
Specifically, the error was cannot execute binary file. It was related to the line that loads a pickled file in sascore.py:
pickle.load(gzip.open("%s.pkl.gz" % name))
On installing the exact packages in an environment with python=3.7 and running the code in model/framework, I got an import error related to libboost.
After searching for solutions, I changed python=3.7 to python=3.8 and also fixed the incompatible versions, for example pinning
scikit-learn==0.23.2
instead of scikit-learn==0.23
(which installs scikit-learn==0.23.0), and likewise for numpy, matplotlib and pandas. The new commands are:
With the new installs, I was able to run the code using
python etoxpred_predict.py --datafile {0} --modelfile {1} --outputfile {2}
But fetching on ersilia CLI is still failing. It's what I will fix next now that I know the code works.
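To confirm which versions a pin such as scikit-learn==0.23.2 actually resolved to inside the model environment, a small check like the one below may help. It is only a sketch: the helper name installed_versions is hypothetical, and it relies on importlib.metadata, which is in the standard library from Python 3.8 onwards.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions
```

For example, calling installed_versions(["scikit-learn", "numpy", "matplotlib", "pandas", "rdkit-pypi"]) in the eos92sw environment would list the versions that were really installed, which can then be compared with the pins in the Dockerfile.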