Closed GemmaTuron closed 5 months ago
Hello @pittmanriley , thanks.
The information.json
file is created automatically at fetch time, so you should not add it.
I have tried the model like this: ersilia -v fetch eos8d8a --from_github
and it worked perfectly for me. To go step by step, can you maybe first try this?
@miquelduranfrigola
This worked. Thanks so much!
OK - so what is happening is that, without the --from_github
option, the model is fetched either from S3 or DockerHub, and this may be fetching old versions of the model.
Let's see if you can now proceed...
@pittmanriley
the first thing to do when you get assigned a new model is fork the repo, clone the fork in your local system and then try out the fetch using the --repo_path option. Do not fetch it directly as anyway you need the repository cloned to make changes.
a reminder again, the models get fetched first from docker, then from s3 and then from github. The docker versions won't be correct in most occasions if it is a non refactored model, so when you get assigned a new model please follow the steps above.
Thank you for your help @GemmaTuron!
I tested the model in CLI, Google Colab, and Docker. This model predicts the potential to permeate the Mycobacterium tuberculosis cell membrane based on physicochemical properties, so its output is a probability. Here are the results in the CLI using compound_list.csv:
{
"input": {
"key": "LUHMMHZLDLBAKX-UHFFFAOYSA-N",
"input": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O",
"text": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O"
},
"output": {
"probability": 0.097
}
}
{
"input": {
"key": "QRXWMOHMRWLFEY-UHFFFAOYSA-N",
"input": "C1=CN=CC=C1C(=O)NN",
"text": "C1=CN=CC=C1C(=O)NN"
},
"output": {
"probability": 0.191
}
}
{
"input": {
"key": "SGOIRFVFHAKUTI-UHFFFAOYSA-N",
"input": "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O",
"text": "CC(CN1C=NC2=C(N=CN=C21)N)OCP(=O)(O)O"
},
"output": {
"probability": 0.044
}
}
And here are the results using Google Colab on the eml_canonical.csv file: eos8d8a_output.csv
The model also worked in Docker:
# ersilia run -i "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
{
"input": {
"key": "NQQBNZBOOHHVQP-UHFFFAOYSA-N",
"input": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]",
"text": "C1=C(SC(=N1)SC2=NN=C(S2)N)[N+](=O)[O-]"
},
"output": {
"probability": 0.325
}
}
Is there any change needed in the model or as it is is fine?
@GemmaTuron I don't think any changes are needed, unless you think that it's an issue I had difficulty running it by fetching normally? Once I fetched using ersilia -v fetch eos8d8a --from_github
, everything worked as expected.
Hello @pittmanriley
This model was incorporated two years ago and it does not have the required current eos-template structure, including a main.py, a run.sh etc. Hence it needs to be refactored completely, creating the necessary files, allocating the model checkpoints where they correspond and so on. If it is not clear what we mean by "refactoring the model" please write to me in Slack.
This makes sense @GemmaTuron. I'll get the refactoring done soon.
Also, I wanted to write about a potential issue with this model that I discussed with @miquelduranfrigola since it is only available in AMD64. To fetch the model, I used the command ersilia -v fetch eos8d8a --from_github
, which worked. Since this command will not show the image on docker, I had to run an additional command to do so:
docker pull ersiliaos/eos8d8a:latest --platform linux/amd64
These steps work, but I think it is important for a user to know if these are the steps required, since they may also have an M1 or M2 chip.
I also wanted to leave an update regarding the refactoring of this model. I deleted the .dvcignore file, adjusted the .gitattributes and service.py files, created a folder called code where I plan to add a main.py file, and added a run.sh file to the framework folder. To create the main.py file, I know I want to combine the two files in the framework folder: reorder_predictions.py and calculate_descriptors.py. However, I tried doing this, and ended up testing the changes and got an index out of range error related to a call in main.py where I define outfile = sys.argv[3]
. I pushed the changes onto my forked version for now, and will continue to work on this. I still need to make adjustments to the service.py file, but this may be difficult.
Hi @pittmanriley
Thanks for the update, a summary of some things you can look into:
"bash {0}/calculate_descriptors.py {1} {2}".format(
self.framework_dir,
data_file,
feat_file
),
"bash {0}/mycpermcheck1.1 -i {1} > {2}".format(
self.framework_dir,
feat_file,
pred_unsorted_file
),
"bash {0}/reorder_predictions.py {1} {2} {3}".format(
self.framework_dir,
feat_file,
pred_unsorted_file,
pred_file
)
]
But you have eliminated two of them? And the way of calling scripts: python - for .py files bash - for .sh files perl - for perl files
I'd adivse you to rebase your changes to the original version of the repository and think again on the logic behind the original commands on the service file:
lines = [
"python {0}/calculate_descriptors.py {1} {2}".format(
self.framework_dir,
data_file,
feat_file
),
"perl {0}/mycpermcheck1.1 -i {1} > {2}".format(
self.framework_dir,
feat_file,
pred_unsorted_file
),
"python {0}/reorder_predictions.py {1} {2} {3}".format(
self.framework_dir,
feat_file,
pred_unsorted_file,
pred_file
)
]
I hope this pointer is helpful. See that we first calculate the descriptors, then run the actual model which is in a perl file and finally reorder the predictions to make sure we get the right output. Try to first understand these steps clearly before moving onto refactoring it
Hi @GemmaTuron, I was advised by Miquel to only make some basic changes to this model for now. Specifically, only change the predict api in the service.py file to run and delete the unnecessary .dvc files, and revisit the model later on in the summer to make the rest of the changes. The reason is because the changes would be fairly complicated and the model currently works well. Please let me know if this is okay. If so, I'm ready to submit the PR.
Hi @pittmanriley
Yes, w ehave been dicussing with Miquel about this. Open a PR for the small changes, make sure to revise the .gitattributes file accordingly!
Good to close this issue @GemmaTuron?
Hi @GemmaTuron,
I'm having issues testing this model on CLI. It says that I'm getting a FileNotFoundError:
However, I'm not sure why this is trying to use/open a file called information.json. Is it normal for a model to have an information.json file, and this model just doesn't have one? I apologize if this post is vague, but I'm really not sure what direction to take from here. Thank you.