GemmaTuron closed this issue 11 months ago.
Hi @miquelduranfrigola @GemmaTuron
When I try to fetch the model, I get an empty output error that I am unsure how to resolve. The model is used for coloring molecules according to their interaction with CYP3A4, and its output is a probability. Here is the error log:
12:20:34 | DEBUG | Activation done
12:20:34 | DEBUG | Process id: 30632
12:20:34 | DEBUG | Trying to wake up. Iteration: 0
12:20:34 | DEBUG | Timeout: 1000 Sleep time: 1
12:20:34 | DEBUG | Temporary file available: /var/folders/1v/6wbcjvrj74sd4lx1041s93zr0000gn/T/ersilia-ub4utaq5/serve.log
12:20:34 | DEBUG | No error strings found in temporary file
12:20:34 | DEBUG | Waiting for server
12:20:35 | DEBUG | Trying to wake up. Iteration: 1
12:20:35 | DEBUG | Timeout: 1000 Sleep time: 1
12:20:35 | DEBUG | Temporary file available: /var/folders/1v/6wbcjvrj74sd4lx1041s93zr0000gn/T/ersilia-ub4utaq5/serve.log
12:20:35 | DEBUG | No error strings found in temporary file
12:20:35 | DEBUG | Server logging done
12:20:36 | DEBUG | Trying to wake up. Iteration: 2
12:20:36 | DEBUG | Timeout: 1000 Sleep time: 1
12:20:36 | DEBUG | Temporary file available: /var/folders/1v/6wbcjvrj74sd4lx1041s93zr0000gn/T/ersilia-ub4utaq5/serve.log
12:20:36 | DEBUG | No error strings found in temporary file
12:20:36 | DEBUG | Server is ready. Trying to get URL
12:20:36 | DEBUG | URL found: http://127.0.0.1:56400
12:20:36 | DEBUG | Iterating over APIs
12:20:36 | DEBUG | Running API: predict
12:20:36 | DEBUG | ['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']
12:20:36 | DEBUG | API: predict
12:20:36 | DEBUG | MODEL ID: eos96ia
12:20:36 | DEBUG | SERVICE URL: http://127.0.0.1:56400
12:20:37 | DEBUG | Reading card from eos96ia
12:20:37 | DEBUG | Reading shape from eos96ia
12:20:37 | DEBUG | Input Shape: Single
12:20:37 | DEBUG | Input type is: compound
12:20:37 | DEBUG | Input shape is: Single
12:20:37 | DEBUG | Importing module: .types.compound
12:20:37 | DEBUG | Checking RDKIT and other requirements necessary for compound inputs
12:20:37 | DEBUG | InputShapeSingle shape: Single
12:20:37 | DEBUG | API eos96ia:predict initialized at URL http://127.0.0.1:56400
12:20:37 | DEBUG | Schema not yet available
12:20:37 | INFO | No empty output available
12:20:37 | DEBUG | Meta: None
12:20:37 | DEBUG | Posting to predict
12:20:37 | DEBUG | Batch size 100
12:20:37 | DEBUG | Schema not yet available
12:20:40 | DEBUG | Status code: 500
12:20:40 | ERROR | Status Code: 500
12:20:40 | WARNING | Batch prediction didn't seem to work. Doing predictions one by one...
12:20:43 | DEBUG | Status code: 500
12:20:43 | ERROR | Status Code: 500
12:20:46 | DEBUG | Status code: 500
12:20:46 | ERROR | Status Code: 500
12:20:46 | DEBUG | Schema not yet available
12:20:46 | DEBUG | Done with unique posting
12:20:46 | DEBUG | Metadata needs to be calculated
12:20:46 | ERROR | Meta not available, run some adapations first and it will be inferred atomatically
12:20:46 | DEBUG | [{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}]
12:20:47 | ERROR | Ersilia exception class:
EmptyOutputError
Detailed error:
Model API eos96ia:predict did not produce an output
I'm working on a Mac with an M1 chip, but I do not believe the issue is Mac-specific. I also tried fetching from a local copy with the --repo_path flag, but I received the same error. Could the issue be that the API is named predict rather than run? I will look into this. Please let me know if I need to provide more information, and whether I should report issues like this differently in the future.
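The debug line just before the exception already shows what EmptyOutputError is reporting: after the server answered with status 500, every input record came back with 'output': None. A minimal sketch for pulling the failed inputs out of such a list, assuming the records keep the shape printed in the log (the helper name failed_inputs is hypothetical, not part of Ersilia):

```python
from typing import Any, Dict, List


def failed_inputs(results: List[Dict[str, Any]]) -> List[str]:
    """Return the input SMILES of every record whose 'output' is None (or missing).

    Assumes each record looks like the ones in the debug log:
    {'input': {'key': ..., 'input': <SMILES>, 'text': ...}, 'output': ...}
    """
    return [rec["input"]["input"] for rec in results if rec.get("output") is None]


if __name__ == "__main__":
    # Records copied from the debug log above; both outputs are None.
    results = [
        {"input": {"key": "LUHMMHZLDLBAKX-UHFFFAOYSA-N",
                   "input": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O",
                   "text": "CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O"},
         "output": None},
        {"input": {"key": "QRXWMOHMRWLFEY-UHFFFAOYSA-N",
                   "input": "C1=CN=CC=C1C(=O)NN",
                   "text": "C1=CN=CC=C1C(=O)NN"},
         "output": None},
    ]
    print(failed_inputs(results))
```

When every input is listed, as here, that suggests the failure is in the model server itself (the repeated 500 responses) rather than in particular molecules, so the server-side traceback is the part worth looking for.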
Hi @pittmanriley !
The EmptyOutputError is simply indicating that the automated test on 3 random molecules could not be completed:
{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}
This means the model did not fetch correctly, but the message alone is not informative enough. Run the command with the -v flag to print the whole error on screen; you can also save it to an external file (see the troubleshooting instructions for more on that). Once you have the whole error log, you'll be able to identify which package is failing.
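One way to see the verbose output and keep a copy of it at the same time is a small wrapper that streams a command's combined stdout and stderr to the terminal while writing it to a log file (in a shell, redirecting with 2>&1 does the file part). This is a sketch: the helper name run_and_log is hypothetical, and the example invocation in the comment assumes the -v flag goes before the subcommand, so check ersilia --help if your version differs.

```python
import subprocess
import sys
from typing import List


def run_and_log(cmd: List[str], log_path: str) -> int:
    """Run cmd, echo its combined stdout/stderr, save it to log_path, return the exit code."""
    with open(log_path, "w", encoding="utf-8") as log:
        # Merge stderr into stdout so tracebacks land in the same log file.
        proc = subprocess.Popen(
            cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
        )
        assert proc.stdout is not None
        for line in proc.stdout:
            sys.stdout.write(line)  # show it on screen
            log.write(line)         # and keep a copy on disk
        return proc.wait()


# Assumed invocation (adjust the model id and flags to your setup):
#   run_and_log(["ersilia", "-v", "fetch", "eos96ia"], "output.log")
```

Returning the exit code makes it easy to tell a failed fetch from a successful one without re-reading the log.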
Hi @GemmaTuron,
I'm still having issues getting this to work. I looked at the error log, but it doesn't point me to anywhere in the codebase where I could make adjustments. Here is the output file: output.csv
At the very end of the output, it says "Meta not available, run some adapations first and it will be inferred atomatically". Maybe this has something to do with the error?
@pittmanriley can you please paste the whole error log? Otherwise I cannot see what might be failing.
When you are assigned a new model, the steps you need to take are:
Hi @GemmaTuron, thank you for these steps! And @miquelduranfrigola, I also thought I'd include you on this issue as well.
I'd like to clarify my earlier attempts at fetching the model. I have tried fetching it several ways: fetching directly, fetching locally with --repo_path, and using --from_github. Each gives me an empty output error; the whole error log is here: output.log
As a result of this error, I have been working through the troubleshooting instructions. Today I made some progress, but I am still stuck. It took a while, but I was able to install the four packages needed: rdkit, dgl, dgllife, and PyTorch. Now, when I try to run bash run.sh . /Users/rileypittman/ersilia/test/inputs/compound_single.csv output.csv, I get a runtime error saying it was unable to load state_dict. The error log is here: output.log. I'm not sure how to proceed from here, or what the source of the issue is.
My only idea is that I may have newer versions of the packages that are not compatible. For example, the Dockerfile specifies torch 1.4.0, but I have version 2.0.1 installed (and I'm unable to install the older version). I'm not sure whether this contributes to the error I'm getting while troubleshooting, however.
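To check which versions are actually installed and compare them with the pins the model expects, a quick sketch using the standard library's importlib.metadata (Python 3.8+) is below. Only the torch 1.4.0 pin comes from this thread (the model's Dockerfile); the helper names are hypothetical, and a package installed by conda without pip metadata may be reported as missing.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional, Tuple

# Pin taken from the model's Dockerfile as described in this thread.
EXPECTED_PINS: Dict[str, str] = {"torch": "1.4.0"}


def installed_versions(packages: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if not found."""
    versions: Dict[str, Optional[str]] = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def pin_mismatches(
    versions: Dict[str, Optional[str]], pins: Dict[str, str]
) -> Dict[str, Tuple[Optional[str], str]]:
    """Return {name: (installed, pinned)} for every pin not matched exactly.

    This is an exact string comparison, so a local suffix such as "1.4.0+cpu"
    is also reported as a mismatch.
    """
    return {
        name: (versions.get(name), pinned)
        for name, pinned in pins.items()
        if versions.get(name) != pinned
    }


if __name__ == "__main__":
    found = installed_versions(["torch", "dgl", "dgllife", "rdkit"])
    for name, version in found.items():
        print(f"{name}: {version or 'not installed'}")
    for name, (have, want) in pin_mismatches(found, EXPECTED_PINS).items():
        print(f"MISMATCH {name}: installed {have}, expected {want}")
```

Pasting this output alongside the error log makes version-related failures much easier to spot.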
Hi @pittmanriley
Thanks for the explanations. This error does seem like it could be due to the change of versions:
RuntimeError: Error(s) in loading state_dict for MPNNPredictor:
Missing key(s) in state_dict: "gnn.gnn_layer.edge_func.0.weight", "gnn.gnn_layer.edge_func.0.bias", "gnn.gnn_layer.edge_func.2.weight", "gnn.gnn_layer.edge_func.2.bias".
Unexpected key(s) in state_dict: "gnn.gnn_layer.edge_nn.0.weight", "gnn.gnn_layer.edge_nn.0.bias", "gnn.gnn_layer.edge_nn.2.weight", "gnn.gnn_layer.edge_nn.2.bias".
Could the names of these keys differ between versions? I suggest bringing in someone from the team who can install and try this with an older version. @simrantan, since you are in roughly the same time zone, can you both look at this? @pittmanriley please coordinate with her and try to reproduce the error using older versions of the packages.
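If installing the older versions turns out not to be possible, a commonly used workaround for this kind of mismatch is to rename the checkpoint keys before loading. The sketch below assumes the weights are otherwise identical and only the submodule name changed from edge_nn (in the saved checkpoint) to edge_func (in the installed code), which is what the missing and unexpected keys above suggest, and that the file holds a plain state_dict. Because these keys sit inside gnn.gnn_layer, the rename may come from the graph library (DGL/dgllife) rather than from PyTorch itself; that is an assumption worth checking. The function name is hypothetical.

```python
from collections import OrderedDict
from typing import Mapping, TypeVar

V = TypeVar("V")


def rename_state_dict_keys(
    state_dict: Mapping[str, V], old: str = "edge_nn", new: str = "edge_func"
) -> "OrderedDict[str, V]":
    """Return a copy of state_dict where any dotted key segment equal to `old` becomes `new`."""
    renamed: "OrderedDict[str, V]" = OrderedDict()
    for key, value in state_dict.items():
        # Match whole segments only, so e.g. "edge_nnx" is left untouched.
        parts = [new if part == old else part for part in key.split(".")]
        renamed[".".join(parts)] = value
    return renamed


# Hypothetical usage with PyTorch (file name and model construction are assumptions):
#   import torch
#   checkpoint = torch.load("checkpoint.pth", map_location="cpu")
#   model.load_state_dict(rename_state_dict_keys(checkpoint))

if __name__ == "__main__":
    demo = OrderedDict([
        ("gnn.gnn_layer.edge_nn.0.weight", "w0"),
        ("gnn.gnn_layer.edge_nn.2.bias", "b2"),
    ])
    print(list(rename_state_dict_keys(demo)))
```

Since load_state_dict is strict by default, it will still raise if any shapes or other keys differ, which is a useful sanity check that only the name changed.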
Thanks all.
I confirm that the command bash run.sh . /Users/rileypittman/ersilia/test/inputs/compound_single.csv output.csv is the one we should be looking into. Thanks, @pittmanriley.
The load state_dict error is most likely associated with PyTorch. There were big changes between versions 1 and 2 of PyTorch, and I am pretty sure this is the source of the error.
Since this is a pre-trained model, there is no way around it, really: we need to downgrade PyTorch to version 1.4.0.
Have you tried this, @pittmanriley?
@pittmanriley
This model is not complete yet. Please do not move tasks to done yourself.
@GemmaTuron My apologies. It should be good to go now so I submitted the PR.
@miquelduranfrigola I also updated the Dockerfile so that it installs PyTorch 1.4.0. Thank you.
Hi @pittmanriley, there are package conflicts in the test run; please check.
Hi @GemmaTuron, I looked at the failed merge from yesterday and saw that there was a problem importing torch. I noticed that I had changed the rdkit installation from conda to pip, while the other packages were still installed with conda.
This morning, I tried adjusting the Dockerfile to use only pip installations, but when I opened the PR, the checks failed while installing Ersilia. Is this something I can solve on my end? The check failure mentions: Warning: Package 'ersilia.hub.content.metadata' is absent from the packages configuration.
Hi @pittmanriley
This seems to be an issue with Airtable, which should be fixed now. However, until we debug the Python 3.11 issue, we cannot run the workflows.
Hi @pittmanriley
The workflows are updated, but the PR is failing; please check.
Hi @GemmaTuron, I've been trying to work with @simrantan on this model, and I'm not sure how to proceed. I'm still getting the same error as before, which is attached here: eos96ia.log
I've also tried troubleshooting the model by installing all the required packages and running the run.sh script with bash. However, I get exactly the same error as when I fetch, and I'm still not sure what it means: eos96ia_bash.log
When Simran fetched the model, she got a different error, which makes me think that my error is Mac-related. Her error seems to be an issue with pack.py. Here is her error log: eos96iaerrorlog.txt
Do you know what could be going wrong here?
Since Codespaces is working for me again, I went back to installing the packages with conda, and I was able to fetch the model within Codespaces. I submitted the PR, but I'm getting a new error that Febie was also getting (there's a thread about it in the internships channel on Slack). I think this is heading in the right direction.
Update: I submitted the PR and the checks ended up passing.
Please check that the model is working and refactor the model to the latest eos-template structure. The workflows have already been updated; you can start by checking whether the Actions have run successfully or whether changes need to be made.