Clean UP & Dockerization eos96ia

GemmaTuron commented 1 year ago

Please check that the model is working and refactor it to the latest eos-template structure. The workflows have already been updated; you can start by checking whether the Actions have run successfully or whether changes need to be made.

pittmanriley commented 1 year ago

Hi @miquelduranfrigola @GemmaTuron

When I try to fetch the model, I get an empty output error that I am unsure how to resolve. The model is used to color molecules for interaction with CYP3A4, and its output is a probability. Here's the error log:

12:20:34 | DEBUG    | Activation done
12:20:34 | DEBUG    | Process id: 30632
12:20:34 | DEBUG    | Trying to wake up. Iteration: 0
12:20:34 | DEBUG    | Timeout: 1000 Sleep time: 1
12:20:34 | DEBUG    | Temporary file available: /var/folders/1v/6wbcjvrj74sd4lx1041s93zr0000gn/T/ersilia-ub4utaq5/serve.log
12:20:34 | DEBUG    | No error strings found in temporary file
12:20:34 | DEBUG    | Waiting for server
12:20:35 | DEBUG    | Trying to wake up. Iteration: 1
12:20:35 | DEBUG    | Timeout: 1000 Sleep time: 1
12:20:35 | DEBUG    | Temporary file available: /var/folders/1v/6wbcjvrj74sd4lx1041s93zr0000gn/T/ersilia-ub4utaq5/serve.log
12:20:35 | DEBUG    | No error strings found in temporary file
12:20:35 | DEBUG    | Server logging done
12:20:36 | DEBUG    | Trying to wake up. Iteration: 2
12:20:36 | DEBUG    | Timeout: 1000 Sleep time: 1
12:20:36 | DEBUG    | Temporary file available: /var/folders/1v/6wbcjvrj74sd4lx1041s93zr0000gn/T/ersilia-ub4utaq5/serve.log
12:20:36 | DEBUG    | No error strings found in temporary file
12:20:36 | DEBUG    | Server is ready. Trying to get URL
12:20:36 | DEBUG    | URL found: http://127.0.0.1:56400
12:20:36 | DEBUG    | Iterating over APIs
12:20:36 | DEBUG    | Running API: predict
12:20:36 | DEBUG    | ['CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'C1=CN=CC=C1C(=O)NN']
12:20:36 | DEBUG    | API: predict
12:20:36 | DEBUG    | MODEL ID: eos96ia
12:20:36 | DEBUG    | SERVICE URL: http://127.0.0.1:56400
12:20:37 | DEBUG    | Reading card from eos96ia
12:20:37 | DEBUG    | Reading shape from eos96ia
12:20:37 | DEBUG    | Input Shape: Single
12:20:37 | DEBUG    | Input type is: compound
12:20:37 | DEBUG    | Input shape is: Single
12:20:37 | DEBUG    | Importing module: .types.compound
12:20:37 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
12:20:37 | DEBUG    | InputShapeSingle shape: Single
12:20:37 | DEBUG    | API eos96ia:predict initialized at URL http://127.0.0.1:56400
12:20:37 | DEBUG    | Schema not yet available
12:20:37 | INFO     | No empty output available
12:20:37 | DEBUG    | Meta: None
12:20:37 | DEBUG    | Posting to predict
12:20:37 | DEBUG    | Batch size 100
12:20:37 | DEBUG    | Schema not yet available
12:20:40 | DEBUG    | Status code: 500
12:20:40 | ERROR    | Status Code: 500
12:20:40 | WARNING  | Batch prediction didn't seem to work. Doing predictions one by one...
12:20:43 | DEBUG    | Status code: 500
12:20:43 | ERROR    | Status Code: 500
12:20:46 | DEBUG    | Status code: 500
12:20:46 | ERROR    | Status Code: 500
12:20:46 | DEBUG    | Schema not yet available
12:20:46 | DEBUG    | Done with unique posting
12:20:46 | DEBUG    | Metadata needs to be calculated
12:20:46 | ERROR    | Meta not available, run some adapations first and it will be inferred atomatically
12:20:46 | DEBUG    | [{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}]
12:20:47 | ERROR    | Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos96ia:predict did not produce an output

I'm working on a Mac with an M1 chip, but I do not believe the issue is Mac-specific. I tried fetching from a local copy using the --repo_path flag, but I received the same error. Perhaps the issue is that the API is named predict rather than run? I will look into this. Please let me know if I need to provide more information, and whether my issue should be reported differently in the future.

GemmaTuron commented 1 year ago

Hi @pittmanriley !

The EmptyOutputError simply indicates that the automated test on a few random molecules could not be completed: {'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}

This means the model was not fetched correctly, but the message alone is not informative enough. Run the command with the -v flag to print the whole error on screen, and save it to an external file as well (see the troubleshooting instructions for how). Once you have the whole error log, you'll be able to identify which package is failing.
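Once the verbose output has been saved to a file, the ERROR-level lines are usually the quickest way into a long log. Below is a small Python sketch that pulls them out. It assumes the `HH:MM:SS | LEVEL | message` line layout seen in the log pasted above (inferred from this thread, not from Ersilia's documentation) and also keeps unprefixed lines that mention `Error`, such as a Python exception name. `extract_errors` is a hypothetical helper.

```python
import re
from typing import Iterable, List

# Layout inferred from the log in this thread, e.g. "12:20:40 | ERROR    | Status Code: 500".
LOG_LINE = re.compile(r"^(\d{2}:\d{2}:\d{2}) \| (\w+)\s*\| (.*)$")


def extract_errors(lines: Iterable[str]) -> List[str]:
    """Collect ERROR-level messages and unprefixed lines that mention an exception."""
    found: List[str] = []
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            # Timestamped line: keep it only if its level is ERROR.
            if match.group(2) == "ERROR":
                found.append(match.group(3))
        elif "Error" in line or line.startswith("Traceback"):
            # Tracebacks and exception names are printed without the timestamp prefix.
            found.append(line.strip())
    return found
```

Passing an open log file (for example a hypothetical `fetch.log`) to `extract_errors` works because file objects iterate line by line. Applied to the log above, it surfaces the `Status Code: 500` errors and the `EmptyOutputError` line, which still do not name a failing package; that is why the complete log is needed.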

pittmanriley commented 1 year ago

Hi @GemmaTuron,

I'm still having issues getting this to work. The error log doesn't point me to anywhere in the codebase where I could make adjustments. Here is the output file: output.csv

At the very end of the output, it says Meta not available, run some adapations first and it will be inferred atomatically. Maybe this has something to do with the error?

GemmaTuron commented 1 year ago

@pittmanriley can you please paste the whole error log? Otherwise I cannot see what might be failing.

GemmaTuron commented 1 year ago

When you get assigned a new model, the steps you need to take are:

Clone the repository to your local system
Fetch the model using the --repo_path flag and collect the log file
If fetching fails at this step, identify the source of the error and make the necessary modifications. This can mean building the environment manually, adding the dependencies one by one, and then running the model from run.sh.

Please let me know whether the error you are getting occurs when fetching from GitHub or from the cloned repository. Also, try running the model manually as explained in the troubleshooting page; this will give you a hint of whether the problem lies in the Ersilia integration or in the model itself.
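The three steps above can be scripted, for instance, as in the Python sketch below. It rests on several assumptions: the repository URL follows the `ersilia-os/eos96ia` naming, `run.sh` sits in `model/framework/` as in the eos-template layout, and `-v` is passed as a global option before `fetch` (check `ersilia --help` for the exact syntax). The helper names are hypothetical. By default it only prints the commands; with `dry_run=False` it runs them, appends all output to one log file, and stops at the first failing step.

```python
import os
import shlex
import subprocess
from typing import List, Optional, Tuple

Step = Tuple[List[str], Optional[str]]  # (command, working directory)


def troubleshooting_steps(model_id: str, clone_dir: str,
                          input_csv: str, output_csv: str) -> List[Step]:
    """Build the clone / fetch / run.sh commands for the three steps."""
    # Assumed location of run.sh, following the eos-template layout.
    framework_dir = os.path.join(clone_dir, "model", "framework")
    return [
        # 1. Clone the repository locally.
        (["git", "clone", f"https://github.com/ersilia-os/{model_id}.git", clone_dir], None),
        # 2. Fetch from the local clone with verbose output.
        (["ersilia", "-v", "fetch", model_id, "--repo_path", clone_dir], None),
        # 3. Run the model directly, bypassing Ersilia. Paths are made absolute
        #    because this command runs inside the framework directory.
        (["bash", "run.sh", ".", os.path.abspath(input_csv), os.path.abspath(output_csv)],
         framework_dir),
    ]


def run_steps(steps: List[Step], log_path: str, dry_run: bool = True) -> List[str]:
    """Print (and optionally execute) each step, appending stdout/stderr to log_path."""
    printed = []
    for command, cwd in steps:
        line = f"{shlex.join(command)}  # cwd: {cwd or '.'}"
        print(line)
        printed.append(line)
        if not dry_run:
            with open(log_path, "a") as log:
                result = subprocess.run(command, cwd=cwd, stdout=log,
                                        stderr=subprocess.STDOUT, check=False)
            if result.returncode != 0:
                # Stop at the first failing step; its output is already in the log.
                break
    return printed
```

Keeping every step's output in a single log makes it easy to tell whether the failure happens at fetch time or when running the model directly, which is exactly the distinction asked for above.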

pittmanriley commented 1 year ago

Hi @GemmaTuron, thank you for these steps! And @miquelduranfrigola, I thought I'd include you on this issue as well.

I'd like to clarify my earlier attempts at fetching the model. I have tried fetching it in multiple ways: fetching directly, fetching locally with --repo_path, and fetching with --from_github. Each way gives me an empty output error; the whole error log can be seen here: output.log

As a result of this error, I have been following the troubleshooting instructions. Today I made some progress, but I am still stuck. It took a while, but following the troubleshooting steps I was able to install the four packages needed: rdkit, dgl, dgllife, and PyTorch. Now, when I run bash run.sh . /Users/rileypittman/ersilia/test/inputs/compound_single.csv output.csv, I get a runtime error saying it was unable to load the state_dict. The error log is here: output.log. I'm not sure how to proceed from here, or what the source of the issue is.

My only idea is that I may have newer versions of the packages that are not compatible. For example, the Dockerfile specifies torch 1.4.0, but I have version 2.0.1 installed (and I'm unable to install the older version). However, I'm not sure whether this contributes to the error I'm getting while troubleshooting.
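A quick way to confirm this kind of mismatch is to compare what is installed against the versions the Dockerfile pins. The Python sketch below uses the standard-library `importlib.metadata.version`; only the torch 1.4.0 pin comes from this thread, and the remaining pins (rdkit, dgl, dgllife) would have to be copied from the actual Dockerfile. `check_pins` is a hypothetical helper, and it ignores local version suffixes such as `+cpu`.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Callable, Dict, Optional

# Only the torch pin is taken from this thread; add the remaining pins from the Dockerfile.
DOCKERFILE_PINS: Dict[str, str] = {"torch": "1.4.0"}


def check_pins(pins: Dict[str, str],
               get_version: Callable[[str], str] = version) -> Dict[str, Optional[str]]:
    """Return {package: installed version, or None if absent} for every pin that is not met."""
    mismatches: Dict[str, Optional[str]] = {}
    for package, pinned in pins.items():
        try:
            # Drop local labels such as "+cpu" before comparing.
            installed: Optional[str] = get_version(package).split("+")[0]
        except PackageNotFoundError:
            installed = None
        if installed != pinned:
            mismatches[package] = installed
    return mismatches
```

With torch 2.0.1 installed, `check_pins(DOCKERFILE_PINS)` would return `{"torch": "2.0.1"}`; an empty dict means every listed pin is satisfied.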

GemmaTuron commented 1 year ago

Hi @pittmanriley

Thanks for the explanations. This error does seem like it could be due to the change of versions: RuntimeError: Error(s) in loading state_dict for MPNNPredictor:

    Missing key(s) in state_dict: "gnn.gnn_layer.edge_func.0.weight", "gnn.gnn_layer.edge_func.0.bias", "gnn.gnn_layer.edge_func.2.weight", "gnn.gnn_layer.edge_func.2.bias". 
    Unexpected key(s) in state_dict: "gnn.gnn_layer.edge_nn.0.weight", "gnn.gnn_layer.edge_nn.0.bias", "gnn.gnn_layer.edge_nn.2.weight", "gnn.gnn_layer.edge_nn.2.bias".

Perhaps the names of these keys differ between versions? I suggest bringing in someone from the team who can install and try this with an older version. @simrantan, since you are more or less in the same time zone, can you both look at this? @pittmanriley please coordinate with her and try to reproduce the error using an older version of the packages.
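The error is essentially a set difference between the parameter names the installed `MPNNPredictor` expects and those stored in the checkpoint: every `edge_func` key is missing and every `edge_nn` key is unexpected, which is consistent with a layer attribute having been renamed between library versions. The Python sketch below reproduces that comparison on plain dictionaries. `rename_keys` is a hypothetical workaround, not what was done in this issue (the thread settles on installing the older PyTorch), and remapping names is only safe if the layers behind them are otherwise identical.

```python
from typing import Dict, Iterable, List, Tuple, TypeVar

V = TypeVar("V")


def diff_keys(expected: Iterable[str], found: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split keys into (missing, unexpected), mirroring load_state_dict's error message."""
    expected_set, found_set = set(expected), set(found)
    return sorted(expected_set - found_set), sorted(found_set - expected_set)


def rename_keys(state: Dict[str, V], old: str, new: str) -> Dict[str, V]:
    """Return a copy of a checkpoint dict with `old` replaced by `new` in every key."""
    return {key.replace(old, new): value for key, value in state.items()}
```

With PyTorch, the renamed dict, e.g. `rename_keys(checkpoint, ".edge_nn.", ".edge_func.")`, would be passed to `model.load_state_dict`, assuming `checkpoint` is the flat state dict loaded from the model's weight file.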

miquelduranfrigola commented 1 year ago

Thanks all.

I confirm that the command bash run.sh . /Users/rileypittman/ersilia/test/inputs/compound_single.csv output.csv is the one we should be looking into. Thanks @pittmanriley

The state_dict loading error is most likely associated with PyTorch. There were big changes between versions 1 and 2 of PyTorch, and I am pretty sure this is the source of the error.

Since this is a pre-trained model, there is really no way around it: we need to downgrade PyTorch to version 1.4.0.

Have you tried this, @pittmanriley?

GemmaTuron commented 1 year ago

@pittmanriley

This model is not completed. Please do not move tasks to done yourself.

pittmanriley commented 1 year ago

@GemmaTuron My apologies. It should be good to go now so I submitted the PR.

@miquelduranfrigola I also updated the Dockerfile so that it installs PyTorch 1.4.0. Thank you.

GemmaTuron commented 1 year ago

Hi @pittmanriley, there are package conflicts in the test run; please check them.

pittmanriley commented 1 year ago

Hi @GemmaTuron, I looked at the failed merge from yesterday and saw that there was a problem importing torch. I noticed that I had changed the rdkit installation from conda to pip, while the other installations were still using conda.

This morning I adjusted the Dockerfile to use only pip installations, but when I opened the PR, the checks failed while installing Ersilia. Is this something I can solve on my end? The check failure mentions: Warning: Package 'ersilia.hub.content.metadata' is absent from the packages configuration.

GemmaTuron commented 1 year ago

Hi @pittmanriley

This seems like an issue with Airtable, which should be fixed now. However, until we can debug the py3.11 issue, we cannot run the workflows.

GemmaTuron commented 12 months ago

Hi @pittmanriley

The workflows are updated! However, the PR is failing; please check.

pittmanriley commented 12 months ago

Hi @GemmaTuron, I've been trying to work with @simrantan on this model, and I'm not sure how to proceed. I'm still getting the same error as before, which is attached here: eos96ia.log

I've also tried troubleshooting the model by installing all the required packages and running the run.sh command with bash. However, I get exactly the same error as when I try to fetch, and I'm still not sure what it means: eos96ia_bash.log

When Simran fetched the model, she got a different error, which makes me think that my error is Mac-related. Her error seems to be an issue with pack.py. Here is her error log: eos96iaerrorlog.txt

Do you know what could be going wrong here?

pittmanriley commented 12 months ago

Since Codespaces is working for me again, I went back to installing the packages with conda, and I was able to fetch the model within Codespaces. I submitted the PR, but I'm getting a new error that Febie was also getting (there's a thread about it in the internships channel on Slack). I think this is headed in the right direction.

Update: I submitted the PR and the checks ended up passing.

ersilia-os / eos96ia

Clean UP & Dockerization eos96ia #1