Clean UP & Dockerization eos1af5

GemmaTuron commented 1 year ago

Please check that the model is working and refactor it model to the latest eos-template structure. The workflows have already been updated, you can start by checking if the Actions have run successfully or changes need to be made

febielin commented 1 year ago

Just made a pull request!

miquelduranfrigola commented 1 year ago

@GemmaTuron please check

GemmaTuron commented 1 year ago

@febielin

The model test action has failed, please revise and amend. There is an arg that is missing / should not be passed

febielin commented 1 year ago

I see that this one was not successful. I believe the cause of this problem is that I modified the service.py file to take one less argument so as to mach the eos template bash {0}/run.sh {0} {1} {2}. I see that this was a mistake, and I will write back third argument bash {0}/run.sh {0} {1} {2} {3} to take self.checkpoints_dir.

GemmaTuron commented 1 year ago

Hi @febielin

To follow the current structure of the eos-template, I'd suggest coding the path to the checkpoints in the main.py file rather than passing it as an argument to the run.sh, what do you think?

febielin commented 1 year ago

Hi @GemmaTuron,

Sorry for not giving you an update on this! I have taken your advice, so that should resolve the issue of the expected third argument.

Currently working out fetching problems for this model.

febielin commented 1 year ago

Hi @GemmaTuron,

I believe I've made some progress with this.

The Dockerfile requires Python v3.7, but I consistently have trouble installing dgllife=0.2.3. Instead, I tried creating a conda env in Python v3.8, as you suggested, with the following packages:

rdkit-2023.3.2
dglteam dgl=0.4.3post2
dgllife (latest version)
torch (latest version)

If I am right, this has resolved the package dependency issue.

[UPDATE] Turns out that this issue of dgl installation that a couple models have is widely shared, as shown here.

With this information, we could ALSO maintain the python v3.7 conda env and run the installs as specified in the Dockerfile, in addition to the manual install of pip install dgl. There are still some package dependency issues, however, which can be resolved by:

pip install torch_geometric
pip install torch torchvision I did have to make slight modifications to some import statements to make this work. My question is, would this be a drawback?

But eventually both solutions lead to the same error.

I am now going to troubleshoot the code-level issues. At first glance, it looks like it is related to yesterday's session involving inchikeys. Specifically, there's an issue with converting from SMILES representation to InChI representation. I believe it's registering as NoneType for some reason.

GemmaTuron commented 1 year ago

Hi @febielin Which changes did you make in the import statements? Can I see the code? (you can push it to your fork). Did you change the rdkit version? Indeed the error seems at the level of MolFromSmiles conversion, which is giving a None output. Which was the input for this particular error?

febielin commented 1 year ago

Hi @GemmaTuron,

In regards to the import statements, I'd like to clarify. The changes I made in the conda python 3.7 env were in the package code itself, not the model. I ask if this is a drawback, because I imagine users who download this model shouldn't have to worry about making such modifications to packages. Also, no changes were made in the rdkit version (still as specified in Dockerfile).

Since the changes were not made in the model itself, I am not sure if this warrants a push (please correct me if I'm wrong, I'm just unsure how I would push this to my forked repo). I will detail the changes for you, if that's okay. The changes include the following (important changes in bold):

In dgllife:

from dgl.nn.pytorch import GATv2Conv --> from torch_geometric.nn.conv import GATv2Conv (dgllife/model/gnn/gatv2.py)
from torch.nn.pytorch.softmax import edge_softmax --> from dgl.nn.pytorch.softmax import edge_softmax (dgllife/model/gnn/pagtn.py)

In torch_geometric:

(+) import torchvision.models as models (torch_geometric/profile/profile.py)

Apologies because I know this format is difficult to read. Let me know what I can do to make this more clear.

In terms of the input that triggered this particular Nonetype error, it originated from a file of 20 smiles that I made. An identical error arises when running the eml_canonical file as input. I've tried to do some more testing, and interestingly, when I use compound_singles.csv in ersilia/ersilia/test/inputs, I actually get a valid result. I've linked the output file here. My next step is to try to understand in which specific case the None output gets produced.

@GemmaTuron What are your thoughts on this? Admittedly, this has been a very tricky and tedious model for me, but I'm doing my best to dissect the error messages and get the model functioning. Please let me know if I am way off the mark here.

GemmaTuron commented 1 year ago

Hi @febielin

Indeed, we should not change anything in the conda packages, because these changes are only reflected in YOUR version of the package, not the one users will download. Once you fork a repo, and you clone it, anything you push will be pushed to the forked repo, and from there you can open a PR. On the contrary, if you are working on a clone of the main repository, your push would open a PR, but would not be very convenient for us to test the model .

We need to sort out the dgl issue, even if it is going back to a previous version. Since @pittmanriley is also experiencing similar issues in eos96ia, could you two and @simrantan work together on this? I think Simran has the windows install that would allow to test older versions of the packages

febielin commented 1 year ago

Update: Riley shared with me that the dgl issue has to do with Mac devices. If I test this model out on codespaces, I find that this is true. There are no dgl (or any other dependency) issues.

GemmaTuron commented 1 year ago

Hi @febielin

Have you been able too coordinate with @simrantan who does not use a Mac? If the model is refactored and works, she could test it and then we can open a PR for it

febielin commented 1 year ago

Hi @GemmaTuron,

Simran and I are planning on huddling today to test this model. Because I was able to run the model on codespaces successfully without ant dependency issues, my guess is that it will also function well on Simran's computer. We will keep you updated on this.

In the meantime, I've been looking over the MolFromSmiles error that was mentioned in an earlier comment. As a reminder, the input that produced this error was the eml_canonical file. The error is in this line of code: mols += [Chem.MolToInchi(Chem.MolFromSmiles(r[0]))

I used a print statement print(r[0]) to figure out what's getting passed into the .MolFromSmiles function, and it appears that the drug name is getting passed in (instead of a SMILES string), which is probably why we are getting a NoneType error. This is a very similar error log, but you can see the drug "abacavir" being printed. Here's the snippet of the code, in case you'd also like to review that.

This probably explains why the eml_canonical file fails and compound_singles.csv in ersilia/ersilia/test/inputs doesn't. The next step from here is to modify the code so that the function only takes in the SMILES strings in eml_canonical. I'd like to ask for assistance in this task, because I honestly find myself a little lost in the intricacies of the model code. Please let me know if I can arrange a huddle with you and/or Miquel.

febielin commented 1 year ago

Also, I tried testing this on my file of 20 smiles, and I get this error. This seems entirely unrelated to the Nonetype error, but also something worth mentioning. However, I do see the actual SMILES strings are getting printed which is a good sign.

GemmaTuron commented 1 year ago

You pressed some key?

KeyboardInterrupt
^[[A%

GemmaTuron commented 1 year ago

Hi @GemmaTuron,

Simran and I are planning on huddling today to test this model. Because I was able to run the model on codespaces successfully without ant dependency issues, my guess is that it will also function well on Simran's computer. We will keep you updated on this.

In the meantime, I've been looking over the MolFromSmiles error that was mentioned in an earlier comment. As a reminder, the input that produced this error was the eml_canonical file. The error is in this line of code: mols += [Chem.MolToInchi(Chem.MolFromSmiles(r[0]))

I used a print statement print(r[0]) to figure out what's getting passed into the .MolFromSmiles function, and it appears that the drug name is getting passed in (instead of a SMILES string), which is probably why we are getting a NoneType error. This is a very similar error log, but you can see the drug "abacavir" being printed. Here's the snippet of the code, in case you'd also like to review that.

This probably explains why the eml_canonical file fails and compound_singles.csv in ersilia/ersilia/test/inputs doesn't. The next step from here is to modify the code so that the function only takes in the SMILES strings in eml_canonical. I'd like to ask for assistance in this task, because I honestly find myself a little lost in the intricacies of the model code. Please let me know if I can arrange a huddle with you and/or Miquel.

This should be handled by Ersilia, @miquelduranfrigola did you change something on the code?

febielin commented 1 year ago

@GemmaTuron

Yes I did a keyboard interrupt. In my experience, it just stalls and never returns. What I could do is just let it run in the background today and see if I ever get an output.

GemmaTuron commented 1 year ago

Hi @febielin Yes, if it is not crashing on its own try a longer run on the background. Regarding getting the name of the drugs instead of the SMILES is very surprising, does it happen only in codespaces? when you work in the model with Simran please check if she finds the same

febielin commented 1 year ago

@GemmaTuron Will do. The 20 smiles input is currently running. Simran and I will be meeting shortly, and I will check if she also gets this same error of the drug name instead of the SMILES. By the way, all of this is running on CLI at the moment, not codespaces, but yes, I have tried and I do in fact get the same error on codespaces.

febielin commented 1 year ago

@GemmaTuron

Okay, tried the 20 smiles input on codespaces, and this intraoperative threads error is resolved. Upon closer inspection of the error, I saw that it is likely just another package issue involving both dgl and torch. Will also ask Simran to test this theory. I now believe that eos1af5 is currently able to process all forms of SMILES string inputs except for eml_canonical.csv (due to the presence of drug names in that file).

I can't help but wonder, is there any chance this is related to the eos7ack issue? The error messages are certainly different, but the behavior is much the same: processes all inputs except eml_canonical.

febielin commented 1 year ago

@GemmaTuron

The model did not work for Simran. Here is the error log she provided me.

The first error we encountered was in this line (line 539): ModuleNotFoundError: No module named 'torch'.

I asked Simran to do pip install torch (line 540), thinking maybe this is something we need to add to the Dockerfile. This resulted in a new error (line 591): ImportError: /lib/x86_64-linux-gnu/libstdc++.so.6: version 'GLIBCXX_3.4.29' not found (required by /home/simran/miniconda3/envs/eos1af5/lib/python3.7/site-packages/scipy/spatial/ckdtree.cpython-37m-x86_64-linux-gnu.so) I am not sure what to make of this error.

I attempted a PR so I could see the workflow run, and the error is identical to the torch error.

As an aside, I apologize in advance for doing multiple PR's. I am often uncertain if the errors I encounter are only on my end, and seeing the workflows run provide me with more answers.

febielin commented 1 year ago

(FYI the 20 smiles input is still running and it's been 8 hours... maybe I'll let it continue through the night)

GemmaTuron commented 1 year ago

This error looks like a conda installation issue on @simrantan side, have a look at various issues open around this if you search on google: https://stackoverflow.com/questions/73317676/importerror-usr-lib-aarch64-linux-gnu-libstdc-so-6-version-glibcxx-3-4-30

About the torch install, is this package required? was it specified previously or not? I see a lot of conflicts in the Action that failed, have you checked that? How are you running the model with the eml file, did you not encounter these conflicts?

to be incompatible with the existing python installation in your environment:

Specifications:

  - brotlipy -> python[version='>=2.7,<2.8.0a0|>=3.5,<3.6.0a0']
  - defaults/linux-64::pip==22.3.1=py37h06a4308_0 -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.9,<3.10.0a0|>=3.8,<3.9.0a0|>=3.11,<3.12.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0']
  - defaults/linux-64::setuptools==65.6.3=py37h06a4308_0 -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.11,<3.12.0a0|>=3.8,<3.9.0a0|>=3.9,<3.10.0a0|>=3.6,<3.7.0a0|>=3.5,<3.6.0a0']
  - py-boost -> python[version='>=3.11,<3.12.0a0']
  - python-dateutil -> python[version='>=3.8,<3.9.0a0|>=3.9,<3.10.0a0']
  - pytorch=1.4.0 -> python[version='>=2.7,<2.8.0a0|>=3.10,<3.11.0a0|>=3.11,<3.12.0a0|>=3.9,<3.10.0a0|>=3.5,<3.6.0a0']

GemmaTuron commented 1 year ago

I think these issues arise because you did tests on py3.10 but you are pushing a version with py3.7, from what I understand in this commit: https://github.com/ersilia-os/eos1af5/pull/5/commits/503b5c31365bfa6d04b2e6964b88c8df50de9697

febielin commented 1 year ago

Following up on Riley's presentation on model eos96ia, I see that this model shares the same Dockerfile. I figured I could resolve my issue by using Riley's method.

Here's the situation–I am able to fetch eos96ia from Dockerhub on my device. However, because Dockerhub does not have the latest version of eos1af5, this model fails to fetch. I have tried doing this on Codespaces as well (error log) and I get an EmptyOutput error, likely because torch is not installing.

GemmaTuron commented 1 year ago

Can you try @emmakodes test.yml files that he showed today with the updated Dockerfile to see if it would work?

febielin commented 1 year ago

@emmakodes Yes, checkbuild which you showed today in the lab meeting seems very useful. Is it currently ready to be used? I would love to test it out on this model since I have so much trouble testing models locally.

@GemmaTuron Okay, I have once again replicated the MolFromSmiles error. I am currently working on codespaces. I manually ran each package install specified in the Dockerfile, and I ran it externally with bash run.sh.

I have trouble understanding why I run into package issues only when I attempt to fetch through Ersilia, but when I run the model through bash run.sh, I do not get package issues.

GemmaTuron commented 1 year ago

Hi @febielin

In the log of the error I see the following:

Boost.Python.ArgumentError: Python argument types in
    rdkit.Chem.rdinchi.MolToInchi(NoneType, str)
did not match C++ signature:

So, it does not seem an issue with installing packages rather than the molecules you are passing from predict. Could you compare the rdkit versions you have in the environment created by Ersilia and the environment you create manually?

emmakodes commented 1 year ago

@emmakodes Yes, checkbuild which you showed today in the lab meeting seems very useful. Is it currently ready to be used? I would love to test it out on this model since I have so much trouble testing models locally.

@GemmaTuron Okay, I have once again replicated the MolFromSmiles error. I am currently working on codespaces. I manually ran each package install specified in the Dockerfile, and I ran it externally with bash run.sh.

I have trouble understanding why I run into package issues only when I attempt to fetch through Ersilia, but when I run the model through bash run.sh, I do not get package issues.

Sure, @febielin you can test it out. I have written a guide on how to use it on the README. If you encounter any challenges while trying to use it you can let me know checkbuild

GemmaTuron commented 1 year ago

@febielin Is this model ready to be tested?

febielin commented 1 year ago

@GemmaTuron

Yes, this model is ready to be tested.

GemmaTuron commented 1 year ago

@febielin

Until we decide if we incorporate the local workflows in the template, we said we would not merge them to the main model branch. Please delete them

febielin commented 1 year ago

@GemmaTuron

Please review PR https://github.com/ersilia-os/eos1af5/pull/8. Thank you.

emmakodes commented 1 year ago

Hello, @GemmaTuron I couldn't find any hint or reason from the Workflow files or Actions tab of this model as to why the Create a Test issue step in test-model.yml is skipped.

Also, I'm not sure if @febielin pushing a PR containing the local test workflows caused Create a Test issue step to be skipped. If you look at the local test workflows, they never ran local-test-model.yml local-test-build.yml since @febielin didn't use local test as the commit message to trigger them before opening the PRs

GemmaTuron commented 1 year ago

hmm thanks for looking into this @emmakodes ! LEt's consider it an isolated thing and move on then!

ersilia-os / eos1af5

Clean UP & Dockerization eos1af5 #1