ersilia-os / eos78ao

Mordred physicochemical descriptors
GNU General Public License v3.0

Clean UP & Dockerization eos78ao #5

Closed GemmaTuron closed 1 year ago

pittmanriley commented 1 year ago

Hi @GemmaTuron @miquelduranfrigola,

I'm having some issues fetching this model. If I fetch in CLI or Codespaces, I get this error: out.log

It mentions something about a syntax error near unexpected token `NO', and I'm not sure what this means. I've tried fetching using the --repo_path and --from_github options, but they get me the same error.

pittmanriley commented 1 year ago

I should also add that I've tried troubleshooting by manually installing the packages listed in the Dockerfile and running run.sh. When I do this, I get an error saying that the pandas module isn't found:

Traceback (most recent call last):
  File "/Users/rileypittman/eos78ao/model/framework/./code/main.py", line 6, in <module>
    import pandas as pd
ModuleNotFoundError: No module named 'pandas'

I don't understand this because if I try to download pandas in my terminal, it says that the package is already installed.

miquelduranfrigola commented 1 year ago

Thanks @pittmanriley - I was about to ask exactly this. Before running run.sh, have you done conda activate eos78ao? Is pandas installed inside eos78ao environment?
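A quick way to answer both questions is to ask the interpreter itself. The snippet below is only a sketch: the helper name missing_modules is made up for illustration, and the module list is taken from the packages mentioned in this thread. It prints which Python is running and which of those modules that Python cannot import. CONDA_DEFAULT_ENV is the variable conda sets when an environment is activated.

```python
import importlib.util
import os
import sys


def missing_modules(names):
    """Return the top-level module names that this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # If this path is not inside the eos78ao environment, packages installed
    # there will not be visible to run.sh.
    print("interpreter:", sys.executable)
    print("conda env:", os.environ.get("CONDA_DEFAULT_ENV", "<none active>"))
    print("missing:", missing_modules(["pandas", "rdkit", "mordred"]))
```

If the interpreter path points outside the environment, "already installed" messages from pip or conda most likely refer to a different Python than the one run.sh uses.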

pittmanriley commented 1 year ago

@miquelduranfrigola I did this before, but I just tried doing conda install pandas rather than pip install pandas and it ran this time (I did this within the eos78ao conda environment). However, I reached a long error message after running run.sh regarding an EOF error: out.log

GemmaTuron commented 1 year ago

Could be related to Mac: https://stackoverflow.com/questions/67328927/multiprocessing-throwing-runtime-error-after-executing-p-start

Could you try on codespaces?

pittmanriley commented 1 year ago

Hi @GemmaTuron, I was able to run run.sh inside Codespaces and the model works. Does this mean I'm able to start refactoring? Or do I need to get it to fetch before I refactor?

I'm noticing that even after successfully running run.sh in Codespaces, I'm unable to fetch the model, even if I do it using --repo_path. It gives me an empty output error caused by ModuleNotFoundError: No module named 'rdkit'. To install the packages from the Dockerfile (like rdkit, mordred, etc.) I need to use pip install ... instead of conda install ... because the conda command won't work. I think this is causing the issue, because when I go to fetch the model the packages don't actually seem to be installed, despite installing them with pip earlier.

Update: I was able to install rdkit using conda install -c rdkit rdkit, but I tried fetching again and it is still saying the rdkit module is not found.

GemmaTuron commented 1 year ago

Hi @pittmanriley If you do not provide the logs of the error we cannot help.

pittmanriley commented 1 year ago

@GemmaTuron Sorry about that. After I individually download each package in Codespaces (rdkit, mordred, and timeout-decorator), I run the model using run.sh and it works. However, if I try fetching the model using --repo_path, I get the following error, despite already downloading each of the packages needed. I downloaded the packages within the conda environment eos78ao, and also tried fetching within the environment, so the packages should be installed. out.csv

GemmaTuron commented 1 year ago

Hi @pittmanriley

It does seem that rdkit is simply not installed. I don't understand the bit "despite already downloading each of the packages needed". When you fetch a model, it eliminates any conda environment with the same name and creates it anew, so it does not matter which packages you installed previously:

07:02:22 | INFO | Deleting conda environment eos78ao

If you want to do manual tests, I suggest giving the env a different name, like eos78ao_manual.

If the model works when running run.sh in a manually created environment but not at fetch time, it indicates a problem with package installation from the Dockerfile instructions. My bet is on the conda install of rdkit. I suggest the following changes in the Dockerfile for the conda packages:

pip install rdkit==2023.3.2 (latest version)
pip install mordred==1.2

pittmanriley commented 1 year ago

Hi @GemmaTuron, these suggestions worked well. I also added a line in the Dockerfile for installing pandas, since I kept getting an error in Codespaces saying that the module couldn't be found. After making the refactoring changes and Dockerfile changes, I was able to test the model within Codespaces and it worked well, so I opened a PR.

However, the PR failed, and I'm unsure whether it is an issue within Ersilia. I sent a message in Slack about this; the error I am getting is NoneType object is not iterable. Here is the error in actions: https://github.com/ersilia-os/eos78ao/actions/runs/5524115897/jobs/10075969004?pr=6

GemmaTuron commented 1 year ago

This seems to be an Ersilia issue; I am rerunning the workflows.