Closed: paulinebanye closed this issue 1 year ago
Hi @pauline-banye
Thanks, very detailed answer, super helpful! Just to be clear: the check fails in GitHub Actions, but if you run it on your computer with the `--repo_path` flag, does it work?
Hi @GemmaTuron, No, it does not. I get errors when I run it with the `--repo_path` flag: `ersilia -v fetch eos74bo -r /mnt/c/Users/DELL-PC/Desktop/eos74bo/ > eos74bo.log 2>&1`.
Steps to recreate the error
@GemmaTuron @DhanshreeA The build progressed much further with the Docker dependencies. The error currently returned is related to the relative import paths within the repo. This is an issue that occurred when I was working with the RLM model initially, but it was resolved by including `sys.path.append` or `sys.path.insert` within the codebase. I added those as soon as I began working with the solubility model.
```dockerfile
FROM bentoml/model-server:0.11.0-py37
MAINTAINER ersilia

RUN pip install rdkit
RUN pip install pandas
RUN pip install numpy
RUN pip install torch==1.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install tqdm
RUN pip install typing-extensions
RUN pip install typed-argument-parser
RUN pip install tensorboardX
RUN pip install scikit-learn
RUN pip install hyperopt
RUN pip install requests

WORKDIR /repo
COPY . /repo
```
I am at a loss to understand why it is not working when I test the model within the Ersilia CLI. I am still attempting to debug and figure out what is causing the errors.
[eos74bo_6.log](https://github.com/ersilia-os/eos74bo/files/10712735/eos74bo_6.log)
[eos74bo_8.log](https://github.com/ersilia-os/eos74bo/files/10712736/eos74bo_8.log)
Hello @pauline-banye.
I see in the logs that some modules are not being imported. From your development environment, make sure the imports in the code are done correctly and that the paths are right: check that each `.py` file really is in the folder the import points to. For example, "predictors": it seems the code is trying to access that folder to import "SolubilityPredictor", but it cannot find it.
Check if you have an `__init__.py` file and whether those imports are configured there.
Then try to run the `main.py` file as follows: if you already have an environment configured with the model dependencies installed, activate the environment, go directly to the `main.py` file path, and run it.
When you have solved that, you can finally run it with ersilia and `--repo_path`. Activate the ersilia environment and run:
ersilia -v fetch "model_id" --repo_path /home/../model_id
As a good practice, I recommend pinning the version of each dependency in the Dockerfile.
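For example, the unpinned `pip install` lines in the Dockerfile above could specify versions explicitly (the version numbers below are placeholders for illustration, not tested recommendations):

```dockerfile
RUN pip install pandas==1.3.5
RUN pip install numpy==1.21.6
RUN pip install scikit-learn==1.0.2
```

This keeps the image reproducible: a rebuild months later installs the same dependency set instead of whatever is latest.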
Thank you so much @carcablop. I really appreciate your help checking the model out.
Hi @GemmaTuron, The issues I kept receiving were due to the relative paths. The majority of them were within the chemprop submodule, and I have been able to resolve them. I tested the repo and the model was fetched successfully.
ersilia -v fetch eos74bo -r /mnt/c/Users/DELL-PC/Desktop/nu_eos74bo/eos74bo > eos74bo.log 2>&1
eos74bo.log
That sounds great @pauline-banye! Could you also describe the issues you faced with chemprop? We can all learn from it, and this can be useful information around model incorporation for future contributors.
Hi @pauline-banye! Great progress, thanks for the detailed feedback. Could you:
Sure @GemmaTuron @DhanshreeA, @carcablop !
The issues were mainly related to the way the relative imports in the chemprop submodule were specified in the repository. I kept getting different errors, which prompted me to start debugging each file.
[x] In some instances I had to use `from .. import <module name>` to reference the parent directory of that specific module and retrieve specific functions or classes. For example, the `model/framework/predictors/chemprop/chemprop/data/data.py` file required some classes from a different folder, and the way it was imported returned a "module not found" error. I ended up converting the import from `from chemprop.features import BatchMolGraph, MolGraph` to `from ..features import BatchMolGraph, MolGraph`. This instructed the code to cd out of the `data` directory into the `model/framework/predictors/chemprop/chemprop/features/__init__.py` file and import the required classes.
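This fix can be reproduced with a throwaway package that mirrors the layout (the directory and class names below are illustrative stand-ins, not the real chemprop tree): a relative import is resolved against the parent package rather than `sys.path`, so it works no matter where the repository sits on disk.

```python
import importlib
import os
import sys
import tempfile

# Build a minimal package: chemprop/features.py plus chemprop/data/data.py,
# where data.py uses the relative form "from ..features import ...".
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "chemprop", "data"))
with open(os.path.join(root, "chemprop", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, "chemprop", "features.py"), "w") as f:
    f.write("class MolGraph:\n    pass\n")
with open(os.path.join(root, "chemprop", "data", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, "chemprop", "data", "data.py"), "w") as f:
    f.write("from ..features import MolGraph\n")  # the fixed, relative form

# Importing the submodule succeeds because ".." resolves to the parent
# package "chemprop", regardless of the current working directory.
sys.path.insert(0, root)
mod = importlib.import_module("chemprop.data.data")
print(mod.MolGraph.__name__)  # -> MolGraph
```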
[x] In instances where I had a directory with different modules, I ended up referencing each module with `from . import <module name>` to extract specific functions. For example, in `predictors/chemprop/chemprop/interpret.py` there were several imports referenced in a way that returned the "module not found" error:
```python
from chemprop.args import InterpretArgs
from chemprop.data import MoleculeDataLoader, MoleculeDataset
from chemprop.data.utils import get_data_from_smiles, get_header, get_smiles
from chemprop.train import predict
from chemprop.utils import load_args, load_checkpoint, load_scalers
```
To fix those errors, I edited the imports using the `.` notation:
```python
from .args import InterpretArgs
from .data import MoleculeDataLoader, MoleculeDataset
from .data.utils import get_data_from_smiles, get_header, get_smiles
from .train import predict
from .utils import load_args, load_checkpoint, load_scalers
```
.....
[x] In other instances, I made use of the `sys.path.append` or `sys.path.insert` option. In the `main.py` file, for example, I needed to import `SolubilityPredictor` from the `solubility_predictor` module in the `solubility` directory:

```python
root = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(root, ".."))
from predictors.solubility.solubility_predictor import SolubilityPredictor
```

Here `root = os.path.dirname(os.path.abspath(__file__))` sets the absolute path of the code directory containing the `main.py` file, and appending `".."` instructed the code to cd out of that folder before specifying the relative path (`predictors.solubility.solubility_predictor`) to the `SolubilityPredictor` class.

[x] The function which required FPSim2 was located in the `utilities/utilities.py` file. From my understanding, this code takes the list of kekulized SMILES and the model name, calculates the Tanimoto similarity between the input SMILES and the reference set, and returns a list of similarity values, printing the time taken to complete the similarity calculation.
```python
import os
import time
from os import path

from FPSim2 import FPSim2Engine


def get_similar_mols(kekule_smiles: list, model: str):
    start = time.time()
    sim_vals = []
    # Build the path to the fingerprint database for this model
    fp_dict_path = ''.join(['../train_data/', model, '.h5'])
    fp_dict_path = path.abspath(path.join(os.getcwd(), fp_dict_path))
    fp_engine = FPSim2Engine(fp_dict_path)
    for smi in kekule_smiles:
        res = fp_engine.on_disk_similarity(smi, 0.01)
        sim_vals.append(res[0][1])
    end = time.time()
    print(f'{end - start} seconds to calculate Tanimoto similarity for {len(kekule_smiles)} molecules')
    return sim_vals
```
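One caveat I'd flag (my observation, not something raised in the thread): the `'../train_data/'` prefix is joined onto `os.getcwd()`, so the lookup depends on the directory the process is launched from. A `__file__`-anchored sketch avoids that:

```python
import os

def fp_path(model: str) -> str:
    # Resolve the .h5 fingerprint file relative to this file's own
    # directory rather than the current working directory, so the
    # caller's cwd no longer matters.
    here = os.path.dirname(os.path.abspath(__file__))
    return os.path.abspath(os.path.join(here, "..", "train_data", model + ".h5"))
```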
Hi @GemmaTuron
I tested the model via the repo_path method.
Although the model functioned as expected, I am experiencing an issue with the repo metadata: it returns a "Wrong Ersilia model tag" error. However, I did not edit any of the items in the tag variable. I have been reviewing the metadata but I haven't been able to figure out what could be causing this error.
@DhanshreeA @carcablop would you mind taking a look at it?
--2023-02-12 18:10:45-- https://raw.githubusercontent.com/ersilia-os/ersilia/master/.github/scripts/update_metadata_to_airtable.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 536 [text/plain]
Saving to: ‘update_metadata_to_airtable.py’
0K 100% 40.6M=0s
2023-02-12 18:10:46 (40.6 MB/s) - ‘update_metadata_to_airtable.py’ saved [536/536]
18:10:47 | DEBUG | Reading from https://raw.githubusercontent.com/pauline-banye/eos74bo/main/metadata.json
18:10:47 | ERROR | Ersilia exception class:
TagBaseInformationError
Detailed error:
Wrong Ersilia model tag
Hints:
Tags must be in list format and they must be accepted our team. This means that only tags that are already available in Ersilia are allowed. If you want to include a new tag, please open a pull request (PR) on the 'tag.txt' file from the Ersilia repository.
Traceback (most recent call last):
File "/home/runner/work/eos74bo/eos74bo/update_metadata_to_airtable.py", line 14, in <module>
data = rm.read_information(org=user_name, branch=branch)
File "/usr/share/miniconda/lib/python3.10/site-packages/ersilia/hub/content/card.py", line 388, in read_information
bi.from_dict(data)
File "/usr/share/miniconda/lib/python3.10/site-packages/ersilia/hub/content/card.py", line 342, in from_dict
self.tag = data["Tag"]
File "/usr/share/miniconda/lib/python3.10/site-packages/ersilia/hub/content/card.py", line 238, in tag
raise TagBaseInformationError
ersilia.utils.exceptions_utils.card_exceptions.TagBaseInformationError: Ersilia exception class:
TagBaseInformationError
Detailed error:
Wrong Ersilia model tag
Hints:
Tags must be in list format and they must be accepted our team. This means that only tags that are already available in Ersilia are allowed. If you want to include a new tag, please open a pull request (PR) on the 'tag.txt' file from the Ersilia repository.
Error: Process completed with exit code 1.
{
"Identifier": "eos74bo",
"Slug": "aqueous-kinetic-solubility",
"Status": "In progress",
"Title": "Aqueous Kinetic Solubility",
"Description": "Prediction of Aqueous solubility is one of the most important properties in drug discovery, as it has profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy.",
"Mode": "Pretrained",
"Task": ["Classification"],
"Input": ["Compound"],
"Input Shape": "Single",
"Output": ["Probability"],
"Output Type": ["Float"],
"Output Shape": "Single",
"Interpretation": "Probability of a compound being soluble at 10 μg/mL. (>0.5: Soluble), and probability of a compound being highly soluble (>52 μg/mL; >0.5: Soluble)",
"Tag": [
"solubility",
"ADME"
],
"Publication": "https://pubmed.ncbi.nlm.nih.gov/31176566/",
"Source Code": "https://github.com/ncats/ncats-adme",
"License": "None"
}
Hi @pauline-banye
Please revise what you have in the tags against the documentation in GitBook or the metadata files in the Ersilia Hub. As you know, Python strings must be LITERALLY the same, including CAPS.
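A quick local check along these lines can catch the mismatch before pushing (the `allowed` set below is a stand-in for illustration; the authoritative list is the `tag.txt` file in the Ersilia repository):

```python
import json

# Stand-in for the accepted tag list -- matching is exact, including caps.
allowed = {"ADME", "Solubility"}

metadata = json.loads('{"Tag": ["solubility", "ADME"]}')
rejected = [t for t in metadata["Tag"] if t not in allowed]
print(rejected)  # -> ['solubility']: lowercase does not match "Solubility"
```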
I updated the metadata.json. Unfortunately the checks are still failing.
{
"Identifier": "eos74bo",
"Slug": "aqueous-kinetic-solubility",
"Status": "In progress",
"Title": "Aqueous Kinetic Solubility",
"Description": "Prediction of Aqueous solubility is one of the most important properties in drug discovery, as it has profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy.",
"Mode": "Pretrained",
"Task": ["Classification"],
"Input": ["Compound"],
"Input Shape": "Single",
"Output": ["Probability"],
"Output Type": ["Float"],
"Output Shape": "Single",
"Interpretation": "Probability of a compound being soluble at 10 μg/mL. (>0.5: Soluble), and probability of a compound being highly soluble (>52 μg/mL; >0.5: Soluble)",
"Tag": [
"ADME",
"Solubility"
],
"Publication": "https://slas-discovery.org/article/S2472-5552(22)06765-X/fulltext",
"Source Code": "https://github.com/ncats/ncats-adme",
"License": "None"
}
Hello @pauline-banye !
Where are you updating the metadata file? GitHub Actions checks the files in your fork of the repository; you can see the link in the Action log: 18:10:47 | DEBUG | Reading from https://raw.githubusercontent.com/pauline-banye/eos74bo/main/metadata.json
Make sure you update that specific metadata.json, since it is still showing the "solubility" and "ADME" tags. This is the last action that was run: https://github.com/ersilia-os/eos74bo/actions/runs/4157600807/jobs/7192215071
Hi @Femme-js
The metadata.json is still incomplete; please fill in the Interpretation field and check the allowed license formats. I've modified the test_model_pr workflow to the latest version so it will be triggered.
Hi @GemmaTuron, thank you so much for your help. I eventually resolved it by making the PR directly from the main branch.
I was initially tracking the actions on my forked repository using the dev and main branches, until I remembered that you mentioned it works only on the main branch. So I merged the updated code into my main branch, which triggered the actions on the Ersilia repo.
Good morning @GemmaTuron, the issue with the checks still persists. I have outlined the dependencies in the environment.yml file; the dependencies and versions in the activated environment are listed below.
Python 3.8.16
I used `conda env export` to extract the dependencies in the activated conda environment. During our catch-up meeting yesterday, @miquelduranfrigola mentioned that this error is due to the Docker image not having those dependencies, and he asked that I switch the Python version to Python 3.8 and include the dependencies. I updated the Dockerfile, but the checks still fail.
```dockerfile
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get update && \
    apt install -y python3.8

RUN pip install rdkit
RUN pip install pandas
RUN pip install numpy
RUN pip install torch==1.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install torch
RUN pip install FPSim2
RUN pip install tqdm
RUN pip install typing-extensions
RUN pip install typed-argument-parser
RUN pip install tensorboardX
RUN pip install scikit-learn
RUN pip install hyperopt
RUN pip install requests

WORKDIR /repo
COPY . /repo
```