Closed: paulinebanye closed this issue 1 year ago
Hi @pauline-banye
Thanks, very detailed answer, super helpful! Just to be clear: the check fails in GitHub Actions, but if you run it on your computer with the `--repo_path` flag, does it work?
Hi @GemmaTuron, No, it does not. I get errors when I run it with the `--repo_path` flag: `ersilia -v fetch eos74bo -r /mnt/c/Users/DELL-PC/Desktop/eos74bo/ > eos74bo.log 2>&1`.
Steps to recreate the error
@GemmaTuron @DhanshreeA The build progressed much further with the Docker dependencies. The error currently returned is related to the relative import paths within the repo. This is an issue that occurred when I was working with the RLM model initially, but it was resolved by including `sys.path.append` or `sys.path.insert` within the codebase. I added those as soon as I began working with the solubility model.
```dockerfile
FROM bentoml/model-server:0.11.0-py37
MAINTAINER ersilia

RUN pip install rdkit
RUN pip install pandas
RUN pip install numpy
RUN pip install torch==1.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install tqdm
RUN pip install typing-extensions
RUN pip install typed-argument-parser
RUN pip install tensorboardX
RUN pip install scikit-learn
RUN pip install hyperopt
RUN pip install requests

WORKDIR /repo
COPY . /repo
```
I am at a loss to understand why it is not working when I test the model within the Ersilia CLI. I am still attempting to debug and figure out what is causing the errors.
[eos74bo_6.log](https://github.com/ersilia-os/eos74bo/files/10712735/eos74bo_6.log)
[eos74bo_8.log](https://github.com/ersilia-os/eos74bo/files/10712736/eos74bo_8.log)
Hello @pauline-banye.
I see in the logs that some modules are not being imported. From your development environment, make sure the imports in the code are done correctly and that the paths are right: check that each `.py` file really is in the folder the import points to. For example, "predictors": it seems the code is trying to access that folder to import "SolubilityPredictor", but it cannot find it.
Check if you have an `__init__.py` file and whether those imports are configured there.
Then try to run the `main.py` file as follows: if you already have an environment configured with the model dependencies installed, activate the environment, go directly to the `main.py` file path, and run it.
When you have solved that, you can finally run it with ersilia and `--repo_path`. Activate the ersilia environment and run:
ersilia -v fetch "model_id" --repo_path /home/../model_id
As a good practice, I recommend pinning the version of each dependency in the Dockerfile.
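For example, the unpinned `pip install` lines in the Dockerfile above could specify versions explicitly (the version numbers below are placeholders for illustration, not tested recommendations):

```dockerfile
RUN pip install pandas==1.3.5
RUN pip install numpy==1.21.6
RUN pip install scikit-learn==1.0.2
```

This keeps the image reproducible: a rebuild months later installs the same dependency set instead of whatever is latest.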
Thank you so much @carcablop. I really appreciate your help checking the model out.
Hi @GemmaTuron, The issues I kept receiving were due to the relative paths. The majority of them were within the chemprop submodule, and I have been able to resolve them. I tested the repo and the model was fetched successfully.
ersilia -v fetch eos74bo -r /mnt/c/Users/DELL-PC/Desktop/nu_eos74bo/eos74bo > eos74bo.log 2>&1
eos74bo.log
That sounds great @pauline-banye! Could you also describe the issues you faced with chemprop? We can all learn from it, and this can be useful information around model incorporation for future contributors.
Hi @pauline-banye! Great progress, thanks for the detailed feedback. Could you:
Sure @GemmaTuron @DhanshreeA, @carcablop !
The issues were mainly related to the way the relative imports in the chemprop submodule were specified in the repository. I kept getting different errors, which prompted me to start debugging each file.
[x] In some instances I had to use `from .. import <module name>` to reference the parent directory of that specific module and retrieve specific functions or classes. For example, the `model/framework/predictors/chemprop/chemprop/data/data.py` file required some classes from a different folder, and the way it was imported returned a "module not found" error. I ended up converting the import from `from chemprop.features import BatchMolGraph, MolGraph` to `from ..features import BatchMolGraph, MolGraph`. This instructed the code to cd out of the `data` directory into the `model/framework/predictors/chemprop/chemprop/features/__init__.py` file and import the required classes.
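This fix can be reproduced with a throwaway package that mirrors the layout (the directory and class names below are illustrative stand-ins, not the real chemprop tree): a relative import is resolved against the parent package rather than `sys.path`, so it works no matter where the repository sits on disk.

```python
import importlib
import os
import sys
import tempfile

# Build a minimal package: chemprop/features.py plus chemprop/data/data.py,
# where data.py uses the relative form "from ..features import ...".
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "chemprop", "data"))
with open(os.path.join(root, "chemprop", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, "chemprop", "features.py"), "w") as f:
    f.write("class MolGraph:\n    pass\n")
with open(os.path.join(root, "chemprop", "data", "__init__.py"), "w") as f:
    f.write("")
with open(os.path.join(root, "chemprop", "data", "data.py"), "w") as f:
    f.write("from ..features import MolGraph\n")  # the fixed, relative form

# Importing the submodule succeeds because ".." resolves to the parent
# package "chemprop", regardless of the current working directory.
sys.path.insert(0, root)
mod = importlib.import_module("chemprop.data.data")
print(mod.MolGraph.__name__)  # -> MolGraph
```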
[x] In instances where I had a directory with different modules, I ended up referencing each module with `from . import <module name>` to extract specific functions. For example, in `predictors/chemprop/chemprop/interpret.py` there were several imports referenced in a way that returned the "module not found" error:
```python
from chemprop.args import InterpretArgs
from chemprop.data import MoleculeDataLoader, MoleculeDataset
from chemprop.data.utils import get_data_from_smiles, get_header, get_smiles
from chemprop.train import predict
from chemprop.utils import load_args, load_checkpoint, load_scalers
```
To fix those errors, I edited the imports using the `.` notation:
```python
from .args import InterpretArgs
from .data import MoleculeDataLoader, MoleculeDataset
from .data.utils import get_data_from_smiles, get_header, get_smiles
from .train import predict
from .utils import load_args, load_checkpoint, load_scalers
```
.....
[x] In other instances, I made use of the `sys.path.append` or `sys.path.insert` option. In the `main.py` file, for example, I needed to import `SolubilityPredictor` from the `solubility_predictor` module in the `solubility` directory:

```python
root = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(root, ".."))
from predictors.solubility.solubility_predictor import SolubilityPredictor
```

Here `root = os.path.dirname(os.path.abspath(__file__))` sets the absolute path of the code directory containing the `main.py` file, and appending `".."` instructed the code to cd out of that folder before specifying the relative path (`predictors.solubility.solubility_predictor`) to the `SolubilityPredictor` class.

[x] The function which required FPSim2 was located in the `utilities/utilities.py` file. From my understanding, this code takes the list of kekulized SMILES and the model name, calculates the Tanimoto similarity between the input SMILES and the reference set, and returns a list of similarity values, printing the time taken to complete the similarity calculation.
```python
import os
import time
from os import path

from FPSim2 import FPSim2Engine


def get_similar_mols(kekule_smiles: list, model: str):
    start = time.time()
    sim_vals = []
    # Build the path to the fingerprint database for this model
    fp_dict_path = ''.join(['../train_data/', model, '.h5'])
    fp_dict_path = path.abspath(path.join(os.getcwd(), fp_dict_path))
    fp_engine = FPSim2Engine(fp_dict_path)
    for smi in kekule_smiles:
        res = fp_engine.on_disk_similarity(smi, 0.01)
        sim_vals.append(res[0][1])
    end = time.time()
    print(f'{end - start} seconds to calculate Tanimoto similarity for {len(kekule_smiles)} molecules')
    return sim_vals
```
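One caveat I'd flag (my observation, not something raised in the thread): the `'../train_data/'` prefix is joined onto `os.getcwd()`, so the lookup depends on the directory the process is launched from. A `__file__`-anchored sketch avoids that:

```python
import os

def fp_path(model: str) -> str:
    # Resolve the .h5 fingerprint file relative to this file's own
    # directory rather than the current working directory, so the
    # caller's cwd no longer matters.
    here = os.path.dirname(os.path.abspath(__file__))
    return os.path.abspath(os.path.join(here, "..", "train_data", model + ".h5"))
```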
Hi @GemmaTuron
I tested the model via the repo_path method.
Although the model functioned as expected, I am experiencing an issue with the repo metadata: it returns a "Wrong Ersilia model tag" error. However, I did not edit any of the items in the tag variable. I have been reviewing the metadata but I haven't been able to figure out what could be causing this error.
@DhanshreeA @carcablop would you mind taking a look at it?
--2023-02-12 18:10:45-- https://raw.githubusercontent.com/ersilia-os/ersilia/master/.github/scripts/update_metadata_to_airtable.py
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.111.133, 185.199.108.133, 185.199.109.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.111.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 536 [text/plain]
Saving to: ‘update_metadata_to_airtable.py’
0K 100% 40.6M=0s
2023-02-12 18:10:46 (40.6 MB/s) - ‘update_metadata_to_airtable.py’ saved [536/536]
18:10:47 | DEBUG | Reading from https://raw.githubusercontent.com/pauline-banye/eos74bo/main/metadata.json
18:10:47 | ERROR | Ersilia exception class:
TagBaseInformationError
Detailed error:
Wrong Ersilia model tag
Hints:
Tags must be in list format and they must be accepted our team. This means that only tags that are already available in Ersilia are allowed. If you want to include a new tag, please open a pull request (PR) on the 'tag.txt' file from the Ersilia repository.
Traceback (most recent call last):
File "/home/runner/work/eos74bo/eos74bo/update_metadata_to_airtable.py", line 14, in <module>
data = rm.read_information(org=user_name, branch=branch)
File "/usr/share/miniconda/lib/python3.10/site-packages/ersilia/hub/content/card.py", line 388, in read_information
bi.from_dict(data)
File "/usr/share/miniconda/lib/python3.10/site-packages/ersilia/hub/content/card.py", line 342, in from_dict
self.tag = data["Tag"]
File "/usr/share/miniconda/lib/python3.10/site-packages/ersilia/hub/content/card.py", line 238, in tag
raise TagBaseInformationError
ersilia.utils.exceptions_utils.card_exceptions.TagBaseInformationError: Ersilia exception class:
TagBaseInformationError
Detailed error:
Wrong Ersilia model tag
Hints:
Tags must be in list format and they must be accepted our team. This means that only tags that are already available in Ersilia are allowed. If you want to include a new tag, please open a pull request (PR) on the 'tag.txt' file from the Ersilia repository.
Error: Process completed with exit code 1.
{
"Identifier": "eos74bo",
"Slug": "aqueous-kinetic-solubility",
"Status": "In progress",
"Title": "Aqueous Kinetic Solubility",
"Description": "Prediction of Aqueous solubility is one of the most important properties in drug discovery, as it has profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy.",
"Mode": "Pretrained",
"Task": ["Classification"],
"Input": ["Compound"],
"Input Shape": "Single",
"Output": ["Probability"],
"Output Type": ["Float"],
"Output Shape": "Single",
"Interpretation": "Probability of a compound being soluble at 10 μg/mL. (>0.5: Soluble), and probability of a compound being highly soluble (>52 μg/mL; >0.5: Soluble)",
"Tag": [
"solubility",
"ADME"
],
"Publication": "https://pubmed.ncbi.nlm.nih.gov/31176566/",
"Source Code": "https://github.com/ncats/ncats-adme",
"License": "None"
}
Hi @pauline-banye
Please revise what you have in the tags against the documentation in GitBook or the metadata files in the Ersilia Hub. As you know, Python strings must be LITERALLY the same, including CAPS.
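A quick local check along these lines can catch the mismatch before pushing (the `allowed` set below is a stand-in for illustration; the authoritative list is the `tag.txt` file in the Ersilia repository):

```python
import json

# Stand-in for the accepted tag list -- matching is exact, including caps.
allowed = {"ADME", "Solubility"}

metadata = json.loads('{"Tag": ["solubility", "ADME"]}')
rejected = [t for t in metadata["Tag"] if t not in allowed]
print(rejected)  # -> ['solubility']: lowercase does not match "Solubility"
```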
I updated the metadata.json. Unfortunately the checks are still failing.
{
"Identifier": "eos74bo",
"Slug": "aqueous-kinetic-solubility",
"Status": "In progress",
"Title": "Aqueous Kinetic Solubility",
"Description": "Prediction of Aqueous solubility is one of the most important properties in drug discovery, as it has profound impact on various drug properties, including biological activity, pharmacokinetics (PK), toxicity, and in vivo efficacy.",
"Mode": "Pretrained",
"Task": ["Classification"],
"Input": ["Compound"],
"Input Shape": "Single",
"Output": ["Probability"],
"Output Type": ["Float"],
"Output Shape": "Single",
"Interpretation": "Probability of a compound being soluble at 10 μg/mL. (>0.5: Soluble), and probability of a compound being highly soluble (>52 μg/mL; >0.5: Soluble)",
"Tag": [
"ADME",
"Solubility"
],
"Publication": "https://slas-discovery.org/article/S2472-5552(22)06765-X/fulltext",
"Source Code": "https://github.com/ncats/ncats-adme",
"License": "None"
}
Hello @pauline-banye !
Where are you updating the metadata file? GitHub Actions checks the files in your fork of the repository; you can see the link in the Action log: 18:10:47 | DEBUG | Reading from https://raw.githubusercontent.com/pauline-banye/eos74bo/main/metadata.json
Make sure you update that specific metadata.json, since it is still showing the "solubility" and "ADME" tags. This is the last action that was run: https://github.com/ersilia-os/eos74bo/actions/runs/4157600807/jobs/7192215071
Hi @Femme-js
The metadata.json is still incomplete; please fill in the Interpretation field and check the allowed license formats. I've modified the test_model_pr workflow to the latest version so it will be triggered.
Hi @GemmaTuron, thank you so much for your help. I eventually resolved it by making the PR directly from the main branch.
I was initially tracking the actions on my forked repository using the dev and main branches, until I remembered that you mentioned it works only on the main branch. So I merged the updated code into my main branch, which triggered the actions on the Ersilia repo.
Good morning @GemmaTuron, the issue with the checks still persists. I have outlined the dependencies in the environment.yml file; the dependencies and versions in the activated environment are listed below.
Python 3.8.16
I used `conda env export` to extract the dependencies in the activated conda environment. During our catch-up meeting yesterday, @miquelduranfrigola mentioned that this error is due to the Docker image not having those dependencies, and he asked that I switch the Python version to Python 3.8 and include the dependencies. I updated the Dockerfile, but the checks still fail.
```dockerfile
RUN apt-get update && \
    apt-get install -y software-properties-common && \
    add-apt-repository -y ppa:deadsnakes/ppa && \
    apt-get update && \
    apt install -y python3.8

RUN pip install rdkit
RUN pip install pandas
RUN pip install numpy
RUN pip install torch==1.6.0+cpu -f https://download.pytorch.org/whl/torch_stable.html
RUN pip install torch
RUN pip install FPSim2
RUN pip install tqdm
RUN pip install typing-extensions
RUN pip install typed-argument-parser
RUN pip install tensorboardX
RUN pip install scikit-learn
RUN pip install hyperopt
RUN pip install requests

WORKDIR /repo
COPY . /repo
```