ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Cardiotoxicity Classifier #1174

Closed kurysauce closed 2 months ago

kurysauce commented 4 months ago

Model Name

Cardiotoxicity Classifier

Model Description

Prediction of drug-induced cardiotoxicity as a binary classification of cardiotoxicity risk, along with a probability score indicating the confidence of the prediction. Classification is based on chemical data, namely SMILES representations of compounds and descriptors computed from them, such as Morgan fingerprints and Mordred physicochemical descriptors that describe molecular structure. Biological data are also used, including gene expression and Cell Painting (cell morphology) profiles measured after drug treatment. The DICTrank (Drug-Induced Cardiotoxicity Rank) dataset provides the ground-truth labels for the training data.

Slug

cardiotox-dictrank

Tag

Cardiotoxicity, DrugBank

Publication

https://doi.org/10.1021/acs.jcim.3c01834

Source Code

https://github.com/srijitseal/DICTrank

License

None

GemmaTuron commented 4 months ago

Hi @kurysauce

Good start, two minor changes before we approve the model:

  • Slug: if there is a name or dataset very specific to the model, we prefer to use that. In this case, do you think we could use cardiotox-dictrank as the slug, for example?
  • If there is no license, you need to state None.

Once these changes are made, we can approve the model.

kurysauce commented 4 months ago

Resolved!

GemmaTuron commented 4 months ago

/approve

github-actions[bot] commented 4 months ago

New Model Repository Created! 🎉

@kurysauce, the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos1pu1

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

GemmaTuron commented 4 months ago

Message from @kurysauce on Slack (for anyone available to help):

I'm currently stuck debugging the log output (linked here on GitHub) after attempting to fetch my model. I'm comparing the eos1pu1 virtual environment with my local environment, which is able to run the model, to see if there are any discrepancies that could lead to errors. At face value, I notice that some packages from the Dockerfile have not been installed in the eos1pu1 environment. Additionally, among the packages present in both environments, there are 3 version mismatches:

  • numpy: eos1pu1 env 1.26.4, local virtual env 1.24.4
  • pillow: eos1pu1 env 10.4.0, local virtual env 10.3.0
  • python-dateutil: eos1pu1 env 2.9.0.post0, local virtual env 2.9.0

I'm not sure whether this is contributing to the errors in the log output. I've attached both package lists below for convenience; any help is appreciated!

1) eos1pu1 env:

Package Version


alembic 1.13.2
bentoml 0.11.0
blinker 1.8.2
boto3 1.34.138
botocore 1.34.138
Cerberus 1.3.5
certifi 2024.6.2
chardet 5.2.0
charset-normalizer 3.3.2
click 8.1.7
docker 7.1.0
Flask 3.0.3
humanfriendly 10.0
idna 3.7
itsdangerous 2.2.0
Jinja2 3.1.4
jmespath 1.0.1
Mako 1.3.5
MarkupSafe 2.1.5
multidict 6.0.5
numpy 1.26.4
packaging 24.1
pillow 10.4.0
pip 24.0
prometheus_client 0.20.0
protobuf 3.18.3
psutil 6.0.0
python-dateutil 2.9.0.post0
python-json-logger 2.0.7
rdkit 2024.3.3
requests 2.32.3
ruamel.yaml 0.18.6
ruamel.yaml.clib 0.2.8
s3transfer 0.10.2
setuptools 69.5.1
six 1.16.0
SQLAlchemy 2.0.31
SQLAlchemy-Utils 0.41.2
tabulate 0.9.0
typing_extensions 4.12.2
urllib3 2.2.2
Werkzeug 3.0.3
wheel 0.43.0

2) local virtual env:

Package Version


anyio 4.4.0
appnope 0.1.4
argon2-cffi 23.1.0
argon2-cffi-bindings 21.2.0
arrow 1.3.0
asttokens 2.4.1
async-lru 2.0.4
attrs 23.2.0
Babel 2.15.0
backcall 0.2.0
beautifulsoup4 4.12.3
bleach 6.1.0
certifi 2024.6.2
cffi 1.16.0
chardet 5.2.0
charset-normalizer 3.3.2
comm 0.2.2
contourpy 1.1.1
cycler 0.12.1
debugpy 1.8.2
decorator 5.1.1
defusedxml 0.7.1
dill 0.3.8
dimorphite_dl 1.3.2
exceptiongroup 1.2.1
executing 2.0.1
fastjsonschema 2.20.0
fonttools 4.53.0
fqdn 1.5.1
freetype-py 2.3.0
greenlet 3.0.3
h11 0.14.0
httpcore 1.0.5
httpx 0.27.0
idna 3.7
importlib_metadata 8.0.0
importlib_resources 6.4.0
ipykernel 6.29.4
ipython 8.12.3
ipywidgets 8.1.3
isoduration 20.11.0
jedi 0.19.1
Jinja2 3.1.4
joblib 1.4.2
json5 0.9.25
jsonpointer 3.0.0
jsonschema 4.22.0
jsonschema-specifications 2023.12.1
jupyter 1.0.0
jupyter_client 8.6.2
jupyter-console 6.6.3
jupyter_core 5.7.2
jupyter-events 0.10.0
jupyter-lsp 2.2.5
jupyter_server 2.14.1
jupyter_server_terminals 0.5.3
jupyterlab 4.2.2
jupyterlab_pygments 0.3.0
jupyterlab_server 2.27.2
jupyterlab_widgets 3.0.11
kiwisolver 1.4.5
MarkupSafe 2.1.5
matplotlib 3.7.3
matplotlib-inline 0.1.7
mistune 3.0.2
mordred 1.2.0
munkres 1.1.4
nbclient 0.10.0
nbconvert 7.16.4
nbformat 5.10.4
nest-asyncio 1.6.0
networkx 2.8.8
notebook 7.2.1
notebook_shim 0.2.4
numpy 1.24.4
overrides 7.7.0
packaging 24.1
pandarallel 1.6.5
pandas 2.0.3
pandocfilters 1.5.1
parso 0.8.4
pexpect 4.9.0
pickleshare 0.7.5
pillow 10.3.0
pip 24.0
pkgutil_resolve_name 1.3.10
platformdirs 4.2.2
prometheus_client 0.20.0
prompt_toolkit 3.0.47
psutil 6.0.0
ptyprocess 0.7.0
pure-eval 0.2.2
pycairo 1.26.1
pycparser 2.22
Pygments 2.18.0
pyparsing 3.1.2
python-dateutil 2.9.0
python-json-logger 2.0.7
pytz 2024.1
PyYAML 6.0.1
pyzmq 26.0.3
qtconsole 5.5.2
QtPy 2.4.1
rdkit 2024.3.3
rdkit-pypi 2022.9.5
referencing 0.35.1
reportlab 4.2.2
requests 2.32.3
rfc3339-validator 0.1.4
rfc3986-validator 0.1.1
rlPyCairo 0.2.0
rpds-py 0.18.1
scikit-learn 1.1.1
scipy 1.10.1
Send2Trash 1.8.3
setuptools 69.5.1
six 1.16.0
sniffio 1.3.1
soupsieve 2.5
SQLAlchemy 2.0.31
stack-data 0.6.3
terminado 0.18.1
threadpoolctl 3.5.0
tinycss2 1.3.0
tomli 2.0.1
tornado 6.4.1
traitlets 5.14.3
types-python-dateutil 2.9.0.20240316
typing_extensions 4.12.2
tzdata 2024.1
unicodedata2 15.1.0
uri-template 1.3.0
urllib3 2.2.2
wcwidth 0.2.13
webcolors 24.6.0
webencodings 0.5.1
websocket-client 1.8.0
wheel 0.43.0
widgetsnbextension 4.0.11
zipp 3.19.2
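The comparison described in this message can be scripted rather than done by eye. Below is a minimal sketch, assuming each environment's listing has been captured as text in the `pip list` layout (one `name version` pair per line; the header and dashed separator lines are skipped). `parse_pip_list` and `compare_envs` are hypothetical helpers, not part of Ersilia.

```python
from typing import Dict, List, Tuple


def parse_pip_list(text: str) -> Dict[str, str]:
    """Parse `pip list` style text into {normalized package name: version}."""
    packages = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip blank lines, the "Package Version" header and the dashed separator
        if len(parts) < 2 or parts[0] == "Package" or parts[0].startswith("-"):
            continue
        # Normalize names so e.g. typing_extensions and typing-extensions match
        name = parts[0].lower().replace("_", "-").replace(".", "-")
        packages[name] = parts[1]
    return packages


def compare_envs(
    env_a: str, env_b: str
) -> Tuple[List[str], List[str], List[Tuple[str, str, str]]]:
    """Return (only_in_a, only_in_b, version_mismatches) for two pip list dumps."""
    a, b = parse_pip_list(env_a), parse_pip_list(env_b)
    only_a = sorted(set(a) - set(b))
    only_b = sorted(set(b) - set(a))
    # Packages present in both environments but installed at different versions
    mismatches = sorted(
        (name, a[name], b[name]) for name in set(a) & set(b) if a[name] != b[name]
    )
    return only_a, only_b, mismatches
```

Run on the two listings above, the mismatch list would contain exactly numpy, pillow and python-dateutil, while the two "only in" lists show which packages (for example pandas, mordred or scikit-learn) are missing from the eos1pu1 environment.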

GemmaTuron commented 4 months ago

Hi @kurysauce

What command are you running to fetch the model? The model is not yet incorporated, so I am assuming you are using the --repo_path flag to run your local code?

The error you are currently getting in the log file occurs because the metadata.json file is not complete, so Ersilia looks for the Shape of the model without finding it. Regarding package versions, the Dockerfile should specify the exact versions of the packages you want to install; I don't see them there, if that is the most up-to-date code.

If you fix these things and provide more information on how you are trying to fetch the model, we can move forward.
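To act on the point about exact versions, one quick check is to scan the Dockerfile for `pip install` arguments that are not pinned with `==`. This is a rough sketch that assumes dependencies are installed through single-line `RUN pip install ...` instructions. It does not follow line continuations, conda installs or requirements files, and `unpinned_pip_packages` is a hypothetical helper.

```python
import re
from typing import List

# pip options that consume the following token as their value
_OPTS_WITH_VALUE = {"-i", "--index-url", "--extra-index-url", "-f", "--find-links",
                    "-r", "--requirement", "-c", "--constraint"}


def unpinned_pip_packages(dockerfile_text: str) -> List[str]:
    """List packages installed via `RUN pip install` without an exact '==' pin."""
    unpinned = []
    for line in dockerfile_text.splitlines():
        match = re.match(r"\s*RUN\s+(?:python3?\s+-m\s+)?pip3?\s+install\s+(.*)", line)
        if not match:
            continue
        skip_next = False
        for token in match.group(1).split():
            if token in {"&&", ";", "\\"}:
                break  # end of this pip command; anything after is another command
            if skip_next:
                skip_next = False  # this token was the value of the previous option
                continue
            if token.startswith("-"):
                skip_next = token in _OPTS_WITH_VALUE
                continue
            if "==" not in token:
                unpinned.append(token)
    return unpinned
```

Anything it returns could then be pinned to the version that works locally, for example `pandas==2.0.3` from the local environment listed above.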

kurysauce commented 4 months ago

09:37:13 | DEBUG | Metadata needs to be calculated
09:37:13 | ERROR | Meta not available, run some adapations first and it will be inferred atomatically
09:37:13 | DEBUG | These are the results for API run
09:37:13 | DEBUG | [{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}]
09:37:25 | ERROR | Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos1pu1:run did not produce an outputINFO: Pandarallel will run on 8 workers.
INFO: Pandarallel will use standard multiprocessing data transfer (pipe) to transfer data between the main process and workers.
Traceback (most recent call last):
  File "/Users/kurtenriquez/eos/repository/eos1pu1/20240708093559_12A277/eos1pu1/artifacts/framework/code/main.py", line 59, in <module>
    raise ValueError("The CHECKPOINTS_PATH environment variable is not set.")
ValueError: The CHECKPOINTS_PATH environment variable is not set.

Hints:

miquelduranfrigola commented 4 months ago
import os

# resolve paths relative to this script, so no environment variable is needed
root = os.path.dirname(os.path.abspath(__file__))
checkpoints_dir = os.path.join(root, "..", "..", "checkpoints")
input_output_dir = os.path.join(root, "..")
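The snippet above resolves paths from the location of main.py, so the CHECKPOINTS_PATH environment variable is no longer needed. The sketch below wraps the same `os.path` calls in a hypothetical helper, normalizes the `..` segments so the resolved folder is easy to log, and fails early if a folder is missing. It assumes the checkpoints live in a `checkpoints` folder two levels above the `code` folder, matching the `.../eos1pu1/artifacts/framework/code/main.py` path in the traceback.

```python
import os


def resolve_model_paths(script_path: str) -> dict:
    """Resolve the checkpoints and framework folders relative to main.py."""
    root = os.path.dirname(os.path.abspath(script_path))
    return {
        # normpath collapses the ".." segments into a readable absolute path
        "checkpoints_dir": os.path.normpath(os.path.join(root, "..", "..", "checkpoints")),
        "input_output_dir": os.path.normpath(os.path.join(root, "..")),
    }


def require_dir(path: str) -> str:
    """Fail with a clear message if an expected folder does not exist."""
    if not os.path.isdir(path):
        raise FileNotFoundError(f"Expected folder not found: {path}")
    return path
```

In main.py this would be called as `resolve_model_paths(__file__)`, with `require_dir` applied to the checkpoints folder before any model files are loaded.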
kurysauce commented 4 months ago
{
    "input": {
        "key": "VQPBIJGXSXEOCU-UHFFFAOYSA-N",
        "input": "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1",
        "text": "COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1"
    },
    "output": {
        "outcome": [
            null,
            null,
            0.4987850785255432,
            0.0
        ]
    }
}
14:07:42 | DEBUG    | Getting session from /Users/kurtenriquez/eos/session.json
14:07:43 | DEBUG    | Reading card from eos1pu1
14:07:43 | DEBUG    | Reading shape from eos1pu1
14:07:43 | DEBUG    | Input Shape: Single
14:07:43 | DEBUG    | Input type is: compound
14:07:43 | DEBUG    | Input shape is: Single
14:07:43 | DEBUG    | Importing module: .types.compound
14:07:43 | DEBUG    | Checking RDKIT and other requirements necessary for compound inputs
14:07:43 | DEBUG    | InputShapeSingle shape: Single
🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

ExampleGenerator.example() missing 1 required positional argument: 'try_predefined'

Hi @miquelduranfrigola @GemmaTuron. I am having trouble deciphering the error message when testing the model in Ersilia. I am also not sure why the first two output values are "null"; when I run the model code on my local machine, the first two values are "SMILES" and "Standardize_SMILES".

kurysauce commented 4 months ago

Additionally, when I ran the input COc1ccc2c(NC(=O)Nc3cccc(C(F)(F)F)n3)ccnc2c1 from the Ersilia example test through my local implementation, I got different probability values from the Ersilia test output. I am not sure how to approach this discrepancy. My guess is that it has to do with how the original authors wrote the standardise_smile.py file; could it conflict with how Ersilia runs the model?

GemmaTuron commented 4 months ago

Hi @kurysauce

The output from Ersilia should only be the Probability (or the Probability and the Prediction), but not the SMILES and Standardised_Smiles, as this is handled by Ersilia.

I appreciate the work you did on the main.py file, but you modified quite a lot from the basic Ersilia main.py. Can you refactor it so that the structure remains more similar to the eos-template? The input and output are parsed at the top for example, and we do not use the ParseArguments function. While what you did looks good, we need to keep the same structure so it is easier to modify all models at once if needed.

What I would do is create a script with all the functions and just call them on main.py, this will be much cleaner. You could even create a class with all the necessary functions and just import that.
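As an illustration of moving the logic into a class that main.py simply imports, here is a minimal sketch. The class name, the `featurize` callable, the scikit-learn style `predict_proba` interface and the 0.5 threshold are all assumptions for illustration, not the DICTrank code.

```python
from typing import Callable, Dict, List, Sequence


class CardiotoxPredictor:
    """Hypothetical wrapper to keep in a separate module and import from main.py."""

    def __init__(self, model, featurize: Callable[[List[str]], Sequence], threshold: float = 0.5):
        self.model = model          # assumed: exposes scikit-learn style predict_proba(X)
        self.featurize = featurize  # assumed: maps a list of SMILES to a feature matrix
        self.threshold = threshold  # assumed decision threshold for the binary label

    def run_predictions(self, smiles_list: List[str]) -> Dict[str, list]:
        features = self.featurize(smiles_list)
        # With 0/1 class labels, column 1 of predict_proba is the positive (cardiotoxic) class
        probabilities = [float(row[1]) for row in self.model.predict_proba(features)]
        predictions = [int(p >= self.threshold) for p in probabilities]
        return {"probability": probabilities, "prediction": predictions}
```

main.py would then keep the template layout: take the input and output paths at the top, read the SMILES, call `run_predictions` and write the returned values to the output CSV.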

Finally, I suggest you modify run_predictions to output either a list of probabilities or a dictionary with Probabilities and Predictions, and then adapt it to write a CSV file as specified in this section of the template (which of course will need some modifications):

import csv

# check input and output have the same length
input_len = len(smiles_list)
output_len = len(outputs)
assert input_len == output_len

# write output in a .csv file
with open(output_file, "w") as f:
    writer = csv.writer(f)
    writer.writerow(["value"])  # header
    for o in outputs:
        writer.writerow([o])
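If run_predictions returns a dictionary such as `{"probability": [...], "prediction": [...]}`, the template's writing step can be generalized to one column per key. A minimal sketch; the helper name and column names are assumptions. It raises a `ValueError` instead of using `assert`, so the length check is not skipped when Python runs with `-O`.

```python
import csv
from typing import Dict, List


def write_outputs(output_file: str, smiles_list: List[str], outputs: Dict[str, list]) -> None:
    """Write one CSV column per key in `outputs`, one row per input molecule."""
    columns = list(outputs)
    # Check that every output column has exactly one value per input SMILES
    for name in columns:
        if len(outputs[name]) != len(smiles_list):
            raise ValueError(
                f"{name}: {len(outputs[name])} values for {len(smiles_list)} inputs"
            )
    with open(output_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(columns)  # header
        # zip walks the columns in parallel, producing one row per molecule
        for row in zip(*(outputs[name] for name in columns)):
            writer.writerow(row)
```

With two input molecules, `write_outputs(output_file, smiles_list, {"probability": [p1, p2], "prediction": [y1, y2]})` writes a `probability,prediction` header followed by two rows.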
kurysauce commented 4 months ago

Hi, today I attempted to fix the main.py script, and it now runs successfully on my local machine. However, fetching the model with Ersilia is failing again, this time with a KeyError. I am attempting to debug it:


16:07:44 | DEBUG    | These are the results for API run
16:07:44 | DEBUG    | [{'input': {'key': 'LUHMMHZLDLBAKX-UHFFFAOYSA-N', 'input': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O', 'text': 'CC1C2C(CC3(C=CC(=O)C(=C3C2OC1=O)C)C)O'}, 'output': None}, {'input': {'key': 'QRXWMOHMRWLFEY-UHFFFAOYSA-N', 'input': 'C1=CN=CC=C1C(=O)NN', 'text': 'C1=CN=CC=C1C(=O)NN'}, 'output': None}]
16:07:51 | ERROR    | Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos1pu1:run did not produce an outputTraceback (most recent call last):
  File "/Users/kurtenriquez/miniconda3/envs/eos1pu1/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'SMILES'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kurtenriquez/eos/repository/eos1pu1/20240710160625_3977A4/eos1pu1/artifacts/framework/code/main.py", line 21, in <module>
    smiles_list = data['SMILES'].tolist()
  File "/Users/kurtenriquez/miniconda3/envs/eos1pu1/lib/python3.10/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/kurtenriquez/miniconda3/envs/eos1pu1/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'SMILES'

Hints:
- Visit the fetch troubleshooting site

🚨🚨🚨 Something went wrong with Ersilia 🚨🚨🚨

Error message:

Ersilia exception class:
EmptyOutputError

Detailed error:
Model API eos1pu1:run did not produce an outputTraceback (most recent call last):
  File "/Users/kurtenriquez/miniconda3/envs/eos1pu1/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3653, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas/_libs/index.pyx", line 147, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 176, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 7080, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 7088, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'SMILES'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/kurtenriquez/eos/repository/eos1pu1/20240710160625_3977A4/eos1pu1/artifacts/framework/code/main.py", line 21, in <module>
    smiles_list = data['SMILES'].tolist()
  File "/Users/kurtenriquez/miniconda3/envs/eos1pu1/lib/python3.10/site-packages/pandas/core/frame.py", line 3761, in __getitem__
    indexer = self.columns.get_loc(key)
  File "/Users/kurtenriquez/miniconda3/envs/eos1pu1/lib/python3.10/site-packages/pandas/core/indexes/base.py", line 3655, in get_loc
    raise KeyError(key) from err
KeyError: 'SMILES'

Hints:
- Visit the fetch troubleshooting site
GemmaTuron commented 4 months ago

Hi @kurysauce, if I am not wrong, this error is easy to fix. You are looking for a column named "SMILES" that does not exist. I suggest changing this bit of code:

data = pd.read_csv(input_file)
smiles_list = data['SMILES'].tolist()

for this one from the eos-template:

import csv

# read SMILES from .csv file, assuming one column with header
with open(input_file, "r") as f:
    reader = csv.reader(f)
    next(reader)  # skip header
    smiles_list = [r[0] for r in reader]
kurysauce commented 2 months ago

@miquelduranfrigola @GemmaTuron, testing was completed by contributor @HarmonySosa.

miquelduranfrigola commented 2 months ago

Thanks @HarmonySosa and @kurysauce !

GemmaTuron commented 2 months ago

This model works!