ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0
219 stars 147 forks

🦠 Model Request: Incorporating DrugTax into Ersilia Model Hub #503

Closed Femme-js closed 1 year ago

Femme-js commented 1 year ago

Model Name

DrugTax

Model Description

DrugTax is a Python package for drug taxonomy identification and explainable feature extraction.

Slug

drugtax

Tags

taxonomy classification, bulk analysis

Publication

https://jcheminf.biomedcentral.com/articles/10.1186/s13321-022-00649-w

Code

License

GNU General Public License v3.0

https://github.com/MoreiraLAB/DrugTax/blob/main/LICENSE

Femme-js commented 1 year ago

DrugTax leverages small molecule representations as input in the form of SMILES.

The package extracts taxonomy information and key molecular features for detailed characterization. It also supports visualization and bulk analysis of molecules for chemical space representation and molecule similarity assessment.

DrugTax first classifies a molecule into one of two kingdoms, organic or inorganic, which comprise 26 and 5 superclasses, respectively.

This package can be applied to similarity searches, chemical space visualization, clustering, and taxonomy-property relationships, among others. The results can then be combined with different easy-to-implement visualization tools.

This package has very few dependencies. Most of its extended dependencies are only needed when using the bulk analysis and plotting options.

Femme-js commented 1 year ago

https://colab.research.google.com/drive/1WMTqL2YLxyY3baa-OxnwhNG0caTpOatj?usp=sharing

This is the Colab link to get started with DrugTax.

The DrugTax package has a DrugTax class that extracts taxonomy information and 163 simple, explainable features for a single SMILES string. To get taxonomy information for bulk data, with input in the form of a CSV file, a drug list, or a SMILES list, use 'retrieve_taxonomic_class'.

Femme-js commented 1 year ago

@miquelduranfrigola Can you take a look please?

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@Femme-js ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos24ci

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

Femme-js commented 1 year ago

Hi @GemmaTuron and @miquelduranfrigola !

The main.py code is working for small molecules.

I am attaching the sample output file from the code: output.csv

Femme-js commented 1 year ago

I tried inputting the SMILES in eml_canonical.csv as single SMILES inputs, but strangely, bulk analysis gives an error when all the inputs from eml_canonical.csv are in a single list. Screenshot from 2022-12-23 03-40-01
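One way to narrow down an error like this is to run every entry of the list through the single-input path separately and record which ones fail. The snippet below is only a minimal sketch of that idea: `find_failing_smiles` and `demo_classifier` are hypothetical names, and `demo_classifier` is a crude stand-in for the real single-molecule call, not the DrugTax API.

```python
from typing import Callable, List, Tuple


def find_failing_smiles(
    smiles_list: List[str],
    classify_one: Callable[[str], object],
) -> List[Tuple[int, str, str]]:
    """Call classify_one on each SMILES separately and collect the failures.

    Returns (index, smiles, error description) for every entry that raised.
    """
    failures = []
    for index, smiles in enumerate(smiles_list):
        try:
            classify_one(smiles)
        except Exception as error:  # broad on purpose: we only want to locate bad inputs
            failures.append((index, smiles, repr(error)))
    return failures


def demo_classifier(smiles: str) -> str:
    """Hypothetical stand-in for the real classifier (an assumption, not DrugTax).

    It rejects any string containing a lowercase c, n, o or s, which is a
    crude proxy for "aromatic notation is not supported".
    """
    if any(character in "cnos" for character in smiles):
        raise ValueError("aromatic atom not supported")
    return "ok"


failures = find_failing_smiles(["CCO", "c1ccccc1", "C1=CC=CC=C1"], demo_classifier)
for index, smiles, error in failures:
    print(index, smiles, error)
```

Passing the real single-molecule function in place of `demo_classifier` would give the positions and strings of the inputs that break the bulk run.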

GemmaTuron commented 1 year ago

Hi @Femme-js ,

As we just discussed:

GemmaTuron commented 1 year ago

Hi @Femme-js, can you provide an update on the model status and on what you found out about the SMILES issue?

Femme-js commented 1 year ago

Timeline of incorporating this model:

While incorporating this model outside Ersilia, I tested my code on the eml_canonical.csv file (the standard SMILES inputs provided by Ersilia during the contribution period) in three input forms: CSV, smiles_list, and drug_list. During testing, I encountered the error posted above with the inputs in standard SMILES format. After debugging it along the points discussed above with @GemmaTuron, I found that the drugtax module does not parse SMILES in the aromatic format, so the input needs to be converted into Kekulé format. For this, I used the rdkit package to convert the aromatic SMILES input into Kekulé input.

Below is a description of the Kekulé and aromatic formats of SMILES:

Screenshot from 2023-01-11 02-37-11
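As an illustration of the conversion, benzene is written `c1ccccc1` in aromatic form and `C1=CC=CC=C1` in Kekulé form. The snippet below is a minimal sketch of how the conversion can be done with RDKit. It is not necessarily the exact code used in the model repository, and the helper name `to_kekule_smiles` is made up for this example. The guarded import is only there so the snippet can be read and loaded without RDKit installed.

```python
try:
    from rdkit import Chem
except ImportError:  # RDKit missing: the helper below will raise if called
    Chem = None


def to_kekule_smiles(smiles: str) -> str:
    """Convert a SMILES string (aromatic or not) into its Kekulé form.

    Hypothetical helper for illustration only.
    """
    if Chem is None:
        raise RuntimeError("RDKit is required for this conversion")
    mol = Chem.MolFromSmiles(smiles)
    if mol is None:
        # MolFromSmiles returns None when the input cannot be parsed
        raise ValueError(f"RDKit could not parse SMILES: {smiles!r}")
    # Assign explicit single/double bonds and drop the aromatic flags
    Chem.Kekulize(mol, clearAromaticFlags=True)
    # kekuleSmiles=True writes the explicit bonds instead of lowercase atoms
    return Chem.MolToSmiles(mol, kekuleSmiles=True)


if Chem is not None:
    print(to_kekule_smiles("c1ccccc1"))
```

Raising on unparsable input, rather than returning an empty string, keeps a bad entry from being silently passed on to the taxonomy step.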

Current Status of the Model:

The PR for this model has already been merged and is ready to test.

Femme-js commented 1 year ago

I tested this model on the CLI, but it fails to fetch.

eos24ci.log

miquelduranfrigola commented 1 year ago

Thanks @Femme-js . This seems to be related to a conda installation error.

I've tried to fetch the model both on my local computer and in a GitHub Actions workflow. I found errors as well, but the conda installation worked.

I haven't fixed the model yet, but please check some edits I've made: https://github.com/ersilia-os/eos24ci/commit/359fa36bfd596c48d67ce41f7178c7fe3fceffbd

miquelduranfrigola commented 1 year ago

Hi @Femme-js, the model was fetched successfully on my device after a few changes I've made. Please inspect them.

Before closing the issue, let's:

  • [x] Try to have your model working on your device.
  • [x] Complete the README.md file.

Many thanks!

GemmaTuron commented 1 year ago

Hi @Femme-js

Can I close this issue?

Femme-js commented 1 year ago

Yes @GemmaTuron !

Femme-js commented 1 year ago

Hi @Femme-js, the model was fetched successfully on my device after a few changes I've made. Please inspect them.

Before closing the issue, let's:

  • [x] Try to have your model working on your device.
  • [x] Complete the README.md file.

Many thanks!

Hi @miquelduranfrigola !

I have been able to successfully test the model on my CLI. I am attaching the log file here: eos24ci3.log