ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Convert SMILES into their Canonical form using Datamol python library #564

Closed: carcablop closed this issue 1 year ago

carcablop commented 1 year ago

Model Name

Converter of SMILES in Canonical form

Model Description

A utility model that uses Python libraries such as Datamol to convert SMILES strings into their canonical form. The aim is to expose the Datamol preprocessing step as a standalone model.

Slug

datamol-smiles2canonical

Tag

datamol, standardize_smiles

Publication

https://doc.datamol.io/stable/tutorials/Preprocessing.html

Source code

https://github.com/datamol-org/datamol

License

Under the Apache-2.0 license

carcablop commented 1 year ago

Hello @GemmaTuron, I found that the Datamol Python library is only compatible with Python versions 3.8, 3.9 and 3.10.

However, I tried creating a conda environment with Python 3.7 and installing Datamol in it. When I run the Python script to get the canonical SMILES, I get an error saying that it cannot import "Literal" from typing (see Literal_issue.txt).

From what I found online, this is only solved by upgrading to Python 3.8:
https://github.com/python/typing/issues/707

I would like to try what is proposed in the link above to make "Literal" usable on earlier Python versions: run pip3 install typing_extensions and then use from typing_extensions import Literal.
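A minimal sketch of the fallback import described above. The function describe_output is hypothetical and only shows Literal used in an annotation. Note that this pattern helps only in code you control; if the failing import sits inside a third-party package, that package would need the same change.

```python
# Prefer the standard library and fall back to the backport on older Pythons.
try:
    from typing import Literal  # available from Python 3.8
except ImportError:  # Python 3.7 and earlier
    from typing_extensions import Literal  # needs: pip3 install typing_extensions


# Hypothetical function, present only to show Literal in a type annotation.
def describe_output(kind: Literal["smiles", "inchi"]) -> str:
    return "output format: " + kind


print(describe_output("smiles"))
```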

GemmaTuron commented 1 year ago

Thanks @carcablop, good catch! The good news is that @miquelduranfrigola managed to eliminate the requirement for ersilia py3.8, so it should not be a problem from now on. @miquelduranfrigola what do you say?

carcablop commented 1 year ago

Hello @GemmaTuron. That's great news! In today's meeting I asked Miquel how I could do it, and he showed me an option in the Dockerfile to run the model with Python 3.8. Outside of Ersilia, I ran the script I wrote with Python 3.8 and it works without any problems.

carcablop commented 1 year ago

@GemmaTuron, one question: does Ersilia only want the SMILES in their canonical form? With the Datamol library you can also get the SELFIES, InChI and InChIKey. If the SELFIES are of interest, can they be produced by the same model, or should they be independent models?

GemmaTuron commented 1 year ago

Maybe having everything in one model is useful indeed! @miquelduranfrigola ?
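A rough sketch of how the representations mentioned above could be produced together for one molecule. It uses the Datamol functions to_mol, to_smiles, to_selfies, to_inchi and to_inchikey; the function name represent, the dictionary keys and the optional dm parameter are assumptions made for illustration (the parameter lets the function be exercised without Datamol installed). SELFIES conversion may additionally need the selfies package.

```python
from typing import Dict, Optional


def represent(smiles: str, dm=None) -> Optional[Dict[str, str]]:
    """Return several text representations of one molecule.

    Returns None when the input SMILES cannot be parsed.
    """
    if dm is None:
        # Imported lazily so this module loads even where Datamol is absent.
        import datamol as dm

    mol = dm.to_mol(smiles)  # Datamol returns None for unparsable input
    if mol is None:
        return None

    # Keys are hypothetical column names, not the ones used by the model.
    return {
        "canonical_smiles": dm.to_smiles(mol),
        "selfies": dm.to_selfies(mol),
        "inchi": dm.to_inchi(mol),
        "inchikey": dm.to_inchikey(mol),
    }
```

With Datamol installed, calling represent("OCC") would return one dictionary holding all four representations, which is the "everything in one model" option discussed here.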

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@carcablop, the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos7qga

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

carcablop commented 1 year ago

Hello @GemmaTuron. This model is ready to be incorporated into the Ersilia Model Hub. In the end, it seemed useful for the output to include the other representations of a SMILES string, namely SELFIES, InChI and InChIKey, and not only the canonical SMILES. That is why the output has multiple columns: canonical SMILES, SELFIES, InChI and InChIKey.

I attach the output file obtained by passing the Ersilia "eml_canonical.csv" file as input, after fetching and serving the model within Ersilia: output_eos7qga.csv. This is the fetch log: log_fetch_eos7qga.txt. This was run with Python 3.8. If you don't want these kinds of representations, please let me know and I'll change it.

For the "Mode" option of the metadata.json file, I'm not sure what to add, since the model is neither pretrained nor retrained: I only import the Datamol library and call the functions I need. What value that Ersilia accepts would you suggest for "Mode" before I open the pull request?

This is the link to my repository: https://github.com/carcablop/eos7qga/commit/149eaf1637a0859cb70dfb91eecef8458de8b438. I only need to add the "Mode" option in the metadata.json. Thank you very much.
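To make the multi-column output shape concrete, here is a small sketch that writes one row per input molecule using Python's csv module. The column names and the helper rows_to_csv are assumptions, not the names used in output_eos7qga.csv, and the example row holds well-known values for ethanol rather than anything taken from the attached file.

```python
import csv
import io
from typing import Dict, Iterable

# Hypothetical column names; the real output file may use different ones.
COLUMNS = ["input", "canonical_smiles", "selfies", "inchi", "inchikey"]


def rows_to_csv(rows: Iterable[Dict[str, str]]) -> str:
    """Serialise one dictionary per molecule into CSV text with a header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(row)
    return buffer.getvalue()


# Illustrative row for ethanol, not taken from the attached output file.
example = {
    "input": "OCC",
    "canonical_smiles": "CCO",
    "selfies": "[C][C][O]",
    "inchi": "InChI=1S/C2H6O/c1-2-3/h3H,2H2,1H3",
    "inchikey": "LFQSCWFLJHTTHZ-UHFFFAOYSA-N",
}
print(rows_to_csv([example]))
```

Because InChI strings contain commas, the csv writer quotes that field automatically, which keeps the columns aligned when the file is read back.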

GemmaTuron commented 1 year ago

Thanks Carolina for the detailed answer, this looks fantastic. The Mode would be "Pretrained", I think, since we have done no work on it, even though it's not exactly a "model". Add this and open a PR to the repo :)
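For illustration, the suggested change could be applied to the metadata with a few lines of Python. This is only a sketch: the helper set_mode is hypothetical, and the sample metadata content is a made-up minimal stand-in, since the real metadata.json has other fields that are not shown in this thread.

```python
import json


def set_mode(metadata_text: str, mode: str = "Pretrained") -> str:
    """Return the metadata JSON text with its "Mode" entry set to `mode`."""
    metadata = json.loads(metadata_text)
    metadata["Mode"] = mode
    return json.dumps(metadata, indent=4)


# Hypothetical, minimal stand-in for the contents of metadata.json.
original = '{"Slug": "datamol-smiles2canonical"}'
print(set_mode(original))
```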