ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: SmilesPE #754

Closed: russelljeffrey closed 10 months ago

russelljeffrey commented 1 year ago

Model Name

SmilesPE: tokenizer algorithm for SMILES, DeepSMILES, and SELFIES

Model Description

The SMILES Pair Encoding method generates SMILES substring tokens based on high-frequency token pairs from large chemical datasets. This method is well suited to both QSAR modeling and generative models. The model provided here has been pretrained using ChEMBL.
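
To make the idea of merging high-frequency token pairs concrete, here is a minimal toy sketch of pair encoding on SMILES strings. It is not the SmilesPE package's API or its pretrained vocabulary: the atom-level regex is a simplified assumption, and the function names (`atom_tokens`, `learn_merges`, `apply_merge`, `tokenize`) are hypothetical.

```python
import re
from collections import Counter

# Simplified atom-level splitter (an assumption for illustration, not the
# SmilesPE regex): bracket atoms, Br, Cl, two-digit ring closures, then any
# single character.
ATOM_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|%\d{2}|.")


def atom_tokens(smiles):
    """Split a SMILES string into atom-level tokens."""
    return ATOM_PATTERN.findall(smiles)


def apply_merge(seq, pair):
    """Fuse every non-overlapping occurrence of `pair` in `seq`, left to right."""
    merged, i = [], 0
    while i < len(seq):
        if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
            merged.append(seq[i] + seq[i + 1])
            i += 2
        else:
            merged.append(seq[i])
            i += 1
    return merged


def learn_merges(corpus, num_merges):
    """Learn up to `num_merges` rules by repeatedly fusing the most frequent adjacent pair."""
    sequences = [atom_tokens(s) for s in corpus]
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for seq in sequences:
            pair_counts.update(zip(seq, seq[1:]))
        if not pair_counts:
            break
        best_pair, count = pair_counts.most_common(1)[0]
        if count < 2:  # stop once no pair repeats
            break
        merges.append(best_pair)
        sequences = [apply_merge(seq, best_pair) for seq in sequences]
    return merges


def tokenize(smiles, merges):
    """Tokenize a SMILES string by replaying the learned merges in order."""
    seq = atom_tokens(smiles)
    for pair in merges:
        seq = apply_merge(seq, pair)
    return seq


if __name__ == "__main__":
    toy_corpus = ["CC(=O)O", "CC(=O)OC", "c1ccccc1"]  # tiny made-up corpus
    rules = learn_merges(toy_corpus, num_merges=5)
    print(tokenize("CC(=O)Oc1ccccc1C(=O)O", rules))
```

Because merges only concatenate adjacent tokens, joining the output tokens always reproduces the input string. A real vocabulary would be learned from a large dataset such as ChEMBL rather than from a three-molecule corpus.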

Slug

smiles-pe

Tag

Chemical language model, Chemical notation, ChEMBL

Publication

https://pubs.acs.org/doi/abs/10.1021/acs.jcim.0c01127

Source Code

https://github.com/XinhaoLi74/SmilesPE

License

Apache-2.0

GemmaTuron commented 1 year ago

Hi @russelljeffrey

As you might have read in our documentation, the TAGS are predetermined so you need to abide by them, otherwise the model cannot be approved as the checks will fail. Please check them out and modify accordingly before we can proceed.

russelljeffrey commented 1 year ago

Hello @GemmaTuron. Thank you for mentioning that. I have read the docs before and, as shown here in an example, I understood that the tag is written based on the name of the model. I have read most of the parts of the Ersilia GitBook that concern model contribution. Could you please provide a link (if one is available) so that I can understand how the tags are predefined? Thank you.

GemmaTuron commented 1 year ago

Hello @russelljeffrey

The Slug is the model short name, not the tag. The one you suggest feels too long, so I would only write smiles-tokenizer. All the information related to how to fill in these fields can be found in the Ersilia GitBook under model contribution.

russelljeffrey commented 1 year ago

Thank you @GemmaTuron for your explanations. All of the required modifications will be made very soon.

russelljeffrey commented 1 year ago

Hi again @GemmaTuron. The slug and tag have been modified according to the link you provided.

GemmaTuron commented 1 year ago

Hello @russelljeffrey. The slug is the short name, and the tags are the keywords that will help identify the model. The tags must come from the list provided in the above link. Please have a look and suggest a few that might fit.

russelljeffrey commented 1 year ago

Hi @GemmaTuron, the tag and slug have been corrected.

GemmaTuron commented 1 year ago

I have updated the fields according to the guidelines in the documentation (i.e., the description must be at least 200 characters, and the slug should not have caps). I have removed the hERG tag. Why did you choose it, @russelljeffrey? Before starting the model incorporation, can you confirm that you are able to run Ersilia models on your system?

Thanks
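
For reference, the field rules mentioned in this thread (description of at least 200 characters, slug without capital letters, tags taken from a predefined list) can be sketched as a small pre-submission check. This is only an illustration under stated assumptions, not Ersilia's actual validation code: the function name `check_model_request` is hypothetical, and the allowed tag list must be supplied by the caller from the documentation.

```python
def check_model_request(description, slug, tags, allowed_tags):
    """Return a list of problems; an empty list means all three checks pass.

    Hypothetical helper: the rules come from this thread, and the
    allowed tag list is whatever the caller passes in.
    """
    problems = []
    if len(description) < 200:
        problems.append(
            f"description has {len(description)} characters; at least 200 are required"
        )
    if slug != slug.lower():
        problems.append("slug must not contain capital letters")
    unknown = [tag for tag in tags if tag not in allowed_tags]
    if unknown:
        problems.append("tags not in the predefined list: " + ", ".join(unknown))
    return problems


if __name__ == "__main__":
    # Placeholder values for illustration only; not the real Ersilia tag list.
    placeholder_tags = {"ChEMBL", "Chemical notation"}
    for problem in check_model_request(
        description="Too short.",
        slug="Smiles-PE",
        tags=["ChEMBL", "NotAListedTag"],
        allowed_tags=placeholder_tags,
    ):
        print(problem)
```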

russelljeffrey commented 1 year ago

Hi @GemmaTuron. I chose hERG as a tag because the authors used hERG as one of the benchmark datasets in the article. Thank you for updating the description. Yes, Ersilia is fully functional on my system, since I use a dedicated GitHub Codespace.

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@russelljeffrey, the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos1mxi

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

GemmaTuron commented 1 year ago

Hi @russelljeffrey

Are you still active on this model? Let me know the status, and add any comments you have here.

russelljeffrey commented 1 year ago

Hello @GemmaTuron. Yes, I'm still active, and I believe the implementation will be finished by the end of this week. The implementation of the tokenizer is complete, but testing is still needed, so I will test the model soon to make sure it's ready to be used.

GemmaTuron commented 10 months ago

This model is completed!