ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Finetuned ImageMol for SARS-Cov2 assays from NCATS OpenData #572

Closed DhanshreeA closed 1 year ago

DhanshreeA commented 1 year ago

Model Name

SARS-CoV-2 antiviral screening

Model Description

A representation learning framework that uses molecule images to encode molecular inputs as machine-readable vectors for downstream tasks such as bioactivity prediction, drug metabolism analysis, and drug toxicity prediction. The approach uses transfer learning: the model is pre-trained on massive unlabeled datasets so that it learns general-purpose feature extraction, and is then fine-tuned on specific tasks. This model will be fine-tuned on 13 assays covering target categories that range from viral entry to toxicity in humans. These interactions are formulated as binary classification tasks.
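A minimal sketch of the binary-classification formulation, assuming each of the 13 assays yields one raw score (logit) per molecule. The function names, the 0.5 threshold, and the example logits are illustrative assumptions and are not taken from the ImageMol code.

```python
import math
from typing import List, Tuple

N_ASSAYS = 13  # number of assays stated in the model description


def sigmoid(x: float) -> float:
    """Map a raw score (logit) to a probability, in a numerically stable way."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def classify(logits: List[float], threshold: float = 0.5) -> List[Tuple[float, int]]:
    """Turn one logit per assay into (probability, binary label) pairs.

    The 0.5 threshold is an assumption made for illustration only.
    """
    if len(logits) != N_ASSAYS:
        raise ValueError(f"expected {N_ASSAYS} logits, got {len(logits)}")
    results = []
    for z in logits:
        p = sigmoid(z)
        results.append((p, int(p >= threshold)))
    return results


# Hypothetical logits for a single molecule, one per assay.
example_logits = [-2.0, 0.0, 1.5] + [0.3] * 10
predictions = classify(example_logits)
```

Each entry of `predictions` corresponds to one assay, so a molecule can be active in some assays and inactive in others.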

Slug

sars-cov-2-antiviral-screen

Tag

SarsCov2,classification

Publication

https://www.nature.com/articles/s42256-022-00557-6

Source code

https://github.com/HongxinXiang/ImageMol

License

MIT License

DhanshreeA commented 1 year ago

The data statistics for SARS-CoV-2 assays are as follows: image

GemmaTuron commented 1 year ago

/approve

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@DhanshreeA ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos4cxk

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!

Here are some brief starter steps for contributing to your new model repository:

Note: Many of the bullet points below will have extra links if this is your first time contributing to a GitHub repository

Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

DhanshreeA commented 1 year ago

Waiting for the authors to get back to us with fine-tuned models for these assays. Reproducing the results from the paper has been challenging because the exact hyperparameters are not known, and finding them takes some experimentation across the ranges of values the authors provide in their supplementary materials.

DhanshreeA commented 1 year ago

Hi @GemmaTuron, could you please add me to this repository? The authors have been super helpful and have shared the model checkpoints with us. Since this repo is old, it doesn't have the mock.csv tracked in LFS. Alternatively, you could just update the mock.csv in this repo.

GemmaTuron commented 1 year ago

Great to hear that. Also added you for #571, so you should be good to edit both! If it is easier, we can also approve a new repository, which will have all the checks on it. Depending on how much you have already worked in the previous repo, that might be the most straightforward option. Let me know.

DhanshreeA commented 1 year ago

I have completed both 😬 and only need to commit the code, so I guess we can just use these two repos.

GemmaTuron commented 1 year ago

Cool. If you can copy the latest workflow files into your fork, they will be merged and triggered in future PRs, automatically updating Airtable etc. That would be super helpful!

DhanshreeA commented 1 year ago

Cool, I'll do that!

On Mon, Feb 13, 2023, 8:41 PM gemmaturon @.***> wrote:

cool, if you can copy the latest workflow files in your fork, then they will be merged and triggered in the future prs, automatically updating airtable etc, would be super helpful!


DhanshreeA commented 1 year ago

Hi @GemmaTuron, quick question: in the interpretation field, I think it would be good to describe what each of the 13 assays is, according to the data description above. What do you think? Same for the GPCR model.

GemmaTuron commented 1 year ago

Hi @DhanshreeA !

Maybe just list the assay names and mention each column is one of the results? Something that is quick but gives people a good sense of what they are looking at!
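A rough sketch of what "one column per assay" could look like. The 13 real assay names appear only in the data statistics image and are not reproduced in this thread, so `ASSAY_NAMES` holds placeholders; `format_rows` and the example scores are likewise hypothetical.

```python
import csv
import io
from typing import Dict, List

# Placeholder names: replace with the 13 real assay names.
ASSAY_NAMES = [f"assay_{i:02d}" for i in range(1, 14)]


def format_rows(predictions: Dict[str, List[float]]) -> str:
    """Write one row per input molecule and one column per assay."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # Header: the input column followed by one column per assay.
    writer.writerow(["input"] + ASSAY_NAMES)
    for smiles, scores in predictions.items():
        if len(scores) != len(ASSAY_NAMES):
            raise ValueError("expected one score per assay")
        writer.writerow([smiles] + [f"{s:.3f}" for s in scores])
    return buffer.getvalue()


# Hypothetical scores for one molecule (ethanol, SMILES "CCO").
table = format_rows({"CCO": [0.1] * 13})
```

The interpretation field could then simply list the names in `ASSAY_NAMES` in column order.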

DhanshreeA commented 1 year ago

Associated PR: https://github.com/ersilia-os/eos4cxk/pull/1 Model is ready to be tested by other volunteers.