ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

✍️ Contribution period: Samted Uche #1017

Closed teddyCodex closed 6 months ago

teddyCodex commented 7 months ago

Week 1 - Get to know the community

Week 2 - Get Familiar with Machine Learning for Chemistry

Week 3 - Validate a Model in the Wild

Week 4 - Prepare your final application

teddyCodex commented 7 months ago

Eos30gr Validation Repo

teddyCodex commented 7 months ago

As a data analyst with developing skills in Python, scientific computing, database management, and web development, I am extremely motivated to contribute to the Ersilia Open Source Initiative through the Outreachy internship program. I find Ersilia's mission to democratize access to AI/ML models for biomedical research and to support neglected disease research in low- and middle-income countries quite interesting, and I am inspired by its commitment to open science. My focus has been on financial data, so I am quite new to the fields of AI and ML. However, as someone who pivoted from accounting to data analysis, I believe this uncharted territory is well within my reach, and that working with Ersilia will give me much-needed exposure to the field.

With my solid foundation in data analysis and Python programming, I believe I have the technical skills to make valid contributions to this project. My experience with scientific computing pipelines and databases would allow me to assist in model integration, infrastructure setup, and data management. Additionally, my web development skills could support the creation of user-friendly interfaces for browsing and deploying models.

I am committed to leveraging my skills and dedication to make a positive impact through this internship and I would be honored to play a role in advancing this mission and vision of egalitarian healthcare for all. Thank you for your consideration.

teddyCodex commented 7 months ago

Notes: Certain models are only compatible with linux/amd64 processors, while others are compatible with both linux/amd64 and linux/arm64 processors. Some users may experience this error when trying to pull an incompatible model:

Using default tag: latest
latest: Pulling from ersiliaos/eos2ta5
no matching manifest for linux/arm64/v8 in the manifest list entries
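
A minimal sketch of how the compatibility check could be done ahead of a pull, assuming a manifest list shaped like the JSON that `docker manifest inspect` prints for a multi-architecture image. The function name and the sample data are hypothetical and only illustrate the idea; they are not real output for any Ersilia image.

```python
def supports_platform(manifest_list, os_name, architecture):
    """Return True if any manifest entry targets the given OS and CPU architecture."""
    for entry in manifest_list.get("manifests", []):
        platform = entry.get("platform", {})
        if platform.get("os") == os_name and platform.get("architecture") == architecture:
            return True
    return False


# Hypothetical manifest list for an amd64-only image (illustrative data).
amd64_only = {"manifests": [{"platform": {"os": "linux", "architecture": "amd64"}}]}

print(supports_platform(amd64_only, "linux", "arm64"))  # False
print(supports_platform(amd64_only, "linux", "amd64"))  # True
```

On an arm64 machine, an amd64-only image can sometimes still be pulled by passing `--platform linux/amd64` to `docker pull` and running it under emulation, though that is likely to be slower and is not guaranteed to work for every model.
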
teddyCodex commented 7 months ago

Selected Model eos30gr for Week 2. I consider its impact very valuable. Also, there is support for linux/arm64 processors. Successfully pulled and tested the model.

DhanshreeA commented 7 months ago

Notes: Certain models are only compatible with linux/amd64 processors, while others are compatible with both linux/amd64 and linux/arm64 processors. Some users may experience this error when trying to pull an incompatible model:

Using default tag: latest
latest: Pulling from ersiliaos/eos2ta5
no matching manifest for linux/arm64/v8 in the manifest list entries

Good to know, we'll keep a track of this!

teddyCodex commented 7 months ago

Using the SMILES dataset provided in the #data channel. Will experiment later using other datasets in the wild.

teddyCodex commented 7 months ago

Encountered an error while running ersilia api run -i input.csv -o output.csv. The error is caused by a missing isaura installation.

Tried installing isaura using pip but encountered an error relating to HDF5. Error snippet:

Building h5py requires pkg-config unless the HDF5 path is explicitly specified using the environment variable HDF5_DIR. For more information and details, see https://docs.h5py.org/en/stable/build.html#custom-installation
error: pkg-config probably not installed: FileNotFoundError(2, 'No such file or directory')

Installing now with brew to see if that resolves the isaura install error.

Installation with brew failed. Installation with pip also failed. Ended up cloning and installing from the isaura GitHub repo. Isaura installed successfully.
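
For anyone hitting the same h5py build error, here is a small sketch of a pre-install check based only on what the error message states: the build needs either pkg-config on the PATH or the HDF5_DIR environment variable set. The function name is hypothetical, and passing this check is necessary but not sufficient for the build to succeed.

```python
import os
import shutil


def h5py_build_ready(env=None, which=shutil.which):
    """Report whether an h5py source build has a way to locate HDF5.

    Per the error message, either pkg-config must be on PATH or
    HDF5_DIR must be set. This does not verify that HDF5 itself is installed.
    """
    env = os.environ if env is None else env
    has_pkg_config = which("pkg-config") is not None
    has_hdf5_dir = bool(env.get("HDF5_DIR"))
    return {
        "pkg_config": has_pkg_config,
        "hdf5_dir": has_hdf5_dir,
        "ready": has_pkg_config or has_hdf5_dir,
    }


status = h5py_build_ready()
print(status)
```
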

teddyCodex commented 7 months ago

Opted to use the Ersilia Python library to serve and run the selected model [eos30gr].

Note: the Python library returns results much faster than the CLI (~90 secs).

teddyCodex commented 7 months ago

[Image: scatter plot of model predictions]

The scatter plot shows each molecule's prediction value, with color indicating whether it is a hERG blocker or not. The red horizontal line at the 0.8 threshold visually separates predicted hERG blockers from non-blockers. The plot makes it possible to see any patterns or clusters in the predictions and how well the prediction values correlate with the actual 'is_blocker' labels.
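
The comparison the plot makes visually can also be sketched numerically. The snippet below applies the 0.8 threshold and counts agreement with the 'is_blocker' labels. The scores and labels are made-up illustrative values, not results from this validation, and treating a score exactly at 0.8 as a blocker is an assumption.

```python
THRESHOLD = 0.8  # same cut-off as the red line in the plot


def classify(predictions, threshold=THRESHOLD):
    """Label a molecule as a predicted blocker when its score reaches the threshold."""
    return [score >= threshold for score in predictions]


def confusion_counts(predicted, actual):
    """Count true/false positives and negatives for predicted vs. actual labels."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for pred, label in zip(predicted, actual):
        if pred and label:
            counts["tp"] += 1
        elif pred and not label:
            counts["fp"] += 1
        elif not pred and not label:
            counts["tn"] += 1
        else:
            counts["fn"] += 1
    return counts


# Hypothetical prediction values and 'is_blocker' labels (illustrative only).
scores = [0.95, 0.40, 0.85, 0.10, 0.70]
is_blocker = [True, False, False, False, True]

counts = confusion_counts(classify(scores), is_blocker)
print(counts)
```
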

DhanshreeA commented 7 months ago

Hi @teddyCodex, I only see updates for task 1 from week 2. So far this looks good, but I would advise against starting week 3 tasks. Only finish the week 2 tasks if you can; I will do a final revision on Monday.

teddyCodex commented 7 months ago

Model Reproducibility:

Using the dataset from this publication, I attempted to reproduce the results in Table S6, which includes detailed predictions by deephERG for 49 approved antineoplastic drugs (including immunomodulating agents). The results are unexpectedly varied.


The model does poorly in reproducing the results of the original publication.

Thoughts / Recommendations:

teddyCodex commented 7 months ago

I had concerns about the results of the model run with Python and wondered whether the results would be the same using the CLI. The CLI produced pretty much the same results as the Python library.
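
A check like this can be sketched as a comparison of the two output files, keyed by input molecule. The column names "input" and "score" and the sample rows are hypothetical; the actual eos30gr output columns may differ.

```python
import csv
import io
import math


def read_scores(csv_text, key_col="input", value_col="score"):
    """Parse CSV text into a mapping from input molecule to predicted score."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[key_col]: float(row[value_col]) for row in reader}


def mismatches(a, b, tol=1e-6):
    """Return inputs whose scores differ by more than tol, or that appear in only one output."""
    keys = sorted(set(a) | set(b))
    return [
        k for k in keys
        if k not in a or k not in b
        or not math.isclose(a[k], b[k], rel_tol=0.0, abs_tol=tol)
    ]


# Hypothetical outputs from the CLI run and the Python library run.
cli_csv = "input,score\nCCO,0.12\nc1ccccc1,0.91\n"
py_csv = "input,score\nCCO,0.12\nc1ccccc1,0.905\n"

cli_scores = read_scores(cli_csv)
py_scores = read_scores(py_csv)

print(mismatches(cli_scores, py_scores, tol=1e-3))  # ['c1ccccc1']
print(mismatches(cli_scores, py_scores, tol=0.01))  # []
```

An empty list at a reasonable tolerance would support the observation that both interfaces give the same predictions, which points away from the tooling and toward the model or its data.
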

It appears that the disconnect may come from the training and/or validation data.


DhanshreeA commented 7 months ago

Thank you @teddyCodex, this is helpful. We have been observing issues with eos30gr. Your work will help us improve this model. Please go ahead and create your final application on Outreachy. :)