ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: Fine Tuned ImageMol Model for BACE Dataset from MoleculeNet #533

Closed DhanshreeA closed 1 year ago

DhanshreeA commented 1 year ago

Model Name

imagemol-bace

Model Description

A representation learning framework that uses molecule images to encode molecular inputs as machine-readable vectors for downstream tasks such as bioactivity prediction, drug metabolism analysis, or drug toxicity prediction. The approach relies on transfer learning: the model is pre-trained on massive unlabeled datasets so that it learns general feature extraction, and is then fine-tuned on specific tasks.

The BACE (BetA-seCretasE) dataset contains compounds that may inhibit human β-secretase 1 (BACE-1).

Slug

bace-inhibitor

Tags

classification

Publication

Original Paper: https://www.nature.com/articles/s42256-022-00557-6

Supplementary Materials: https://static-content.springer.com/esm/art%3A10.1038%2Fs42256-022-00557-6/MediaObjects/42256_2022_557_MOESM1_ESM.pdf

Code

https://github.com/HongxinXiang/ImageMol

Checkpoints: https://drive.google.com/file/d/1q9-QCGbaACzw-QO2pOrK-FrGMr1yz1L0/view?usp=sharing

Parent Issue: https://github.com/ersilia-os/ersilia/issues/518

License

No response

GemmaTuron commented 1 year ago

/approve

github-actions[bot] commented 1 year ago

New Model Repository Created! 🎉

@DhanshreeA, the Ersilia model repository has been successfully created and is available at:

🔗 ersilia-os/eos8c0o

Next Steps ⭐

Now that your new model repository has been created, you are ready to start contributing to it!


Additional Resources 📚

If you have any questions, please feel free to open an issue and get support from the community!

GemmaTuron commented 1 year ago

Hi @DhanshreeA, try it again now. I've added the model repository to our GitHub Team, which should give you git-lfs rights.

GemmaTuron commented 1 year ago

Nope, I still get the following error when cloning:

Cloning into 'eos8c0o'...
remote: Enumerating objects: 36, done.
remote: Counting objects: 100% (36/36), done.
remote: Compressing objects: 100% (23/23), done.
remote: Total 36 (delta 4), reused 29 (delta 4), pack-reused 0
Unpacking objects: 100% (36/36), 9.64 KiB | 97.00 KiB/s, done.
Downloading data.h5 (800 B)
Error downloading object: data.h5 (26c4d44): Smudge error: Error downloading data.h5 (26c4d449632ea072317c16e9d4857e419b67e1b7751f81a89a87c8e75fe9484e): [26c4d449632ea072317c16e9d4857e419b67e1b7751f81a89a87c8e75fe9484e] Object does not exist on the server: [404] Object does not exist on the server

Errors logged to '/home/gturon/github/ersilia-os/eos8c0o/.git/lfs/logs/20230111T191927.358612518.log'.
Use git lfs logs last to view the log.
error: external filter 'git-lfs filter-process' failed
fatal: data.h5: smudge filter lfs failed
warning: Clone succeeded, but checkout failed.
You can inspect what was checked out with 'git status'
and retry with 'git restore --source=HEAD :/'

Can you try pushing a git-lfs object and see what happens?
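For context on the error above: when a file is tracked with Git LFS, the Git history stores only a small text pointer (a `version` line, an `oid sha256:...` line, and a `size` line), and the smudge filter swaps in the real object at checkout. The 404 means the server holds no object for the oid recorded in the pointer. The following is a minimal sketch of reading such a pointer and checking local bytes against it. The function names are hypothetical, and the sample pointer is assembled from the oid and size shown in the log, assuming the standard v1 pointer layout.

```python
import hashlib

LFS_VERSION_LINE = "version https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text):
    """Parse Git LFS pointer text; return None if it does not look like a pointer."""
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != LFS_VERSION_LINE:
        return None
    # Remaining lines are "key value" pairs, e.g. "oid sha256:<hex>" and "size <bytes>".
    fields = dict(line.split(" ", 1) for line in lines[1:] if " " in line)
    oid = fields.get("oid", "")
    if not oid.startswith("sha256:") or "size" not in fields:
        return None
    return {"oid": oid[len("sha256:"):], "size": int(fields["size"])}


def matches_pointer(data, pointer):
    """Check that raw bytes have the size and SHA-256 digest named by the pointer."""
    return (
        len(data) == pointer["size"]
        and hashlib.sha256(data).hexdigest() == pointer["oid"]
    )


# Sample pointer built from the oid and size reported in the clone log (assumed layout).
sample = (
    "version https://git-lfs.github.com/spec/v1\n"
    "oid sha256:26c4d449632ea072317c16e9d4857e419b67e1b7751f81a89a87c8e75fe9484e\n"
    "size 800\n"
)
pointer = parse_lfs_pointer(sample)
print(pointer)
```

If a copy of `data.h5` is available locally, passing its bytes to `matches_pointer` would show whether it is the object the pointer expects before pushing it again.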

DhanshreeA commented 1 year ago

@GemmaTuron this model is ready to be tested by others in the team.

GemmaTuron commented 1 year ago

@DhanshreeA I am curating the model metadata. Can you provide some information on this classification? The data comes from MoleculeNet (https://deepchem.readthedocs.io/en/latest/api_reference/moleculenet.html), and I'd like to know which pIC50 cutoff the classifier uses. I am trying to download the dataset, but this link (https://deepchemdata.s3-us-west-1.amazonaws.com/datasets/BACE.csv) does not seem to work. Do you have the correct one? The corresponding link does work for the HIV data.

GemmaTuron commented 1 year ago

Hmm, I think I found it in the original publication of the dataset: Subramanian, Govindan, et al. “Computational modeling of β-secretase 1 (BACE-1) inhibitors using ligand based approaches.” Journal of chemical information and modeling 56.10 (2016): 1936-1949.

Anything with a pIC50 above 7 is active; anything below is inactive.
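As a worked illustration of that cutoff: pIC50 is the negative base-10 logarithm of the IC50 in mol/L, so a pIC50 of 7 corresponds to an IC50 of 100 nM. The sketch below converts IC50 values to pIC50 and applies the threshold. The function names and example IC50 values are hypothetical, and labeling a value of exactly 7 as inactive is an assumption, since the comment above only says "above" and "below".

```python
import math

PIC50_CUTOFF = 7.0  # cutoff quoted in the thread


def pic50_from_ic50_nm(ic50_nm):
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(x * 1e-9) = 9 - log10(x).
    return 9.0 - math.log10(ic50_nm)


def label_active(pic50, cutoff=PIC50_CUTOFF):
    """Return 1 (active) if pIC50 is strictly above the cutoff, else 0 (inactive).

    Treating a pIC50 exactly equal to the cutoff as inactive is an assumption.
    """
    return 1 if pic50 > cutoff else 0


# Hypothetical IC50 values in nM, for illustration only.
for ic50_nm in (10.0, 100.0, 1000.0):
    pic50 = pic50_from_ic50_nm(ic50_nm)
    print(ic50_nm, round(pic50, 2), label_active(pic50))
```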

GemmaTuron commented 1 year ago

@DhanshreeA

I've updated the workflow tests for new models, and when running them on the BACE model the following error pops up:

Detailed error: Model API eos8c0o:predict did not produce an output
Traceback (most recent call last):
  File "/home/runner/eos/repository/eos8c0o/20230124080609_966FE5/eos8c0o/artifacts/framework/code/main.py", line 58, in <module>
    outputs = my_model(smiles_list)
  File "/home/runner/eos/repository/eos8c0o/20230124080609_966FE5/eos8c0o/artifacts/framework/code/main.py", line 36, in my_model
    model = load_model()
Checking setup: 0.138s
  File "/home/runner/eos/repository/eos8c0o/20230124080609_966FE5/eos8c0o/artifacts/framework/code/main.py", line 30, in load_model
    checkpoint = torch.load(ckpt_path)
  File "/usr/share/miniconda/envs/eos8c0o/lib/python3.7/site-packages/torch/serialization.py", line 795, in load
    return _legacy_load(opened_file, map_location, pickle_module, **pickle_load_args)
  File "/usr/share/miniconda/envs/eos8c0o/lib/python3.7/site-packages/torch/serialization.py", line 1002, in _legacy_load
    magic_number = pickle_module.load(f, **pickle_load_args)
_pickle.UnpicklingError: invalid load key, 'v'.

You can see more in the Actions workflow of the eos8c0o repo
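One common cause of `invalid load key, 'v'` is that the file passed to `torch.load` is not a checkpoint at all but a Git LFS pointer, whose text begins with `version https://git-lfs...`, so the unpickler trips on the leading `v`. The thread does not confirm that this is what happened here. The sketch below is only one way to check, using hypothetical helpers that inspect the first bytes of a file before it is loaded.

```python
import os
import tempfile


def sniff_checkpoint(path):
    """Guess what kind of file `path` is from its first bytes."""
    with open(path, "rb") as f:
        head = f.read(64)
    if head.startswith(b"version https://git-lfs"):
        # Text pointer left behind when the LFS object was never downloaded.
        return "git-lfs pointer"
    if head.startswith(b"PK\x03\x04"):
        # Zip container, the default torch.save format in recent PyTorch versions.
        return "zip archive"
    if head.startswith(b"\x80"):
        # Binary pickle stream (protocol 2 or later), as in legacy torch.save files.
        return "pickle stream"
    return "unknown"


def write_temp(data):
    """Demo helper: write bytes to a temporary file and return its path."""
    fd, path = tempfile.mkstemp()
    with os.fdopen(fd, "wb") as f:
        f.write(data)
    return path


# Demo with a stand-in file; in practice, pass the real checkpoint path instead.
demo_path = write_temp(b"version https://git-lfs.github.com/spec/v1\n")
print(sniff_checkpoint(demo_path))
os.remove(demo_path)
```

If the check reports a pointer, running `git lfs pull` in the cloned repository, or confirming that the object was actually pushed to the LFS server, would be the next thing to try.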

DhanshreeA commented 1 year ago

@GemmaTuron could you share the link to the workflow job? I want to see if there's any other info in there that we can use. I tried fetching eos8c0o and it fetches successfully for me. The logs are attached if you'd like to see them: eos8c0o.log

GemmaTuron commented 1 year ago

Sure, you only need to go to the Actions section of the repository and there you can see the logs of all the workflow runs. In this case: https://github.com/ersilia-os/eos8c0o/actions/runs/3994141075