ersilia-os / ersilia

The Ersilia Model Hub, a repository of AI/ML models for infectious and neglected disease research.
https://ersilia.io
GNU General Public License v3.0

🦠 Model Request: QupKake: predict micro-pKa of organic molecules #1186

Open LauraGomezjurado opened 4 days ago

LauraGomezjurado commented 4 days ago

Model Name

Predict micro-pKa of organic molecules

Model Description

QupKake is an approach that combines graph neural network (GNN) models with semiempirical quantum mechanical (QM) features to predict the micro-pKa values of organic molecules. The QM features play a significant role both in identifying reaction sites and in predicting micro-pKa values. Accurate micro-pKa prediction is vital for understanding and tuning the acidity and basicity of organic compounds, with applications in drug discovery, materials science, and environmental chemistry.

Slug

qupkake-micro-pKa

Tag

pKa

Publication

https://doi.org/10.1021/acs.jctc.4c00328

Source Code

https://github.com/hutchisonlab/QupKake

License

CC-BY-4.0

LauraGomezjurado commented 4 days ago

I have encountered multiple issues populating the data when running the model. Specifically, debugging shows that the DataLoader length is 0, meaning the DataLoader is not receiving any data from the dataset. This is why trainer.predict returns None: there are no batches to process, so the terminal responds with a "NoneIterable" error. I have tried the commands qupkake smiles "Cc1cc(-n2ncc(=O)[nH]c2=O)ccc1C(=O)c1ccccc1Cl" and qupkake file path/to/novartis_qupkake_pka.sdf, using the data from the GitHub repository. I am not sure how to fix this issue or what is causing it. I am currently digging deeper into the mol_dataset.py file, since it contains the MolDataset initialization that loads the data.

SITE_PREDICTIONS: None
Error: sites_predictions is None
DataLoader length: 0
SITE_PREDICTIONS: None
Error: sites_predictions is None
No protonation/deprotonation sites were found. Output file will not be created.
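The failure mode reported here (an empty loader, a None result, then an iteration error) can be illustrated with a small pure-Python sketch. It does not use QupKake or PyTorch: `predict_sites` and `collect_predictions` are hypothetical stand-ins that only mimic the behaviour described in this comment, assuming the prediction step returns None when it has no batches.

```python
def predict_sites(batches):
    """Hypothetical stand-in for a prediction step.

    Assumption: like the behaviour reported above, it returns None
    when there are no batches to process.
    """
    if len(batches) == 0:
        return None
    return [f"site-prediction-for-{batch}" for batch in batches]


def collect_predictions(batches):
    """Fail early with a clear message instead of iterating over None."""
    if len(batches) == 0:
        raise ValueError("Loader is empty: check that the dataset was built from the input")
    predictions = predict_sites(batches)
    if predictions is None:
        raise ValueError("Prediction step returned None")
    return list(predictions)


# Unguarded: looping over the None result raises a TypeError.
try:
    for _ in predict_sites([]):
        pass
except TypeError as err:
    unguarded_error = str(err)

# Guarded: the empty loader is reported before any prediction is attempted.
try:
    collect_predictions([])
except ValueError as err:
    guarded_error = str(err)
```

A check of this kind would not fix the underlying problem, but it would point at the dataset construction step rather than at the prediction call.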

GemmaTuron commented 4 days ago

Hi @LauraGomezjurado

What exactly are you doing? I just followed the instructions and got the result as an .sdf file in the /data/output folder:

git clone https://github.com/Shualdon/QupKake.git
cd QupKake
conda env create -f environment.yml
conda activate qupkake
pip install .
qupkake smiles "Cc1[nH]c2ccccc2c1CCNCc1ccc(CCC(=O)N=O)cc1"

Is this precisely what you are doing? And which platform are you running the code on? I'd recommend Linux or WSL.

LauraGomezjurado commented 2 days ago

Thank you so much @GemmaTuron. I followed your suggestion and used a codespace, and it worked perfectly fine! The model is running and I am able to get the output file. Now I will look into how to convert the output file into a readable .csv.
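One possible way to do the .sdf to .csv conversion with only the Python standard library is sketched below. It relies on the general SDF layout (records end with `$$$$`, data items are introduced by a `> <NAME>` header and closed by a blank line). The field names `site_index` and `predicted_pka` and the values in `EXAMPLE_SDF` are made-up placeholders for illustration, not the actual QupKake output fields, and the helper names are hypothetical.

```python
import csv
import io

# Placeholder record: the field names and values are invented for illustration.
EXAMPLE_SDF = """mol_1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <site_index>
3

> <predicted_pka>
4.2

$$$$
"""


def parse_record(lines):
    """Collect the title line and the data items of one SDF record into a dict."""
    row = {"title": lines[0].strip() if lines else ""}
    field, values = None, []
    for line in lines:
        if line.startswith(">"):
            # Data header such as "> <predicted_pka>": take the text between < and >.
            start, end = line.find("<"), line.rfind(">")
            field = line[start + 1:end] if 0 <= start < end else None
            values = []
        elif field is not None:
            if line.strip() == "":
                # A blank line closes the current data item.
                row[field] = "\n".join(values)
                field = None
            else:
                values.append(line)
    if field is not None:
        row[field] = "\n".join(values)
    return row


def sdf_to_rows(sdf_text):
    """Split SDF text on the "$$$$" delimiter and parse each record."""
    rows, record = [], []
    for line in sdf_text.splitlines():
        if line.strip() == "$$$$":
            rows.append(parse_record(record))
            record = []
        else:
            record.append(line)
    return rows


def rows_to_csv(rows):
    """Write the rows as CSV text, using the union of all field names as columns."""
    fieldnames = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv(sdf_to_rows(EXAMPLE_SDF))
```

For a real output file, the text would be read from disk and `csv_text` written to a .csv file. This sketch keeps only the title and data items and ignores the atom and bond blocks; if structures are also needed, a cheminformatics toolkit such as RDKit is likely a better fit.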