OpenSourceMalaria / Series4_PredictiveModel

Can we Predict Active Compounds in OSM Series 4?

Davy OSM R2 Submission #14

Closed: IamDavyG closed this issue 5 years ago

IamDavyG commented 5 years ago

Here is my submission to the OSM Predictive Challenge Round 2.

Methodology

A 340-ligand training dataset was constructed from the provided 440-ligand OSM dataset by deduplication: the potency values of replicate entries sharing the same OSM code were averaged. The SMILES strings were then curated with ChemAxon Standardizer using the remove salts and solvents, add explicit hydrogens, and neutralise filters, and initial 3D structures were generated with the Universal Force Field (UFF) in RDKit. These structures were further optimised with the semiempirical PM7 method in the gas phase, followed by geometry optimisation at the HF-3c (Hartree-Fock with three corrections) level of theory with the CPCM implicit solvation model configured for water in ORCA 4.2.0.

Twenty-one electronic descriptors were calculated from the optimised structures. An automated machine learning workflow then used genetic algorithms to optimise 1000 QSAR models over 50 generations, selecting for the lowest mean absolute error (MAE).
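The deduplication step (collapsing replicate entries that share an OSM code and averaging their potencies) can be sketched in plain Python as below. This is an illustrative sketch, not the author's script: the `(osm_code, smiles, potency)` tuple layout, the function name `deduplicate_by_osm_code`, and the toy rows are all hypothetical, and the submission does not say whether potencies were averaged on a linear or log scale.

```python
from collections import defaultdict
from statistics import mean


def deduplicate_by_osm_code(records):
    """Collapse replicate measurements that share an OSM code.

    `records` is an iterable of (osm_code, smiles, potency) tuples (an assumed
    layout). The first SMILES seen for each code is kept and all potency
    values for that code are averaged.
    """
    smiles_for_code = {}
    potencies = defaultdict(list)
    for code, smiles, potency in records:
        smiles_for_code.setdefault(code, smiles)
        potencies[code].append(potency)
    # dicts preserve insertion order, so output follows first appearance
    return [
        (code, smiles_for_code[code], mean(values))
        for code, values in potencies.items()
    ]


# Hypothetical toy rows, not values from the OSM dataset.
rows = [
    ("cmpd-1", "c1ccccc1", 5.0),
    ("cmpd-1", "c1ccccc1", 6.0),
    ("cmpd-2", "CCO", 4.2),
]
print(deduplicate_by_osm_code(rows))
# → [('cmpd-1', 'c1ccccc1', 5.5), ('cmpd-2', 'CCO', 4.2)]
```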
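For the HF-3c/CPCM(water) step, an ORCA input file can be generated from a list of Cartesian coordinates. The sketch below uses the standard ORCA simple-input keywords `HF-3c`, `Opt` and `CPCM(Water)`; the helper name `orca_hf3c_input`, the neutral closed-shell default (charge 0, multiplicity 1, consistent with the neutralise filter but still an assumption), and the approximate water coordinates are illustrative, not taken from the submission.

```python
def orca_hf3c_input(atoms, charge=0, multiplicity=1):
    """Build an ORCA input string for an HF-3c geometry optimisation
    with CPCM implicit water.

    `atoms` is a list of (element, x, y, z) tuples in angstrom, for example
    the coordinates of a previously pre-optimised structure.
    """
    lines = [
        "! HF-3c Opt CPCM(Water)",        # method, job type, solvation model
        f"* xyz {charge} {multiplicity}",  # Cartesian coordinate block header
    ]
    for element, x, y, z in atoms:
        lines.append(f"{element:<2} {x:12.6f} {y:12.6f} {z:12.6f}")
    lines.append("*")                      # closes the coordinate block
    return "\n".join(lines) + "\n"


# Approximate water geometry, for illustration only.
water = [
    ("O", 0.0, 0.0, 0.117),
    ("H", 0.0, 0.757, -0.467),
    ("H", 0.0, -0.757, -0.467),
]
print(orca_hf3c_input(water))
```

Writing one such file per ligand keeps the quantum-chemistry step scriptable; the resulting `.inp` files would then be passed to the `orca` executable.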
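The submission does not state what its genetic algorithm encodes (model types, hyperparameters, descriptor subsets, or whole pipelines). As a swapped-in illustration of the general idea, the sketch below evolves binary masks over the 21 descriptors with truncation selection, one-point crossover and bit-flip mutation, minimising a user-supplied cost that, in practice, would be the cross-validated MAE of a model trained on the selected descriptors. The function name, population size and mutation rate are hypothetical. With 20 candidates per generation, 50 generations evaluate 1000 candidates, but the submission does not say how its 1000 models were distributed.

```python
import random


def evolve_descriptor_masks(cost, n_descriptors=21, population_size=20,
                            generations=50, mutation_rate=0.05, seed=0):
    """Minimise `cost(mask)` over boolean descriptor masks with a simple GA.

    Returns the best mask found and the best cost recorded per generation.
    Requires population_size >= 4 so at least two parents survive.
    """
    rng = random.Random(seed)
    population = [[rng.random() < 0.5 for _ in range(n_descriptors)]
                  for _ in range(population_size)]
    history = []
    for _ in range(generations):
        scored = sorted(population, key=cost)
        history.append(cost(scored[0]))
        # Truncation selection: the best half survives unchanged (elitism),
        # so the best cost can never get worse between generations.
        parents = scored[: population_size // 2]
        children = []
        while len(parents) + len(children) < population_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_descriptors)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation toggles each gene with small probability.
            child = [not g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = parents + children
    best = min(population, key=cost)
    return best, history


# Toy cost: distance from an arbitrary "ideal" mask, standing in for CV MAE.
target = [i % 3 == 0 for i in range(21)]


def toy_cost(mask):
    return sum(m != t for m, t in zip(mask, target))


best_mask, best_per_generation = evolve_descriptor_masks(toy_cost)
print(toy_cost(best_mask), best_per_generation[:5])
```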

These are the predictions from my best model, which achieved an average MAE of 0.55 in 10-fold cross-validation.
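The reported figure is an average MAE over 10 cross-validation folds, where for each fold $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$. A minimal sketch of that evaluation follows, assuming the per-fold MAEs are averaged (the submission does not say whether scores were averaged per fold or pooled). The `fit`/`predict` callables, the function names and the mean-predictor baseline are hypothetical stand-ins for whatever model was actually trained.

```python
import random
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """MAE = average of |observed - predicted|."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def kfold_mae(X, y, fit, predict, k=10, seed=0):
    """Average the test-fold MAE over k shuffled folds.

    `fit(X_train, y_train)` returns a model and `predict(model, X_test)`
    returns a list of predictions. Requires len(y) >= k.
    """
    indices = list(range(len(y)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal folds
    scores = []
    for test_idx in folds:
        test_set = set(test_idx)
        train_idx = [i for i in indices if i not in test_set]
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = predict(model, [X[i] for i in test_idx])
        scores.append(mean_absolute_error([y[i] for i in test_idx], preds))
    return mean(scores)


# Baseline that always predicts the training-set mean (illustrative only).
def fit_mean(X_train, y_train):
    return mean(y_train)


def predict_constant(model, X_test):
    return [model] * len(X_test)


X_toy = [[float(i)] for i in range(30)]
y_toy = [float(i) for i in range(30)]
print(kfold_mae(X_toy, y_toy, fit_mean, predict_constant))
```

Comparing a real model's cross-validated MAE against such a mean-predictor baseline is a quick check that the model has learned more than the average potency.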