Here is my submission to the OSM Predictive Challenge Round 2.
Methodology
A 340-ligand training dataset was constructed from the provided 440-ligand OSM dataset. Replicated ligands sharing the same OSM code were deduplicated by averaging their potency values. The SMILES strings were then curated with ChemAxon Standardizer using the remove salts and solvents, add explicit hydrogens, and neutralise filters, and initial 3D structures were generated using the Universal Force Field (UFF) in RDKit. These structures were further optimised in the gas phase with the semiempirical PM7 method, followed by geometry optimisation at the HF-3c (Hartree-Fock with three corrections) level of theory with the CPCM implicit solvation model configured for water in ORCA 4.2.0. Twenty-one electronic descriptors were calculated from the optimised structures. An automated machine learning workflow then used genetic algorithms to optimise 1000 QSAR models over 50 generations, selecting for the lowest Mean Absolute Error (MAE).
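The deduplication step described above can be sketched roughly as follows. This is a minimal illustration, not the code used for the submission: the function name `average_replicates`, the example OSM codes, and the potency values are all hypothetical, and it assumes a plain arithmetic mean of the potency values as given (the post does not say whether averaging was done on a raw or log scale).

```python
from collections import defaultdict
from statistics import mean


def average_replicates(records):
    """Collapse replicate measurements to one mean potency per OSM code.

    `records` is an iterable of (osm_code, potency) pairs.
    """
    grouped = defaultdict(list)
    for code, potency in records:
        grouped[code].append(potency)
    # One entry per unique OSM code, holding the arithmetic mean potency.
    return {code: mean(values) for code, values in grouped.items()}


# Hypothetical rows for illustration only; not taken from the OSM dataset.
rows = [("OSM-X-001", 1.0), ("OSM-X-001", 3.0), ("OSM-X-002", 0.5)]
deduplicated = average_replicates(rows)
```

Applied to the full dataset, a step like this is what would reduce the 440 records to 340 unique ligands.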
These are the predictions from my best model, which achieved an average MAE of 0.55 under 10-fold cross-validation.
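For readers unfamiliar with the metric, an average MAE over 10-fold cross-validation can be computed along the following lines. This is only a sketch under stated assumptions: the helper `k_fold_mae`, the placeholder mean-value model, and the toy data are hypothetical stand-ins, and the actual QSAR models, fold assignment, and descriptors used for the submission are not shown in the post.

```python
import random
from statistics import mean


def k_fold_mae(xs, ys, fit, predict, k=10, seed=0):
    """Return the MAE averaged over k cross-validation folds."""
    indices = list(range(len(ys)))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k roughly equal folds.
    folds = [indices[i::k] for i in range(k)]
    fold_maes = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [abs(predict(model, xs[i]) - ys[i]) for i in held_out]
        fold_maes.append(mean(errors))
    return mean(fold_maes)


# Placeholder model (assumption): always predicts the training-set mean.
def fit_mean(train_x, train_y):
    return mean(train_y)


def predict_mean(model, x):
    return model


# Toy data for illustration only: 20 samples with one descriptor each.
xs = [[float(i)] for i in range(20)]
ys = [float(i % 5) for i in range(20)]
score = k_fold_mae(xs, ys, fit_mean, predict_mean, k=10)
```

Each sample is held out exactly once, so the reported figure reflects error on data the model did not see during fitting.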