MyGithubNotYours opened this issue 3 years ago
Whoa, this looks awesome, thanks for sharing @MyGithubNotYours! :smiley:
One quick suggestion from first skim: you can use the analysis.json file linked on the dashboard to avoid needing to parse HTML (I'm assuming that's what you did here?), and get access to all of the data coming from the analysis (including some raw simulation data).
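If the analysis data really is exposed as JSON, pulling it in is a one-liner compared with scraping. The field names below are assumptions for illustration, not the actual analysis.json schema:

```python
import json

# Hypothetical snippet standing in for analysis.json; the real file's
# keys may differ -- this only shows the load-and-flatten step.
raw = '''
{
  "compounds": [
    {"smiles": "CCO", "binding_free_energy": -7.2},
    {"smiles": "c1ccccc1", "binding_free_energy": -5.9}
  ]
}
'''
data = json.loads(raw)
rows = [(c["smiles"], c["binding_free_energy"]) for c in data["compounds"]]
print(rows)
```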
@mcwitt ahh 'analysis.json' was sitting there under my nose this whole time! Oops haha thanks for pointing that out to me.
No, I didn't parse HTML. I used complete brute force:
Luckily, copying and pasting preserved the table formatting, so pandas had no trouble reading it as a CSV without any extra work from me.
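The copy-paste round trip looks roughly like this (the column names and separator here are made up for illustration, not the dashboard's actual layout):

```python
import io
import pandas as pd

# A pasted table (tab-separated in this sketch); pandas parses it
# directly, no HTML scraping needed.
pasted = "compound\tenergy\nMOL-001\t-7.2\nMOL-002\t-5.9\n"
df = pd.read_csv(io.StringIO(pasted), sep="\t")
print(df.shape)  # (2, 2)
```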
EDIT 2020-12-06: The DeepChem issue went away when I upgraded to DeepChem 2.2 and TensorFlow 1.12.
Hi guys
@jchodera referred me here. I told him that I think it might be possible to predict energies before compounds are simulated. I suppose this could be used to prioritize the order in which compounds are simulated — in other words, to get good results sooner (hopefully). I'm here to share a proof of concept based on a simple, unsophisticated approach. If I'm interpreting the results correctly, the MSE is ~3 and classification accuracy for flagging bad compounds is ~70%. With better models, I'm sure the results could be improved.
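For anyone unfamiliar with the two metrics mentioned above, here they are on toy numbers (the values and the "bad compound" cutoff are invented for illustration, not the notebook's results):

```python
# Regression view: mean squared error between true and predicted energies.
y_true = [-7.0, -5.5, -8.2, -4.1]
y_pred = [-6.1, -5.0, -7.0, -5.3]
mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Classification view: call a compound "bad" if its energy is above an
# arbitrary cutoff (weaker binder), then score agreement.
cutoff = -6.0
true_bad = [t > cutoff for t in y_true]
pred_bad = [p > cutoff for p in y_pred]
accuracy = sum(t == p for t, p in zip(true_bad, pred_bad)) / len(true_bad)
print(mse, accuracy)
```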
Here's a Jupyter notebook of my findings: https://github.com/MyGithubNotYours/FAH_stuff/blob/master/FAH-rank-molecules.ipynb
Here's a datafile containing some data from sprint 4: https://github.com/MyGithubNotYours/FAH_stuff/blob/master/FAH_results_s4.csv
If you enter the location of the data file at the beginning of the notebook, the rest of the notebook should run without you needing to change anything. The only changes you might decide to make are: choosing between regression mode and classification mode, and choosing which featurizer is used. In the notebook, I've created 4 versions of the training data, each generated with a different featurizer.
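The two toggles described above can be sketched as a simple dispatch — variable and featurizer names here are placeholders, not the notebook's actual identifiers:

```python
# Placeholder toggles, analogous to the choices described in the notebook.
MODE = "regression"   # or "classification"
FEATURIZER = "ecfp"   # selects one of the four featurized training sets

def pick_dataset(featurized_sets, featurizer):
    """Return the training data built with the chosen featurizer."""
    return featurized_sets[featurizer]

# Dummy stand-ins for the four featurized versions of the training data.
datasets = {"ecfp": [[0, 1]], "graph_conv": [[1, 0]],
            "rdkit_desc": [[0.5]], "coulomb": [[2.0]]}
train = pick_dataset(datasets, FEATURIZER)
print(MODE, len(train))
```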
Disclaimer:
Questions:
What do you think is the best way to featurize MPro or its binding pockets/sites? Predictive ability might improve if such features could be included as input alongside the features of the simulated compounds (together with an indicator of which pocket each simulated compound binds to).
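One simple version of the pocket-indicator idea is to append a one-hot "which pocket" vector to each compound's feature vector. This is a minimal sketch with made-up numbers, not a claim about the best protein featurization:

```python
import numpy as np

# Two compounds with 2 features each; 3 hypothetical binding pockets.
n_pockets = 3
compound_features = np.array([[0.1, 0.7], [0.4, 0.2]])
pocket_ids = np.array([0, 2])  # pocket each compound binds to (assumed)

# One-hot encode the pocket and concatenate onto the compound features.
pocket_onehot = np.eye(n_pockets)[pocket_ids]
X = np.hstack([compound_features, pocket_onehot])
print(X.shape)  # (2, 5)
```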
DeepChem's GraphConvModel uses a lot of GPU memory. I used a loop from this DeepChem tutorial, and GPU memory usage increases after every iteration. Eventually, after some number of iterations, my GPU crashes from memory resource exhaustion. Any ideas on what I'm doing incorrectly? https://github.com/deepchem/deepchem/blob/master/examples/tutorials/04_Introduction_to_Graph_Convolutions.ipynb Here's the loop in question:
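Without reproducing the tutorial loop itself, the growth pattern described above is the classic one where each iteration constructs a new model while the previous ones stay reachable (in TF 1.x, ops similarly accumulate in the default graph unless it is reset between models). A generic, DeepChem-free illustration of the leak and the fix:

```python
import gc

class FakeModel:
    """Stand-in for a model holding large buffers (e.g. GPU memory)."""
    def __init__(self):
        self.weights = [0.0] * 10_000

# Leaky pattern: every old model stays referenced, so memory only grows.
retained = []
for _ in range(3):
    retained.append(FakeModel())
print(len(retained))  # 3 models alive at once

# Fix: release the previous model before constructing the next one
# (with TF 1.x, also reset the default graph between models).
model = None
for _ in range(3):
    model = None   # drop the reference to the previous model
    gc.collect()   # let the allocator reclaim it
    model = FakeModel()
```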
Let me know what you think!