deepchem / moleculenet

Moleculenet.ai Datasets And Splits
MIT License

[WIP] ACNN Benchmark #34

Closed: ncfrey closed this 3 years ago

ncfrey commented 3 years ago

I am attempting to add a benchmark for the newly fixed AtomicConvModel on the PDBbind dataset. I think I've implemented everything needed, but when the benchmark script calls load_dataset, every featurization fails. I've confirmed that the corresponding MolNet loader works correctly outside of the scripts, and I've tried multiple featurizers.

@mufeili and @yuanqidu, if you get a chance to take a look at my changes, I would appreciate any guidance while I continue to work on this benchmark. And if you see any obvious errors that would be causing the featurization step to fail, please let me know!

mufeili commented 3 years ago

Did you simply run python acnn.py -hs? I encountered the following error message.

Traceback (most recent call last):
  File "acnn.py", line 237, in <module>
    val_metrics, test_metrics = bayesian_optimization(args)
  File "acnn.py", line 160, in bayesian_optimization
    max_evals=args['num_trials'])
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.6/site-packages/hyperopt/fmin.py", line 553, in fmin
    rval.exhaust()
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.6/site-packages/hyperopt/fmin.py", line 356, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.6/site-packages/hyperopt/fmin.py", line 292, in run
    self.serial_evaluate()
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.6/site-packages/hyperopt/fmin.py", line 170, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/ubuntu/anaconda3/envs/deepchem/lib/python3.6/site-packages/hyperopt/base.py", line 907, in evaluate
    rval = self.fn(pyll_rval)
  File "acnn.py", line 143, in objective
    val_metrics, test_metrics = main(save_path, configure, hyperparams)
  File "acnn.py", line 83, in main
    to_stop = stopper(model, val_metric)
  File "/home/ubuntu/acnn/moleculenet/examples/utils.py", line 153, in __call__
    torch.save(model.model.state_dict(), self.save_path + '/early_stop.pt')
AttributeError: 'Functional' object has no attribute 'state_dict'

I think this is because AtomicConvModel is not a PyTorch model, while L153 in utils.py saves the model checkpoint assuming a PyTorch-based model wrapper. The inner model here is a Keras 'Functional' object, which has no state_dict method.
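One way to make the early-stopping checkpoint tolerate both backends is sketched below. This is only an illustration, not necessarily what the eventual fix in this PR does: the helper name save_checkpoint and its arguments are hypothetical, and the file names are assumptions. It relies on PyTorch modules exposing state_dict() and Keras models exposing save_weights().

```python
import os


def save_checkpoint(inner_model, save_dir, basename="early_stop"):
    """Save weights for a PyTorch-style or Keras-style model (hypothetical helper).

    Dispatches on the methods the model exposes rather than on its type,
    and returns the path that was written.
    """
    os.makedirs(save_dir, exist_ok=True)
    if hasattr(inner_model, "state_dict"):
        # PyTorch path: imported lazily so Keras-only environments still work.
        import torch
        path = os.path.join(save_dir, basename + ".pt")
        torch.save(inner_model.state_dict(), path)
    elif hasattr(inner_model, "save_weights"):
        # Keras path: the '.weights.h5' suffix is an assumption chosen so the
        # HDF5 weight format is selected.
        path = os.path.join(save_dir, basename + ".weights.h5")
        inner_model.save_weights(path)
    else:
        raise TypeError(
            "Unsupported model type for checkpointing: %s"
            % type(inner_model).__name__
        )
    return path
```

With a helper like this, the stopper would call save_checkpoint(model.model, self.save_path) instead of calling torch.save directly.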

ncfrey commented 3 years ago

Hi @mufeili, yes, I just encountered that error as well and pushed a change to add support for Keras models.

Curiously, when I run python acnn.py -hs locally, it runs with no problem (although very slowly, since I'm on a CPU). When I git clone my branch to Google Colab and run it there, I encounter the featurization error.

ncfrey commented 3 years ago

This issue was resolved with condacolab and some manual dependency installation. I'm continuing to debug the benchmarking with KerasModel and AtomicConvModel and will hopefully have results soon.

ncfrey commented 3 years ago

I successfully completed a hyperparameter search and benchmark in Colab. I think this is ready for review @rbharath

ncfrey commented 3 years ago

I added an R^2 option to the available metrics and repeated the benchmarking run. Interestingly, running the Bayesian optimization with a different target metric yielded a different set of optimal hyperparameters. The results were also quite good: R^2 = 0.54 on the test set and 0.60 on the validation set.
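For readers unfamiliar with the metric, here is a minimal pure-Python sketch of R^2, assuming it means the coefficient of determination, 1 - SS_res / SS_tot. The function name r_squared is hypothetical and this is not the code used in the benchmark. The squared Pearson correlation is a different quantity that is sometimes also labeled R^2.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (illustrative sketch)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error of the model's predictions.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean label.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all targets are identical")
    return 1.0 - ss_res / ss_tot
```

Unlike an error metric such as RMSE, higher R^2 is better. Since hyperopt's fmin minimizes its objective, a search that optimizes R^2 would need to return the negated score (or otherwise flip the direction) as its loss.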

I think this is ready for a final review @rbharath.