deepchem / moleculenet

Moleculenet.ai Datasets And Splits
MIT License

KAGGLE dataset loader #27

Closed: yuanqidu closed this issue 3 years ago

yuanqidu commented 3 years ago

It seems the dataset loader has some issues. I tried to load the data and run the GCN model in moleculenet (dc.models.GCNModel). The following is the error message:

Traceback (most recent call last):
  File "gnn.py", line 235, in <module>
    val_metrics, test_metrics = bayesian_optimization(args)
  File "gnn.py", line 160, in bayesian_optimization
    max_evals=args['num_trials'])
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 553, in fmin
    rval.exhaust()
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 356, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 292, in run
    self.serial_evaluate()
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 170, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/base.py", line 907, in evaluate
    rval = self.fn(pyll_rval)
  File "gnn.py", line 143, in objective
    val_metrics, test_metrics = main(save_path, configure, hyperparams)
  File "gnn.py", line 78, in main
    restore=epoch > 0)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/models/torch_models/torch_model.py", line 286, in fit
    checkpoint_interval, restore, variables, loss, callbacks, all_losses)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/models/torch_models/torch_model.py", line 363, in fit_generator
    inputs, labels, weights = self._prepare_batch(batch)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/models/torch_models/gcn.py", line 349, in _prepare_batch
    graph.to_dgl_graph(self_loop=self._self_loop) for graph in inputs[0]
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/models/torch_models/gcn.py", line 349, in <listcomp>
    graph.to_dgl_graph(self_loop=self._self_loop) for graph in inputs[0]
AttributeError: 'numpy.ndarray' object has no attribute 'to_dgl_graph'

rbharath commented 3 years ago

Ah, sorry, this isn't well explained in the docs. The Kaggle, UV, and Kinase datasets come from a paper I wrote with Merck several years ago, for which they provided only disguised descriptors. So we don't actually have the structures and can't apply models that need molecular structure, such as GCNModel.
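For anyone hitting the same AttributeError, here is a minimal sketch of a pre-flight check that tells descriptor-only features apart from graph features before a graph model is built. The helper name `has_graph_features` and the `FakeGraph` stand-in class are hypothetical and are not part of DeepChem. The only thing taken from the traceback above is that the GCN code calls `to_dgl_graph` on every sample.

```python
def has_graph_features(samples):
    """Return True if every sample exposes a callable to_dgl_graph method.

    Hypothetical helper, not a DeepChem API. It mirrors the call that
    failed in the traceback: graph.to_dgl_graph(...) on each sample.
    """
    samples = list(samples)
    return len(samples) > 0 and all(
        callable(getattr(sample, "to_dgl_graph", None)) for sample in samples
    )


class FakeGraph:
    """Stand-in for a graph feature object, used only for this illustration."""

    def to_dgl_graph(self, self_loop=False):
        return "dgl-graph-placeholder"


# Descriptor-only rows (plain numeric vectors), like the disguised descriptors.
descriptor_rows = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
# Rows of graph-like objects, as a graph featurizer would produce.
graph_rows = [FakeGraph(), FakeGraph()]

print(has_graph_features(descriptor_rows))  # False
print(has_graph_features(graph_rows))       # True
```

Assuming the loaded dataset exposes its features as `dataset.X`, the same check could be run on `dataset.X` before constructing dc.models.GCNModel. If it returns False, a model that consumes fixed-length feature vectors is the appropriate choice.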

yuanqidu commented 3 years ago

Thanks for your comments. Got it!