deepchem / moleculenet

Moleculenet.ai Datasets And Splits

band gap dataset loading issue #29

Open yuanqidu opened 3 years ago

yuanqidu commented 3 years ago

It looks like the splitter passed a float into Chem.MolFromSmiles.
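One way to check this diagnosis is to scan the values the splitter would treat as SMILES before splitting. The sketch below is a minimal, hypothetical helper (`find_non_string_ids` is not part of DeepChem or RDKit). It assumes the scaffold splitter reads its SMILES from the dataset's `ids` array, which has not been confirmed for the band gap dataset.

```python
def find_non_string_ids(ids):
    """Return (index, value) pairs for entries that cannot be SMILES strings.

    Hypothetical helper, not a DeepChem API. An entry is flagged when it is
    not a str (for example a float NaN) or when it is empty or whitespace.
    """
    bad = []
    for index, value in enumerate(ids):
        if not isinstance(value, str) or not value.strip():
            bad.append((index, value))
    return bad


# Made-up ids for illustration: two SMILES strings, a NaN float, an empty string.
example_ids = ["CCO", float("nan"), "c1ccccc1", ""]
bad_entries = find_non_string_ids(example_ids)
print([index for index, _ in bad_entries])  # prints [1, 3]
```

With a loaded DeepChem dataset, the same check could be run as `find_non_string_ids(dataset.ids)`. Any flagged entry would explain the `float` reaching `Chem.MolFromSmiles`.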

The error message is as follows:

Exception message: Python argument types in
    rdkit.Chem.rdmolfiles.CanonicalRankAtoms(NoneType)
did not match C++ signature:
    CanonicalRankAtoms(RDKit::ROMol mol, bool breakTies=True, bool includeChirality=True, bool includeIsotopes=True)

/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/numpy/core/_asarray.py:83: VisibleDeprecationWarning: Creating an ndarray from ragged nested sequences (which is a list-or-tuple of lists-or-tuples-or ndarrays with different lengths or shapes) is deprecated. If you meant to do this, you must specify 'dtype=object' when creating the ndarray
  return array(a, dtype, copy=False, order=order)

job exception: No registered converter was able to produce a C++ rvalue of type std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > from this Python object of type float

0%| | 0/16 [00:05<?, ?trial/s, best loss=?]
Traceback (most recent call last):
  File "gnn.py", line 235, in <module>
    val_metrics, test_metrics = bayesian_optimization(args)
  File "gnn.py", line 160, in bayesian_optimization
    max_evals=args['num_trials'])
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 553, in fmin
    rval.exhaust()
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 356, in exhaust
    self.run(self.max_evals - n_done, block_until_done=self.asynchronous)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 292, in run
    self.serial_evaluate()
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/fmin.py", line 170, in serial_evaluate
    result = self.domain.evaluate(spec, ctrl)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/hyperopt/base.py", line 907, in evaluate
    rval = self.fn(pyll_rval)
  File "gnn.py", line 143, in objective
    val_metrics, test_metrics = main(save_path, configure, hyperparams)
  File "gnn.py", line 51, in main
    args, tasks, all_dataset, transformers = load_dataset(args)
  File "/home/v-yuanqidu/moleculenet/examples/utils.py", line 101, in load_dataset
    featurizer=featurizer, splitter=splitter, reload=False)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/molnet/load_function/material_datasets/load_bandgap.py", line 106, in load_bandgap
    return loader.load_dataset('bandgap', reload)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/molnet/load_function/molnet_loader.py", line 186, in load_dataset
    train, valid, test = self.splitter.train_valid_test_split(dataset)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/splits/splitters.py", line 165, in train_valid_test_split
    log_every_n=log_every_n)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/splits/splitters.py", line 1362, in split
    scaffold_sets = self.generate_scaffolds(dataset)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/splits/splitters.py", line 1405, in generate_scaffolds
    scaffold = _generate_scaffold(smiles)
  File "/home/v-yuanqidu/.conda/envs/deepchem/lib/python3.7/site-packages/deepchem/splits/splitters.py", line 1178, in _generate_scaffold
    mol = Chem.MolFromSmiles(smiles)
TypeError: No registered converter was able to produce a C++ rvalue of type std::__cxx11::basic_string<wchar_t, std::char_traits<wchar_t>, std::allocator<wchar_t> > from this Python object of type float

rbharath commented 3 years ago

Hmm, this is a little puzzling.

@ncfrey @nd-02110114 Would either of you have any idea what could be causing the failure when loading the bandgap dataset?