Closed xptree closed 5 years ago
Thank you for your interest and your question.
I guess DFT needs to be run to calculate a precise distance matrix for the QM9 dataset, which means that the cost of calculating a precise distance matrix is almost the same as the cost of calculating the labels (energy, etc.) in QM9. Thus we use a "rough" distance matrix calculated by RDKit.
Also, to be honest, we have not been able to investigate this issue further yet. https://github.com/pfnet-research/chainer-chemistry/issues/287
@corochann Got it, thanks for your reply.
I am curious why chainer-chemistry uses RDKit to generate the 3D distance matrix instead of using the 3D geometry in the QM9 dataset directly. I ask because the 3D geometry provided by QM9 is believed to be much more accurate than the one generated by RDKit. Am I missing something in the code?
Code details:
chainer-chemistry skips the 3D geometry when reading the QM9 dataset: https://github.com/pfnet-research/chainer-chemistry/blob/c05b879241cf62ccd78982145243705b1f81e43b/chainer_chemistry/datasets/qm9.py#L119-L127
chainer-chemistry calls RDKit to get the 3D distance matrix: https://github.com/pfnet-research/chainer-chemistry/blob/6f30a6589b591716c0414352d501c9275971b5b5/chainer_chemistry/dataset/preprocessors/schnet_preprocessor.py#L42
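To make "using the 3D geometry directly" concrete, here is a rough sketch of building a distance matrix straight from XYZ-style coordinates. This is not chainer-chemistry code. The helper names `parse_xyz_coordinates` and `distance_matrix` are made up for illustration, the layout (atom count, one comment line, then `symbol x y z ...` per atom) is my assumption about the QM9 XYZ files, and the coordinates below are toy values, not real QM9 data.

```python
import math


def parse_xyz_coordinates(xyz_text):
    """Return (symbols, coords) from an XYZ-style block.

    Assumed layout: line 1 is the atom count, line 2 is a comment or
    property line, and each atom line starts with: symbol x y z.
    Any extra columns (e.g. partial charges) are ignored.
    """
    lines = xyz_text.strip().splitlines()
    n_atoms = int(lines[0])
    symbols, coords = [], []
    for line in lines[2:2 + n_atoms]:
        fields = line.split()
        symbols.append(fields[0])
        # Assumption: some files write exponents as "*^" (e.g. 1.5*^-6),
        # so normalise that to "e" before converting to float.
        coords.append(tuple(float(v.replace("*^", "e")) for v in fields[1:4]))
    return symbols, coords


def distance_matrix(coords):
    """Pairwise Euclidean distances as a nested list (n_atoms x n_atoms)."""
    return [[math.dist(a, b) for b in coords] for a in coords]


# Toy three-atom geometry, for illustration only.
TOY_XYZ = """3
toy comment line
O 0.0 0.0 0.0
H 1.0 0.0 0.0
H 0.0 1.0 0.0
"""

symbols, coords = parse_xyz_coordinates(TOY_XYZ)
dist = distance_matrix(coords)
```

The resulting matrix is symmetric with zeros on the diagonal. Whether it could replace the RDKit-based matrix in the preprocessor also depends on the atom ordering matching the one the preprocessor uses, which I have not checked.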