Open jannisborn opened 2 years ago
This looks like a discrepancy between the generation part and the RGCN model. We will fix it.
If I recall correctly, the generative models don't perform well if you disable kekulize. It is not very easy to predict a ring correctly using aromatic bonds. Could you confirm that? @shichence
Thanks @KiddoZhu! Hm, I see that the generation might be easier if the molecules are kekulized, but still I feel that this should be a user decision.
That holds especially if the dataset constructor allows setting this option. The bare minimum would be to raise an error stating that property optimization does not work without kekulization. I had to dig for a while to find the cause of this error.
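As a rough sketch of what such a check could look like: the function names and the bond-type list below are hypothetical and not part of torchdrug. The sketch assumes that the four relations are single, double, triple and aromatic bonds, and that kekulization removes the aromatic type.

```python
# Hypothetical helpers for illustration only; none of this is torchdrug API.
# Assumption: the four relations are these bond types.
BOND_TYPES = ["SINGLE", "DOUBLE", "TRIPLE", "AROMATIC"]


def num_bond_types(kekulize: bool) -> int:
    """Number of distinct bond types (relations) a molecule graph can contain."""
    # Kekulized molecules express aromatic rings as alternating single and
    # double bonds, so the AROMATIC type never appears.
    return len(BOND_TYPES) - 1 if kekulize else len(BOND_TYPES)


def check_kekulize_consistency(model_num_relation: int, kekulize: bool) -> None:
    """Raise a descriptive error instead of a bare AssertionError."""
    expected = num_bond_types(kekulize)
    if model_num_relation != expected:
        raise ValueError(
            f"Model was built with num_relation={model_num_relation}, but graphs "
            f"generated with kekulize={kekulize} have {expected} bond types. "
            "Use a dataset with matching kekulization or rebuild the model."
        )


# Mirrors the situation in this issue: the model expects 4 relations, while
# generation kekulizes and therefore yields only 3.
try:
    check_kekulize_consistency(model_num_relation=4, kekulize=True)
except ValueError as err:
    print(err)
```

A check like this, run once when the task is constructed, would point directly at the kekulization setting instead of failing deep inside the convolution layer.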
At the same time, it seems necessary that in the dataset constructor, `node_features` is set to `symbol` and not to `default`. I'm not sure why this is, but I got some shape mismatches when I changed it to `default`.
All autoregressive generative models take `symbol` as node features. This is because other features may not be well defined for partial molecules during the generation. As for kekulization, if I recall correctly, the original implementations of both GCPN and GraphAF use kekulization, and we follow that as the default.
We will try to modify the interface so that users don't need to debug such details.
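To illustrate the point about node features on partial molecules, here is a minimal sketch. The vocabulary and helper functions are hypothetical. Atom degree is used only as an example of a structure-dependent feature, not as a claim about what the `default` feature set contains.

```python
from typing import List, Tuple

# Hypothetical vocabulary and helpers for illustration; not torchdrug code.
ATOM_VOCAB = ["C", "N", "O", "F", "S", "Cl"]


def symbol_feature(symbol: str) -> List[int]:
    """One-hot encoding that depends on the element symbol only."""
    return [1 if symbol == s else 0 for s in ATOM_VOCAB]


def degree(atom_index: int, bonds: List[Tuple[int, int]]) -> int:
    """Number of bonds touching an atom; depends on how much of the molecule exists."""
    return sum(atom_index in bond for bond in bonds)


# Ethanol (C-C-O) built step by step, as an autoregressive model would build it.
atoms = ["C", "C", "O"]
partial_bonds = [(0, 1)]           # the oxygen is not attached yet
full_bonds = [(0, 1), (1, 2)]      # finished molecule

# The symbol feature of atom 1 is identical at both stages ...
assert symbol_feature(atoms[1]) == [1, 0, 0, 0, 0, 0]
# ... whereas a structure-dependent feature such as degree changes.
print(degree(1, partial_bonds), degree(1, full_bonds))  # prints "1 2"
```

Because the symbol encoding never changes while the graph grows, the model sees consistent inputs at every generation step.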
In the property optimization setting, it can easily happen that an `AssertionError` is raised in https://github.com/DeepGraphLearning/torchdrug/blob/d187dd85ed38042bc7e76e7a8c6f26d0f931cd3b/torchdrug/layers/conv.py#L422

I investigated and found that `graph.num_relation` was 3 whereas `self.num_relation` was 4. `graph.num_relation` was lowered because of this line: https://github.com/DeepGraphLearning/torchdrug/blob/d187dd85ed38042bc7e76e7a8c6f26d0f931cd3b/torchdrug/tasks/generation.py#L1345 where `kekulize` is hard-coded to `True`. Consequently, the aromatic bonds are removed from the bond count. I do not want to kekulize my molecules and launched the training with that specification; however, the package does not allow controlling this hard-coded value.

Here's the full error trace