Open louiskhub opened 2 years ago
Same issue for the missing `Chem.rdchem.HybridizationType.UNSPECIFIED` and `atom.GetFormalCharge() == 3`.
I get that these feature values are somewhat negligible because they occur so rarely. But is that a reason to ignore them?
For one-hot encoding, the list of hybridization choices is missing S-Hybridization: https://github.com/HamidHadipour/Deep-clustering-of-small-molecules-at-large-scale-via-variational-autoencoder-embedding-and-K-means/blob/7d1446d8fe1e1ead1cad62fd057343faf6326212/feature_generation.py#L22-L28
Instead, if an atom has S-Hybridization (`int-value = 1`), it is mapped to SP3D2-Hybridization (`int-value = 6`), as the `index` is set to `-1`: https://github.com/HamidHadipour/Deep-clustering-of-small-molecules-at-large-scale-via-variational-autoencoder-embedding-and-K-means/blob/7d1446d8fe1e1ead1cad62fd057343faf6326212/feature_generation.py#L12
In my opinion this does not make sense. Why aren't you just adding S-Hybridization to the list for one-hot encoding instead?
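
To illustrate the concern, here is a minimal sketch of the fallback behavior as I understand it. It is not the repository's actual code: the function name `one_hot_with_fallback`, the two lists, and the use of plain strings in place of RDKit's `HybridizationType` values are all assumptions made so the example runs without RDKit. I am also assuming the current list holds SP, SP2, SP3, SP3D and SP3D2 in that order.

```python
# Hypothetical stand-ins for the RDKit hybridization types, kept as plain
# strings so this sketch runs without RDKit installed.
CURRENT_CHOICES = ["SP", "SP2", "SP3", "SP3D", "SP3D2"]  # assumed current list
EXTENDED_CHOICES = ["S"] + CURRENT_CHOICES               # proposed list with S added


def one_hot_with_fallback(value, choices):
    """One-hot encode `value`; values not in `choices` fall back to index -1."""
    index = choices.index(value) if value in choices else -1
    encoding = [0] * len(choices)
    encoding[index] = 1  # index -1 sets the last slot
    return encoding


# With the assumed current list, S and SP3D2 get the same vector.
print(one_hot_with_fallback("S", CURRENT_CHOICES))      # [0, 0, 0, 0, 1]
print(one_hot_with_fallback("SP3D2", CURRENT_CHOICES))  # [0, 0, 0, 0, 1]

# With S added to the list, it gets its own slot.
print(one_hot_with_fallback("S", EXTENDED_CHOICES))     # [1, 0, 0, 0, 0, 0]
```

Under these assumptions, the model cannot tell an S-hybridized atom from an SP3D2-hybridized one. Adding S to the list costs one extra feature dimension and removes that ambiguity.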