HamidHadipour / Deep-clustering-of-small-molecules-at-large-scale-via-variational-autoencoder-embedding-and-K-means


Feature Generation: Mapping of S-Hybridization to SP3D2-Hybridization #1

Open louiskhub opened 2 years ago

louiskhub commented 2 years ago

For the one-hot encoding, the list of hybridization choices is missing S-Hybridization: https://github.com/HamidHadipour/Deep-clustering-of-small-molecules-at-large-scale-via-variational-autoencoder-embedding-and-K-means/blob/7d1446d8fe1e1ead1cad62fd057343faf6326212/feature_generation.py#L22-L28

Instead, if an atom has S-Hybridization (int-value = 1), it is mapped to SP3D2-Hybridization (int-value = 6), because the index for values not in the list is set to -1, which selects the last entry: https://github.com/HamidHadipour/Deep-clustering-of-small-molecules-at-large-scale-via-variational-autoencoder-embedding-and-K-means/blob/7d1446d8fe1e1ead1cad62fd057343faf6326212/feature_generation.py#L12

In my opinion, this does not make sense. Why not just add S-Hybridization to the list for one-hot encoding instead?
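To illustrate the collision, here is a minimal sketch. It is not the repository's actual code: the helper `one_hot_with_last_fallback` is hypothetical, the strings stand in for the RDKit hybridization types, and the exact contents of the choice list are assumed.

```python
def one_hot_with_last_fallback(value, choices):
    """One-hot encode `value`; unlisted values reuse the last slot (index -1)."""
    index = choices.index(value) if value in choices else -1
    encoding = [0] * len(choices)
    encoding[index] = 1
    return encoding


# String stand-ins for the hybridization types (assumed list contents).
WITHOUT_S = ["SP", "SP2", "SP3", "SP3D", "SP3D2"]
WITH_S = ["S"] + WITHOUT_S

# With S missing, an S atom gets the same vector as an SP3D2 atom.
collided = one_hot_with_last_fallback("S", WITHOUT_S)
sp3d2 = one_hot_with_last_fallback("SP3D2", WITHOUT_S)

# With S listed, the S atom gets its own slot.
fixed = one_hot_with_last_fallback("S", WITH_S)
fixed_sp3d2 = one_hot_with_last_fallback("SP3D2", WITH_S)
```

In this sketch `collided` and `sp3d2` are identical, so a downstream model could not tell the two hybridization states apart, while `fixed` and `fixed_sp3d2` differ.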

louiskhub commented 2 years ago

Same issue for the missing Chem.rdchem.HybridizationType.UNSPECIFIED and atom.GetFormalCharge() == 3.

I get that these feature values are somewhat negligible because they occur so rarely. But is this a reason to ignore them?
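One possible alternative for such rare values, sketched under assumptions: reserve a dedicated trailing "unknown" slot instead of reusing the last real category. The helper `one_hot_with_unknown` and the formal-charge list below are hypothetical and not taken from the repository.

```python
def one_hot_with_unknown(value, choices):
    """One-hot encode `value`; unlisted values go to an extra trailing slot."""
    encoding = [0] * (len(choices) + 1)
    index = choices.index(value) if value in choices else len(choices)
    encoding[index] = 1
    return encoding


# Assumed list of formal charges; a charge of 3 is deliberately not included.
FORMAL_CHARGES = [-2, -1, 0, 1, 2]

charge_2 = one_hot_with_unknown(2, FORMAL_CHARGES)
charge_3 = one_hot_with_unknown(3, FORMAL_CHARGES)
```

Here a formal charge of 3 lands in the extra slot rather than being encoded as a charge of 2. The cost is one additional feature dimension per encoded property.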