Closed: DBpackage closed this issue 9 months ago
Hello, thank you for your kind words.
To assess which features are more beneficial, we validated the compound information features using an ablation study, in which the features under test are independently turned off by masking them to zero. Note that we used the same featurization as the original D-MPNN, in which all features are initialized using RDKit, the cheminformatics toolkit. To do this, you will need to edit the chemprop/features/featurization.py file. For the atom features:
    def atom_features(atom: Chem.rdchem.Atom, functional_groups: List[int] = None) -> List[Union[bool, int, float]]:
        """
        Builds a feature vector for an atom.

        :param atom: An RDKit atom.
        :param functional_groups: A k-hot vector indicating the functional groups the atom belongs to.
        :return: A list containing the atom features.
        """
        if atom is None:
            features = [0] * ATOM_FDIM
        else:
            features = onek_encoding_unk(atom.GetAtomicNum() - 1, ATOM_FEATURES['atomic_num']) + \
                onek_encoding_unk(atom.GetTotalDegree(), ATOM_FEATURES['degree']) + \
                onek_encoding_unk(atom.GetFormalCharge(), ATOM_FEATURES['formal_charge']) + \
                onek_encoding_unk(int(atom.GetChiralTag()), ATOM_FEATURES['chiral_tag']) + \
                onek_encoding_unk(int(atom.GetTotalNumHs()), ATOM_FEATURES['num_Hs']) + \
                onek_encoding_unk(int(atom.GetHybridization()), ATOM_FEATURES['hybridization']) + \
                [1 if atom.GetIsAromatic() else 0] + \
                [atom.GetMass() * 0.01]  # scaled to about the same range as other features
            if functional_groups is not None:
                features += functional_groups
        return features
For the bond_features:
    def bond_features(bond: Chem.rdchem.Bond) -> List[Union[bool, int, float]]:
        """
        Builds a feature vector for a bond.

        :param bond: An RDKit bond.
        :return: A list containing the bond features.
        """
        if bond is None:
            fbond = [1] + [0] * (BOND_FDIM - 1)
        else:
            bt = bond.GetBondType()
            fbond = [
                0,  # bond is not None
                bt == Chem.rdchem.BondType.SINGLE,
                bt == Chem.rdchem.BondType.DOUBLE,
                bt == Chem.rdchem.BondType.TRIPLE,
                bt == Chem.rdchem.BondType.AROMATIC,
                (bond.GetIsConjugated() if bt is not None else 0),
                (bond.IsInRing() if bt is not None else 0)
            ]
            fbond += onek_encoding_unk(int(bond.GetStereo()), list(range(6)))
        return fbond
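As a rough illustration of the masking described above, the sketch below zeroes one feature group at a time in an already-built feature vector. It is only a sketch under stated assumptions, not necessarily the code used in the study: the helper names (group_slices, mask_groups) are hypothetical, and the group widths are assumed from the default chemprop featurization (each one-hot group carries one extra "unknown" slot, giving 133 atom and 14 bond features). Please check them against ATOM_FEATURES, ATOM_FDIM and BOND_FDIM in your copy of featurization.py. It also assumes functional_groups is None; any trailing entries beyond the known groups are left untouched.

```python
from typing import Dict, List, Sequence, Tuple, Union

Feature = Union[bool, int, float]

# ASSUMPTION: group widths in concatenation order, taken from the default
# chemprop featurization. Verify against your featurization.py.
ATOM_GROUP_SIZES: Dict[str, int] = {
    "atomic_num": 101,
    "degree": 7,
    "formal_charge": 6,
    "chiral_tag": 5,
    "num_Hs": 6,
    "hybridization": 6,
    "aromatic": 1,
    "mass": 1,
}
BOND_GROUP_SIZES: Dict[str, int] = {
    "is_none": 1,
    "bond_type": 4,
    "conjugated": 1,
    "in_ring": 1,
    "stereo": 7,
}


def group_slices(sizes: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each group name to its (start, stop) index range in the vector."""
    slices: Dict[str, Tuple[int, int]] = {}
    start = 0
    for name, width in sizes.items():
        slices[name] = (start, start + width)
        start += width
    return slices


def mask_groups(features: Sequence[Feature],
                sizes: Dict[str, int],
                masked: Sequence[str]) -> List[Feature]:
    """Return a copy of `features` with the named groups set to zero."""
    unknown = set(masked) - set(sizes)
    if unknown:
        raise KeyError(f"unknown feature groups: {sorted(unknown)}")
    if len(features) < sum(sizes.values()):
        raise ValueError("feature vector is shorter than the assumed layout")
    out = list(features)  # copy, so the input vector is not modified
    slices = group_slices(sizes)
    for name in masked:
        start, stop = slices[name]
        out[start:stop] = [0] * (stop - start)
    return out


# Example: turn off the hybridization group of a dummy atom feature vector.
dummy_atom = [1] * sum(ATOM_GROUP_SIZES.values())
ablated_atom = mask_groups(dummy_atom, ATOM_GROUP_SIZES, ["hybridization"])
```

In practice one would apply such a mask to the output of atom_features or bond_features just before returning it, and rerun training once per masked group so that each feature's contribution can be compared.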
P.S. If there's a specific topic you'd like information on, or if you need any further assistance, please feel free to let me know. Have a great time exploring, and don't hesitate to reach out!
Thank you for always responding kindly!
I have two questions about your model.
But in this part. (chemprop/train/train.py)
Using the current code as it is, it is unlikely to work as intended for classification problems, because during classification target_batch will be discrete, not continuous. I know perceiverCPI was designed as a regression model, so no offense! I just want to know whether you agree or not.
Sincerely, P.S. I'm learning a lot from your study. Thank you. 👍