Hi, I have two questions again! :)

Thank you for always responding kindly!

I have two questions about your model.

Do the tau / alpha / beta functions for handling class imbalance still work for classification tasks too? To the best of my knowledge, PerceiverCPI is based on Chemprop, and Chemprop can be used as either a classification or a regression model depending on some hyperparameters.

But consider this part (chemprop/train/train.py).

With the current code as it is, this is unlikely to work as intended for classification problems, because during classification target_batch will be discrete, not continuous. I know PerceiverCPI was designed as a regression model, so no offense! I just want to know whether you agree or not.

How did you conduct the D-MPNN atom/bond feature ablation test? I would like to know a little more about the details. I could not find any additional commentary when I looked for it.

Sincerely, P.S. I'm learning a lot from your study. Thank you. 👍

Hello, thank you for your kind words.

Yes, you are right: our parameters were designed primarily for the MSE loss function, which does not apply to classification. For the classification task, we simply adopted the BCE loss function.
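To make the distinction concrete, here is a minimal pure-Python sketch of choosing the loss by task type. It is not the PerceiverCPI or Chemprop code: `mse_loss`, `bce_loss`, and `select_loss` are hypothetical helpers written for illustration, and only the task names "regression" and "classification" are meant to echo Chemprop's dataset type option.

```python
import math
from typing import Callable, Sequence


def mse_loss(preds: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error, suited to continuous regression targets."""
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(preds)


def bce_loss(probs: Sequence[float], targets: Sequence[int], eps: float = 1e-7) -> float:
    """Binary cross-entropy, suited to discrete 0/1 targets.

    `probs` are predicted probabilities in [0, 1]; they are clipped by
    `eps` so that log(0) is never evaluated.
    """
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def select_loss(dataset_type: str) -> Callable[..., float]:
    """Pick a loss from the task type (hypothetical helper, for illustration)."""
    if dataset_type == "regression":
        return mse_loss
    if dataset_type == "classification":
        return bce_loss
    raise ValueError(f"Unsupported dataset type: {dataset_type}")


# Usage: continuous targets go through MSE, binary labels through BCE.
regression_loss = select_loss("regression")([1.0, 2.0], [1.0, 4.0])
classification_loss = select_loss("classification")([0.5, 0.5], [1, 0])
```

Any weighting that depends on the magnitude of a continuous target would sit on the regression branch only, which is why it does not carry over to binary labels unchanged.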

To assess which features are most useful, we validated the compound information features with an ablation study. Each feature under test is turned off independently by masking it to zero. Note that we used the same featurization as the original D-MPNN, in which all features are initialized using the cheminformatics toolkit RDKit. To do this, you need to edit the file chemprop\features\featurization.py. For the atom features:

def atom_features(atom: Chem.rdchem.Atom, functional_groups: List[int] = None) -> List[Union[bool, int, float]]:
    """
    Builds a feature vector for an atom.

    :param atom: An RDKit atom.
    :param functional_groups: A k-hot vector indicating the functional groups the atom belongs to.
    :return: A list containing the atom features.
    """
    if atom is None:
        features = [0] * ATOM_FDIM
    else:
        features = onek_encoding_unk(atom.GetAtomicNum() - 1, ATOM_FEATURES['atomic_num']) + \
            onek_encoding_unk(atom.GetTotalDegree(), ATOM_FEATURES['degree']) + \
            onek_encoding_unk(atom.GetFormalCharge(), ATOM_FEATURES['formal_charge']) + \
            onek_encoding_unk(int(atom.GetChiralTag()), ATOM_FEATURES['chiral_tag']) + \
            onek_encoding_unk(int(atom.GetTotalNumHs()), ATOM_FEATURES['num_Hs']) + \
            onek_encoding_unk(int(atom.GetHybridization()), ATOM_FEATURES['hybridization']) + \
            [1 if atom.GetIsAromatic() else 0] + \
            [atom.GetMass() * 0.01]  # scaled to about the same range as other features
        if functional_groups is not None:
            features += functional_groups
    return features

For the bond features:

def bond_features(bond: Chem.rdchem.Bond) -> List[Union[bool, int, float]]:
    """
    Builds a feature vector for a bond.

    :param bond: An RDKit bond.
    :return: A list containing the bond features.
    """
    if bond is None:
        fbond = [1] + [0] * (BOND_FDIM - 1)
    else:
        bt = bond.GetBondType()
        fbond = [
            0,  # bond is not None
            bt == Chem.rdchem.BondType.SINGLE,
            bt == Chem.rdchem.BondType.DOUBLE,
            bt == Chem.rdchem.BondType.TRIPLE,
            bt == Chem.rdchem.BondType.AROMATIC,
            (bond.GetIsConjugated() if bt is not None else 0),
            (bond.IsInRing() if bt is not None else 0)
        ]
        fbond += onek_encoding_unk(int(bond.GetStereo()), list(range(6)))
    return fbond
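The masking described above can be sketched as follows. This is only an illustration under stated assumptions, not the exact procedure used in the study: `TOY_GROUP_SIZES`, `group_slices`, and `mask_groups` are hypothetical names, and the toy group sizes do not match the real ones, which depend on ATOM_FEATURES in featurization.py.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical toy layout: feature group name -> number of entries it
# contributes to the concatenated feature vector, in concatenation order.
TOY_GROUP_SIZES: Dict[str, int] = {
    "atomic_num": 4,
    "degree": 3,
    "is_aromatic": 1,
    "mass": 1,
}


def group_slices(group_sizes: Dict[str, int]) -> Dict[str, Tuple[int, int]]:
    """Map each group name to its (start, end) index range in the vector."""
    slices: Dict[str, Tuple[int, int]] = {}
    start = 0
    for name, size in group_sizes.items():
        slices[name] = (start, start + size)
        start += size
    return slices


def mask_groups(
    features: Sequence[float],
    group_sizes: Dict[str, int],
    ablated: Sequence[str],
) -> List[float]:
    """Return a copy of `features` with the ablated groups set to zero."""
    slices = group_slices(group_sizes)
    masked = list(features)
    for name in ablated:
        start, end = slices[name]
        masked[start:end] = [0] * (end - start)
    return masked


# Usage: switch off the "degree" and "mass" groups of one toy atom vector.
features = [0, 1, 0, 0, 0, 1, 0, 1, 0.12]
masked = mask_groups(features, TOY_GROUP_SIZES, ["degree", "mass"])
```

Zeroing a group instead of deleting it keeps the vector length unchanged, so the model's input dimensions stay the same across ablation runs. The same idea applies to the bond feature vector.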

P.S. If there's a specific topic you'd like information on, or if you need any further assistance, please feel free to let me know. Have a great time exploring, and don't hesitate to reach out!
