Phuangji opened this issue 1 month ago
I think the "[nH]" in the SMILES string is the issue. In the training dataset I used, all SMILES strings were canonical, and any SMILES containing ions were filtered out. Please make sure the substrate SMILES is in canonical form.
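A rough pre-check that mirrors the filtering described above, assuming "ions" means charged atoms or multi-component (salt) SMILES. It relies only on string inspection: in SMILES a formal charge can appear only inside a bracket atom such as `[O-]` or `[NH4+]`, and `.` separates disconnected components. The function name `has_ion` is hypothetical, and a cheminformatics toolkit would give a more thorough check.

```python
import re

# Bracket atoms that carry a formal charge, e.g. [O-], [NH4+], [Fe+2].
# Outside brackets, "-" is a bond symbol, so only bracket contents are inspected.
_CHARGED_ATOM = re.compile(r"\[[^\]]*[+-][^\]]*\]")


def has_ion(smiles: str) -> bool:
    """True if the SMILES has a charged atom or more than one component.

    The "." test catches salts such as sodium acetate, but also flags
    hydrates or mixtures that contain no charged species.
    """
    return "." in smiles or bool(_CHARGED_ATOM.search(smiles))
```

Note that `[nH]` is not a charged atom, so an SMILES like `O=c1[nH]c(=O)c2[nH]cnc2[nH]1` passes this check.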
Thank you for your reply. I still have a couple of questions. First, I did convert my SMILES to canonical form, using the following code:
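```python
from rdkit import Chem
from rdkit.Chem import MolStandardize
# Load the legacy MolVS-port submodules explicitly so that MolStandardize.normalize,
# .fragment and .charge resolve as attributes below.
from rdkit.Chem.MolStandardize import charge, fragment, normalize  # noqa: F401


class MolClean(object):
    def __init__(self):
        self.normizer = MolStandardize.normalize.Normalizer()        # apply standard normalization transforms
        self.lfc = MolStandardize.fragment.LargestFragmentChooser()  # keep only the largest fragment (drops counter-ions)
        self.uc = MolStandardize.charge.Uncharger()                  # neutralize charges where possible

    def clean(self, smi):
        mol = Chem.MolFromSmiles(smi)
        if mol:
            mol = self.normizer.normalize(mol)
            mol = self.lfc.choose(mol)
            mol = self.uc.uncharge(mol)
            # Canonical SMILES without stereochemistry
            smi = Chem.MolToSmiles(mol, isomericSmiles=False, canonical=True)
            return smi
        else:
            return None  # the input SMILES could not be parsed
```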
Second, other SMILES that contain [nH] work fine for me, for example O=c1[nH]c(=O)c2[nH]cnc2[nH]1. Also, the failing SMILES contains no ions, only nitrogen heterocycles. So I'm afraid your suggestion may not solve my problem. Thank you again!
OK. In that case, the most likely reason is that the substrate contains molecular fingerprints (substructures) that never appear in my training data, so the substrate cannot be encoded as features.
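A minimal sketch of how this kind of failure can arise, assuming the model encodes each atom by an r-radius substructure key that is looked up in a dictionary built only from training molecules. The model's actual encoder may differ; the function names and the toy graphs here are hypothetical, and molecules are given as plain atom/adjacency lists rather than parsed from SMILES to keep the sketch dependency-free.

```python
from typing import Dict, List, Tuple

# Toy molecular graph: (atom symbols, adjacency list). Hypothetical format.
Graph = Tuple[List[str], List[List[int]]]


def atom_fingerprints(graph: Graph, radius: int = 1) -> List[str]:
    """One string key per atom describing its neighborhood up to `radius` bonds."""
    labels, adj = graph
    keys = list(labels)
    for _ in range(radius):
        # New key = old key of the atom + sorted old keys of its neighbors
        keys = [keys[i] + "(" + ",".join(sorted(keys[j] for j in adj[i])) + ")"
                for i in range(len(labels))]
    return keys


def build_vocab(train_graphs: List[Graph], radius: int = 1) -> Dict[str, int]:
    """Assign an integer ID to every fingerprint seen in the training molecules."""
    vocab: Dict[str, int] = {}
    for g in train_graphs:
        for key in atom_fingerprints(g, radius):
            vocab.setdefault(key, len(vocab))
    return vocab


def encode(graph: Graph, vocab: Dict[str, int], radius: int = 1) -> List[int]:
    """Map each atom to its fingerprint ID; fail if any fingerprint was never seen."""
    keys = atom_fingerprints(graph, radius)
    unknown = [k for k in keys if k not in vocab]
    if unknown:
        raise KeyError(f"fingerprints not seen in training: {unknown}")
    return [vocab[k] for k in keys]
```

With a vocabulary built from a C-C-O graph alone, encoding that graph succeeds, while a C-N graph raises `KeyError` because its atom environments never occurred in training. In this sketch a SMILES can be perfectly canonical and uncharged and still fail to encode.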
Hello! When I predict kcat for some E. coli reactions, I get an error saying my SMILES are out of the range of atoms. For example, this substrate fails:
CC1(CC(=O)O)C2=Cc3[nH]c(c(CCC(=O)O)c3CC(=O)O)Cc3[nH]c(c(CC(=O)O)c3CCC(=O)O)C=C3N=C(C=C(N2)C1CCC(=O)O)C(C)(CC(=O)O)C3CCC(=O)O
I am running on CPU; on GPU I get similar errors. Could you please tell me what is wrong with my setup? Looking forward to your reply. Thank you!