When I calculated the PubChem fingerprint, the result differed from the one produced by the PaDEL software. Has anyone else noticed this problem, or did I make a mistake? My program is attached.

gadsbyfly / PyBioMed

machine learning, molecular descriptor

BSD 3-Clause "New" or "Revised" License


smi_test = "CCOC1=CC=CC=C1OCCN[C@H]CC1=CC(=C(OC)C=C1)S(N)(=O)=O"
mol = rdkit.Chem.MolFromSmiles(smi_test)
molH = rdkit.Chem.AddHs(mol)
# this is a call of the source code for the PubChem fingerprint from PyBioMed as a stand-alone executable
pcfp_result = PCFP.calcPubChemFingerAll(molH)
print(pcfp_result[:10])

top 10 bits: [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

So it turns out that your problem is that PyBioMed computed the fingerprint on a mol object without explicit hydrogens.
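To see why explicit hydrogens matter here, a minimal pure-Python sketch may help. It assumes that the first ten bits of the PubChem fingerprint are the element-count bits from Section 1 of the PubChem specification (>= 4 H, >= 8 H, >= 16 H, >= 32 H, >= 1 Li, >= 2 Li, >= 1 B, >= 2 B, >= 4 B, >= 2 C), and that an implementation counts elements by walking the atoms actually present in the mol object. The helper `first_ten_bits` and the element counts are hypothetical and only illustrative; they are not taken from PyBioMed or PaDEL.

```python
from collections import Counter

# (element, minimum count) pairs assumed for bits 0-9 of Section 1
# of the PubChem substructure fingerprint.
FIRST_TEN_BITS = [
    ("H", 4), ("H", 8), ("H", 16), ("H", 32),
    ("Li", 1), ("Li", 2),
    ("B", 1), ("B", 2), ("B", 4),
    ("C", 2),
]


def first_ten_bits(element_counts):
    """Return bits 0-9 for a mapping of element symbol -> atom count."""
    return [
        1 if element_counts.get(element, 0) >= minimum else 0
        for element, minimum in FIRST_TEN_BITS
    ]


# Illustrative counts for a molecule with 28 hydrogens (hypothetical values).
with_explicit_h = Counter({"C": 20, "H": 28, "N": 2, "O": 5, "S": 1})

# The same molecule when hydrogens are only implicit, so an atom-by-atom
# count never sees an "H" atom.
without_explicit_h = Counter({"C": 20, "N": 2, "O": 5, "S": 1})

print(first_ten_bits(with_explicit_h))     # [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
print(first_ten_bits(without_explicit_h))  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
```

Under these assumptions, the hydrogen-count bits are only set when the hydrogens exist as real atoms in the molecule, which matches the first ten bits printed above for the molecule after AddHs.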

However, I would point out some other real problems with the PubChem implementation of GetFingerprint(FPName='Pubchem') in this package that lead to massively faulty fingerprints. ==> I'm opening a new issue now...
