gadsbyfly / PyBioMed

machine learning, molecular descriptor
http://pybiomed.readthedocs.io/en/latest/index.html
BSD 3-Clause "New" or "Revised" License

When I calculated the PubChem fingerprint, the result was different from that of the PaDEL software. Did anyone notice this problem, or did I make a mistake? Attached is my program #21

Open Ls94wood opened 2 years ago

Ls94wood commented 2 years ago

from PyBioMed import Pymolecule

mol = Pymolecule.PyMolecule()
mol.ReadMolFromSmile('CCOC1=CC=CC=C1OCCN[C@H]CC1=CC(=C(OC)C=C1)S(N)(=O)=O')
mol.GetFingerprint(FPName='Pubchem')

The first ten fingerprint bits calculated by PyBioMed: 0, 0, 0, 0, 0, 0, 0, 0, 0, 1
The first ten fingerprint bits calculated by PaDEL: 1, 1, 1, 0, 0, 0, 0, 0, 0, 1

ZlatomirTodorov commented 1 year ago

import rdkit.Chem

smi_test = "CCOC1=CC=CC=C1OCCN[C@H]CC1=CC(=C(OC)C=C1)S(N)(=O)=O"
mol = rdkit.Chem.MolFromSmiles(smi_test)
molH = rdkit.Chem.AddHs(mol)  # make the hydrogens explicit
# PCFP is the PubChem fingerprint source code from PyBioMed, used as a stand-alone module
pcfp_result = PCFP.calcPubChemFingerAll(molH)
print(pcfp_result[:10])

top 10 bits: [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]

So it turns out that your problem is that PyBioMed computed the fingerprint without explicit hydrogens in your mol object.
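To illustrate why the hydrogens matter: in the PubChem substructure fingerprint, the first bits are element-count thresholds (bits 0 to 3 test for at least 4, 8, 16 and 32 H atoms, bits 4 to 8 cover Li and B, and bit 9 tests for at least 2 C). Here is a minimal pure-Python sketch of that counting, assuming the implementation only counts atoms that are explicitly present in the mol object. The helper `first_ten_bits` and the toy hexane atom lists are hypothetical and only for illustration; they are not part of PyBioMed or PaDEL.

```python
from collections import Counter

# (element, minimum count) for bits 0..9 of the PubChem substructure fingerprint
FIRST_TEN_BITS = [
    ("H", 4), ("H", 8), ("H", 16), ("H", 32),
    ("Li", 1), ("Li", 2),
    ("B", 1), ("B", 2), ("B", 4),
    ("C", 2),
]


def first_ten_bits(atom_symbols):
    """Hypothetical helper: set a bit when the element count reaches its threshold."""
    counts = Counter(atom_symbols)
    return [1 if counts[element] >= minimum else 0
            for element, minimum in FIRST_TEN_BITS]


# Toy molecule (hexane, C6H14), not the molecule from this issue
heavy_only = ["C"] * 6                      # hydrogens left implicit
with_hydrogens = heavy_only + ["H"] * 14    # hydrogens made explicit

bits_implicit = first_ten_bits(heavy_only)
bits_explicit = first_ten_bits(with_hydrogens)

print(bits_implicit)  # [0, 0, 0, 0, 0, 0, 0, 0, 0, 1]
print(bits_explicit)  # [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]
```

With implicit hydrogens the H-count bits stay at 0 and only the carbon bit is set, which is the same pattern as the PyBioMed output reported above.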

However, I would point out some other real problems with the PubChem implementation behind GetFingerprint(FPName='Pubchem') in this package that lead to massively faulty fingerprints. ==> I'm opening a new issue now...