chembl / chembl_webresource_client

Official Python client for accessing ChEMBL API
https://www.ebi.ac.uk/chembl/api/data/docs

Failed to generate MACCS fingerprints using utils.sdf2fps() #57

Closed paulahsan closed 2 years ago

paulahsan commented 5 years ago

Hi, thanks for this nice tool. I tried the following to generate MACCS-166 fingerprints for some compounds of interest.

from chembl_webresource_client.new_client import new_client
from chembl_webresource_client.utils import utils

molecule = new_client.molecule
molecule.set_format('sdf')

cpd_list = ["CHEMBL268177","CHEMBL268439"]

morgan = []
for i in cpd_list:
    cmpnd = molecule.get(i)
    fps = utils.sdf2fps(cmpnd) #by default it is 'morgan'
    morgan.append(fps)

print(morgan[0])

"""
output
#FPS1
#num_bits=2048
#software=RDKit/2017.03.3
00000000000000000000000000000000000000000000000000000000000000000000000000000080000000400000000000000000000000000000000000000000000000000000000040000000000000000004200000000000000800000000000000001000000000000200000000080000004000002000080000000000000000000000000000000000000000000000000000000800002000000000000000000004000000000000000000000000100000000000000000000200100000000000000000000000100000008000000000000000000000000000000000004000000000000000000000000000000002000000000000000000008000000000000000000000    CHEMBL268177
"""
# but if I change the fingerprint type to 'maccs', only the header is returned
maccs = []
for i in cpd_list:
    cmpnd = molecule.get(i)
    fps = utils.sdf2fps(cmpnd, 'maccs') #How to define that I want MACCS-166?
    maccs.append(fps)

print(maccs[0])

"""
output for this
#FPS1
#num_bits=2048
#software=RDKit/2017.03.3
"""

Calling the endpoint directly with curl did not return the result I wanted either:

!curl -X POST -F "file=cmpnd" -F "type=maccs" https://www.ebi.ac.uk/chembl/api/utils/sdf2fps
#FPS1
#num_bits=2048
#software=RDKit/2017.03.3

Is this a bug, or did I misunderstand something?

eloyfelix commented 2 years ago

The sdf2fps endpoint was deprecated because our logs showed no requests to it for a long time. Sorry for any inconvenience caused.

FPS text can be generated locally using RDKit:

import rdkit
from rdkit import Chem
from rdkit.Chem import rdMolDescriptors
from rdkit.DataStructs import BitVectToFPSText

mol = Chem.MolFromSmiles("O=C(C)Oc1ccccc1C(=O)O")

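# Compute the fingerprint once so its bit count can be reported in the header.
fp = rdMolDescriptors.GetMACCSKeysFingerprint(mol)
fp_size = fp.GetNumBits()  # 167 for RDKit MACCS keys (bit 0 is unused)
fpstext = BitVectToFPSText(fp)

fps = f"""#FPS1
#num_bits={fp_size}
#type=MACCSKeys
#software=RDKit/{rdkit.__version__}
{fpstext}
"""

print(fps)
#FPS1
#num_bits=167
#type=MACCSKeys
#software=RDKit/2021.03.1
000000000000000000000002000002c8009945a53d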
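
To check which keys are actually set in FPS output like the above, the hex line can be decoded back into bit indices. The following is a minimal sketch using only the standard library, not part of the client or of RDKit. `parse_fps` and the toy input are hypothetical names and data introduced for illustration, and the sketch assumes the usual FPS bit order, where bit i is stored in byte i // 8 at position i % 8 counted from the least significant bit.

```python
def parse_fps(text):
    """Split FPS text into a header dict and a list of (identifier, on_bits) records."""
    header, records = {}, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#"):
            # Header lines look like "#key=value"; the "#FPS1" magic line has no "=".
            key, sep, value = line[1:].partition("=")
            if sep:
                header[key] = value
            continue
        # Data lines: hex fingerprint, optionally followed by whitespace and an identifier.
        fields = line.split(None, 1)
        identifier = fields[1] if len(fields) > 1 else None
        raw = bytes.fromhex(fields[0])
        # Assumed bit order: bit i lives in byte i // 8, position i % 8 from the LSB.
        on_bits = [i for i in range(len(raw) * 8) if raw[i // 8] >> (i % 8) & 1]
        records.append((identifier, on_bits))
    return header, records


# Toy 16-bit record (made up for illustration, not a real MACCS fingerprint).
sample = "#FPS1\n#num_bits=16\n#type=Example\n0180\ttoy_record\n"
header, records = parse_fps(sample)
print(header["num_bits"], records)  # 16 [('toy_record', [0, 15])]
```

If the assumption about bit order holds, the indices returned for a MACCS record correspond directly to MACCS key numbers, since RDKit leaves bit 0 unused.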