chembl / chembl_webresource_client

Official Python client for accessing ChEMBL API
https://www.ebi.ac.uk/chembl/api/data/docs

Getting canonical SMILES directly #111

Closed: ConstantinWaquet closed this issue 2 years ago

ConstantinWaquet commented 2 years ago

Is there a way to get canonical SMILES directly from the web service, rather than downloading all of the molecule structures data and then extracting the SMILES? I waste a lot of time downloading redundant information when I only need a small part of it.

This is how I'm currently doing it:

from chembl_webresource_client.new_client import new_client

# Get the data (at the moment, all of the 'molecule_structures' data)
mols = new_client.molecule.filter(
    max_phase=4,
    first_approval__gte=2000,
    molecule_properties__mw_freebase__lte=500,
).only('molecule_structures')

# Then I use a list comprehension to extract the 'canonical_smiles'.
# For the amount of data in this example, it takes about 1.5 minutes to extract the 607 SMILES.
# I'm trying to get many more than 607 SMILES, but then it takes a really long time.
mol_smiles = [
    mol['molecule_structures']['canonical_smiles']
    for mol in mols
    if mol['molecule_structures']
]

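
The comprehension itself is cheap for a few hundred records; the time is most likely spent in the HTTP requests made while iterating the lazy `mols` query. A minimal sketch of the extraction step as a reusable function, assuming each record is a plain dict shaped like the output of `.only('molecule_structures')`, where 'molecule_structures' can be None or lack 'canonical_smiles'. The name `extract_canonical_smiles` is hypothetical. It does not make the download faster, but it can be tested offline on fake records.

```python
from typing import Any, Dict, Iterable, List, Optional


def extract_canonical_smiles(records: Iterable[Dict[str, Any]]) -> List[str]:
    """Collect canonical SMILES from molecule records (hypothetical helper).

    Assumes each record looks like {'molecule_structures': {'canonical_smiles': ...}},
    where 'molecule_structures' may be None or missing entirely.
    """
    smiles: List[str] = []
    for record in records:
        structures: Optional[Dict[str, Any]] = record.get("molecule_structures")
        if not structures:
            continue  # no structure data for this molecule
        value = structures.get("canonical_smiles")
        if value:  # skip records whose SMILES is None or empty
            smiles.append(value)
    return smiles
```

With the query above, this would be called as `mol_smiles = extract_canonical_smiles(mols)`.
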
eloyfelix commented 2 years ago

Hi Constantin, sorry for the delayed reply. Unfortunately, retrieving only the SMILES is not possible, and because of how the library was designed, it would be difficult to change.
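
This does not get around the limitation above, since the whole 'molecule_structures' object is still downloaded. But one hypothetical workaround is to skip the client and call the REST API from the docs linked at the top of this page directly, using the standard library and a larger page size so fewer round trips are needed. The `only` and `limit` query parameters, the assumed cap of 1000 records per page, and the `molecules` / `page_meta['next']` response keys are assumptions to verify against the API documentation. `build_molecule_query` and `iter_molecules` are illustrative names, not part of chembl_webresource_client.

```python
import json
from typing import Any, Dict, Iterator
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Base URL from the API docs linked in the repository description.
BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/"


def build_molecule_query(filters: Dict[str, Any], only: str, limit: int = 1000) -> str:
    """Build a molecule.json URL from Django-style filters.

    Assumption: the API accepts 'only' (fields to return) and 'limit' (page size,
    assumed to be capped at 1000) as query parameters.
    """
    params = dict(filters)  # copy so the caller's dict is not modified
    params["only"] = only
    params["limit"] = limit
    return urljoin(BASE_URL, "molecule.json") + "?" + urlencode(params)


def iter_molecules(url: str) -> Iterator[Dict[str, Any]]:
    """Yield molecule records page by page (assumed response shape)."""
    while url:
        with urlopen(url) as response:
            page = json.load(response)
        yield from page.get("molecules", [])
        next_url = page.get("page_meta", {}).get("next")
        # 'next' may be a relative path; urljoin handles relative and absolute links.
        url = urljoin(url, next_url) if next_url else ""
```

The records yielded by `iter_molecules` are plain dicts, so they can be fed to the same extraction logic as the client's results.
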