Is there a way to get the canonical SMILES directly from the web service, rather than downloading all of the molecule structures data and then extracting the SMILES? I waste a lot of time downloading redundant information when I only want a small portion of it.
This is how I'm currently doing it:
from chembl_webresource_client.new_client import new_client

# Get the data (at the moment, all of the 'molecule_structures' data).
mols = new_client.molecule.filter(
    max_phase=4,
    first_approval__gte=2000,
    molecule_properties__mw_freebase__lte=500
).only('molecule_structures')

# Then I have to use a list comprehension to extract the 'canonical_smiles'.
# For the amount of data in this example, it takes about 1 min 30 s to extract the 607 SMILES.
# I'm trying to get a lot more than 607 SMILES, but that ends up taking really long.
mol_smiles = [mol['molecule_structures']['canonical_smiles'] for mol in mols if mol['molecule_structures']]
Hi Constantin, sorry for the delayed reply. Unfortunately, retrieving only the SMILES is not possible, and it would be difficult to change in the library because of how it was designed.
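Since the extraction has to stay on the client side, here is a minimal sketch of a helper that pulls the SMILES out of records shaped like the ones in the question. The name `iter_smiles` and the sample records are hypothetical and only for illustration; the helper does not touch the web service, and it will not make the download itself any faster.

```python
from typing import Any, Iterable, Iterator, Mapping


def iter_smiles(records: Iterable[Mapping[str, Any]]) -> Iterator[str]:
    """Yield canonical SMILES from molecule records, skipping empty entries."""
    for record in records:
        # 'molecule_structures' can be missing or None for some molecules.
        structures = record.get('molecule_structures')
        if not structures:
            continue
        smiles = structures.get('canonical_smiles')
        if smiles:
            yield smiles


# Hypothetical sample records mimicking the shape used in the question.
sample = [
    {'molecule_structures': {'canonical_smiles': 'CCO', 'standard_inchi_key': 'X'}},
    {'molecule_structures': None},
    {'molecule_structures': {'canonical_smiles': 'c1ccccc1'}},
]
smiles_list = list(iter_smiles(sample))
print(smiles_list)
```

Assuming the query result can be iterated like a list of dicts, as it is in the question, it could be passed in directly: `mol_smiles = list(iter_smiles(mols))`.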