I am trying to download information about all compounds that I consider active against Plasmodium species (i.e. an inhibition IC50 below 1 μM). The following code mostly achieves this:
import pandas as pd
from chembl_webresource_client.new_client import new_client
target = new_client.target
activity = new_client.activity
pf = target.filter(pref_name__iexact='Plasmodium falciparum').only('target_chembl_id')[0]
pf_activities = activity.filter(target_chembl_id=pf['target_chembl_id'],
                                pchembl_value__isnull=False,
                                pchembl_value__gte=6  # pIC50 >= 6 corresponds to IC50 <= 1 μM
                                ).filter(
    standard_type="IC50").only(
    ['target_chembl_id', 'target_pref_name', 'standard_inchi_key', 'molecule_chembl_id',
     'molecule_pref_name', 'pchembl_value'])
# pchembl_value__gte=6 ensures that only compounds with a pIC50 (the negative base-10 logarithm of the molar IC50) of at least 6, i.e. an IC50 of at most 1 μM, are retrieved.
# The value 6 comes from the conversion pIC50 = -log10(IC50 in mol/L): -log10(1e-6) = 6.
compounds = pf_activities.only(['molecule_chembl_id'])
compound_data = []
# Download the compounds and collect data
from tqdm import tqdm
for i in tqdm(range(len(compounds)), desc="Getting compounds", ascii=False, ncols=72):
    compound = compounds[i]
    molecule_id = compound['molecule_chembl_id']
    # Get compound details
    compound_details = new_client.molecule.get(molecule_id)
    inchikey = None
    smiles = None
    if compound_details['molecule_structures'] is not None:
        inchikey = compound_details['molecule_structures']['standard_inchi_key']
        smiles = compound_details['molecule_structures']['canonical_smiles']
    name = compound_details['pref_name']
    molecule_chembl_id = compound_details['molecule_chembl_id']
    compound_data.append(
        {'Compound Name': name, 'InChIKey': inchikey, 'Smiles': smiles,
         'molecule_chembl_id': molecule_chembl_id})
# Create a DataFrame from the compound data
df = pd.DataFrame(compound_data)
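To sanity-check the pIC50 threshold, here is a minimal sketch of the conversion, assuming the IC50 is given in nM. The helper name `pic50_from_nm` is my own and is not part of the ChEMBL client.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 in nanomolar to pIC50 = -log10(IC50 in mol/L).

    Hypothetical helper for illustration only; assumes the input is in nM.
    """
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    # 1 nM = 1e-9 mol/L, so -log10(ic50_nm * 1e-9) = 9 - log10(ic50_nm)
    return 9.0 - math.log10(ic50_nm)


if __name__ == "__main__":
    for ic50 in (100.0, 800.0, 1000.0, 5000.0):
        print(f"IC50 = {ic50:7.1f} nM -> pIC50 = {pic50_from_nm(ic50):.2f}")
```

An IC50 below 1 μM (1000 nM) gives a pIC50 above 6, so a more potent compound has a higher pIC50, not a lower one.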
However, I notice that the results include cases like CHEMBL111076: it is listed with an IC50 of 800.0 nM against a Plasmodium target, but the assay actually measured the concentration required to reduce the chloroquine IC50 by 50%. Is there a way to filter out these kinds of results?