chembl / chembl_webservices_2

Source code of the ChEMBL web services.
https://www.ebi.ac.uk/chembl/ws

Filtering based on types of bioassay #187

Open alrichardbollans opened 1 year ago

alrichardbollans commented 1 year ago

I am trying to download information about all compounds that I consider to be active against Plasmodium species (i.e. an inhibition IC50 of less than 1 μM). I seem to be mostly achieving this using the following code:

import pandas as pd
from chembl_webresource_client.new_client import new_client
from tqdm import tqdm

target = new_client.target
activity = new_client.activity
pf = target.filter(pref_name__iexact='Plasmodium falciparum').only('target_chembl_id')[0]

# pchembl_value__gte=6 keeps only activities with pIC50 >= 6, i.e. IC50 <= 1 μM.
# The value 6 comes from pIC50 = -log10(IC50 in mol/L), with 1 μM = 1e-6 mol/L.
pf_activities = activity.filter(target_chembl_id=pf['target_chembl_id'],
                                pchembl_value__isnull=False,
                                pchembl_value__gte=6
                                ).filter(
    standard_type="IC50").only(
    ['target_chembl_id', 'target_pref_name', 'standard_inchi_key', 'molecule_chembl_id',
     'molecule_pref_name', 'pchembl_value'])

compounds = pf_activities.only(['molecule_chembl_id'])

compound_data = []
# Download the compounds and collect data
for i in tqdm(range(len(compounds)), desc="Getting compounds", ascii=False, ncols=72):
    compound = compounds[i]
    molecule_id = compound['molecule_chembl_id']

    # Get compound details
    compound_details = new_client.molecule.get(molecule_id)
    inchikey = None
    smiles = None
    # Some molecules have no structure record, so guard against None
    if compound_details['molecule_structures'] is not None:
        inchikey = compound_details['molecule_structures']['standard_inchi_key']
        smiles = compound_details['molecule_structures']['canonical_smiles']
    name = compound_details['pref_name']
    molecule_chembl_id = compound_details['molecule_chembl_id']
    compound_data.append(
        {'Compound Name': name, 'InChIKey': inchikey, 'Smiles': smiles,
         'molecule_chembl_id': molecule_chembl_id})

# Create a DataFrame from the compound data
df = pd.DataFrame(compound_data)
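
For reference, the pIC50 threshold can be checked with a small conversion. This is only a sketch of the arithmetic pIC50 = -log10(IC50 in mol/L); the helper name `pic50_from_nm` is made up for illustration and is not part of the ChEMBL client.

```python
import math


def pic50_from_nm(ic50_nm: float) -> float:
    """Convert an IC50 given in nanomolar to pIC50 = -log10(IC50 in mol/L)."""
    if ic50_nm <= 0:
        raise ValueError("IC50 must be positive")
    return -math.log10(ic50_nm * 1e-9)  # 1 nM = 1e-9 mol/L


print(round(pic50_from_nm(1000.0), 2))  # 1 μM  -> 6.0
print(round(pic50_from_nm(800.0), 2))   # 800 nM -> 6.1
```

A lower IC50 gives a higher pIC50, so compounds with IC50 below 1 μM are those with pIC50 above 6.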

However, I notice that the results include cases such as CHEMBL111076, which is picked up because an IC50 of 800.0 nM is recorded against a Plasmodium target, even though the assay was actually measuring the concentration required to reduce the chloroquine IC50 by 50%. Is there a way to filter out these kinds of results?
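
One possible workaround, sketched here under assumptions rather than offered as a confirmed answer, is to also request the assay description for each activity and drop records whose description matches unwanted phrases. The sketch assumes each record is a dict with an `assay_description` key (which would need to be added to the `.only([...])` field list). The phrase list, the helper names and the second sample record are hypothetical and only illustrate the idea.

```python
# Hypothetical phrases that flag assays not measuring direct growth inhibition.
# Only the first phrase comes from the example above; extend the list as needed.
EXCLUDE_PHRASES = ("reduce chloroquine ic50",)


def is_direct_inhibition(record: dict, exclude=EXCLUDE_PHRASES) -> bool:
    """Return True if the assay description contains none of the excluded phrases."""
    description = (record.get("assay_description") or "").lower()
    return not any(phrase in description for phrase in exclude)


def filter_records(records, exclude=EXCLUDE_PHRASES) -> list:
    """Keep only records whose assay description passes the phrase check."""
    return [r for r in records if is_direct_inhibition(r, exclude)]


# Illustrative records; the second one is invented for the example.
sample = [
    {"molecule_chembl_id": "CHEMBL111076",
     "assay_description": "Concentration required to reduce chloroquine IC50 by 50%"},
    {"molecule_chembl_id": "CHEMBL_HYPOTHETICAL",
     "assay_description": "In vitro growth inhibition of Plasmodium falciparum"},
]

kept = filter_records(sample)
print([r["molecule_chembl_id"] for r in kept])  # ['CHEMBL_HYPOTHETICAL']
```

Text matching like this is crude and will miss assays that are worded differently, so the excluded phrases would need to be reviewed against the actual descriptions returned.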