chembl / chembl_webresource_client

Official Python client for accessing ChEMBL API
https://www.ebi.ac.uk/chembl/api/data/docs

Assay offsets and limit #123

Closed YojanaGadiya closed 1 year ago

YojanaGadiya commented 1 year ago

Dear Chembl community,

I am getting an error that seems related to a limit on how much bioactivity data I can extract using the client. With the following code:

ASSAY_METADATA_COLS = [
    'pchembl_value',
    'molecule_chembl_id',
    'activity_id',
    'target_pref_name',
    'molecule_pref_name',
    'standard_type',
    'standard_units',
    'standard_value',
    'standard_relation'
]

prot_activity_data = activity.filter(
    target_chembl_id='CHEMBL1075125',
    assay_type_iregex='(B|F)',
).only([ASSAY_METADATA_COLS])

I get the following error:

Traceback (most recent call last):
  File "/lib/python3.9/site-packages/chembl_webresource_client/query_set.py", line 78, in __repr__
    data = list(clone[:Settings.Instance().REPR_OUTPUT_SIZE])
  File "/lib/python3.9/site-packages/chembl_webresource_client/query_set.py", line 127, in __next__
    return self.next()
  File "lib/python3.9/site-packages/chembl_webresource_client/query_set.py", line 113, in next
    self.chunk = self.query.get_page()
  File "/lib/python3.9/site-packages/chembl_webresource_client/url_query.py", line 394, in get_page
    handle_http_error(res)
  File "/lib/python3.9/site-packages/chembl_webresource_client/http_errors.py", line 113, in handle_http_error
    raise exception_class(request.url, request.text)
chembl_webresource_client.http_errors.HttpApplicationError: Error for url https://www.ebi.ac.uk/chembl/api/data/activity.json, server response: {"error_message": "unhashable type: 'list'", "traceback": "Traceback (most recent call last):\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 272, in wrapper\n    response = callback(request, *args, **kwargs)\n\n  File \"/opt/conda/envs/chembl-webservices-py3/lib/python3.9/site-packages/tastypie/resources.py\", line 467, in dispatch_list\n    return self.dispatch('list', request, **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 901, in dispatch\n    response = method(request, **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 836, in get_list\n    return self.response(self.get_list_impl)(request, **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 741, in get_something\n    ret = f(request, basic_bundle, **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 830, in get_list_impl\n    return self.serialise_list(self.cached_obj_get_list, for_list=True, for_search=False)(\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 810, in handler\n    to_be_serialized, in_cache = f(bundle=base_bundle,\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 684, in cached_obj_get_list\n    return self.list_cache_handler(self.list_source)(bundle, 'list', 'api_dispatch_list', **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 438, in handle\n    sorted_objects = data_provider(bundle, **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 678, in list_source\n    sorted_objects = self.prefetch_related(sorted_objects, **kwargs)\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 1037, in prefetch_related\n    if only and all([not self.fields[field].is_m2m for field in only if field in self.fields]):\n\n  File \"/chembl_ws_py3/src/chembl_webservices/core/resource.py\", line 1037, in <listcomp>\n    if only and all([not self.fields[field].is_m2m for field in only if field in self.fields]):\n\nTypeError: unhashable type: 'list'\n"}

I suspect this is because of a mismatch between the number of bioactivities registered for this target (48) and the internal limit of the client (20). Is there an efficient way to run such queries and retrieve all the results?

Thank you.

juanfmx2 commented 1 year ago

Hi, I have tested the URL and it works: https://www.ebi.ac.uk/chembl/api/data/activity.json?target_chembl_id=CHEMBL1075125&assay_type_iregex=(B|F)&limit=1000&only=pchembl_value,molecule_chembl_id,activity_id,target_pref_name
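
For reference, here is a minimal sketch of building that URL with the standard library. The helper name activity_url is hypothetical and not part of the client; the query parameters are the ones in the URL above, plus offset, which I am assuming acts as the usual paging offset alongside limit.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

BASE_URL = "https://www.ebi.ac.uk/chembl/api/data/activity.json"


def activity_url(target_id, fields, limit=1000, offset=0):
    """Build an activity query URL (hypothetical helper, not part of the client)."""
    params = {
        "target_chembl_id": target_id,
        "assay_type_iregex": "(B|F)",  # copied as-is from the URL above
        "limit": limit,
        "offset": offset,  # assumption: standard paging offset
        # In the URL, `only` is a single comma-separated string.
        "only": ",".join(fields),
    }
    return BASE_URL + "?" + urlencode(params)


url = activity_url(
    "CHEMBL1075125",
    ["pchembl_value", "molecule_chembl_id", "activity_id", "target_pref_name"],
)

# Decode the query string again to check what will be sent.
query = parse_qs(urlsplit(url).query)
```

Note that urlencode percent-encodes the parentheses, the pipe and the commas, so the string looks different from the hand-written URL but decodes to the same parameters.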

I think the issue is caused by the extra [] in the call to only(): you are passing a list inside another list.
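
A minimal sketch of why a nested list produces this particular message, assuming the server looks each requested field up in a dict, as the self.fields[field] line in the traceback suggests. The fields dict below is a stand-in with made-up contents, not the server's real data.

```python
# Stand-in for the server-side field registry (hypothetical contents).
fields = {"pchembl_value": 1, "activity_id": 2}

flat = ["pchembl_value", "activity_id"]
nested = [flat]  # same shape as the argument in only([ASSAY_METADATA_COLS])

# Each element of the flat list is a string, so the dict membership test works.
found = [f for f in flat if f in fields]

# The single element of the nested list is itself a list, which cannot be hashed,
# so the membership test raises TypeError instead of returning False.
try:
    [f for f in nested if f in fields]
    error = None
except TypeError as exc:
    error = str(exc)
```

So the error comes from the shape of the only argument and not from the number of records returned.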

You are effectively making the following call:

prot_activity_data = activity.filter(
    target_chembl_id='CHEMBL1075125',
    assay_type_iregex='(B|F)',
).only(
    [
        [
            'pchembl_value',
            'molecule_chembl_id',
            'activity_id',
            'target_pref_name',
            'molecule_pref_name',
            'standard_type',
            'standard_units',
            'standard_value',
            'standard_relation'
        ]
    ]
)

If passing the list itself still does not work after removing the extra [], I would try passing ','.join(ASSAY_METADATA_COLS) instead.
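
Putting both suggestions together, here is a sketch of the corrected call. It assumes the client is imported through new_client as in the project README; the filter arguments are copied unchanged from the original post, and fetch_activities is a hypothetical wrapper name.

```python
ASSAY_METADATA_COLS = [
    "pchembl_value",
    "molecule_chembl_id",
    "activity_id",
    "target_pref_name",
    "molecule_pref_name",
    "standard_type",
    "standard_units",
    "standard_value",
    "standard_relation",
]

# Fallback form: one comma-separated string instead of a list.
ONLY_AS_STRING = ",".join(ASSAY_METADATA_COLS)


def fetch_activities(target_id="CHEMBL1075125"):
    """Query activities for one target, requesting only the listed columns."""
    # Imported here so this module still loads where the client is not installed.
    from chembl_webresource_client.new_client import new_client

    return new_client.activity.filter(
        target_chembl_id=target_id,
        assay_type_iregex="(B|F)",  # copied as-is from the original post
    ).only(ASSAY_METADATA_COLS)  # the list itself, no extra brackets
```

The client-side traceback above shows the query set calling get_page() while it is iterated, so looping over the result (for example list(fetch_activities())) should walk through further pages rather than stop at the first 20 records.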

Hope this helps, Juan

YojanaGadiya commented 1 year ago

@juanfmx2 Ah, I see. I noticed it now. Thanks. That solves the issue :)