DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Unable to filter for projects without cell counts #2850

Open amarjandu opened 3 years ago

amarjandu commented 3 years ago
(.venv) Specter:azul amar$ http 'https://service.dev.singlecell.gi.ucsc.edu/index/projects?size=15&filters={"cellCount":{"is":[null]}}'
HTTP/1.1 500 Internal Server Error
Access-Control-Allow-Headers: Authorization,Content-Type,X-Amz-Date,X-Amz-Security-Token,X-Api-Key
Access-Control-Allow-Origin: *
Connection: keep-alive
Content-Length: 1520
Content-Type: text/plain
Date: Wed, 03 Mar 2021 00:06:10 GMT
Via: 1.1 2160425a36a5a0604048dda5b151b504.cloudfront.net (CloudFront)
X-Amz-Cf-Id: OW-DFYR2v2wMrfLfsogRSXG96xQjU6ZBQy1LWzKl9-nE5pPm-AYDuA==
X-Amz-Cf-Pop: SFO53-C1
X-Amzn-Trace-Id: Root=1-603ed2f2-536db332611a632e5040946e;Sampled=0
X-Cache: Error from cloudfront
x-amz-apigw-id: blXl3HNpoAMFWHg=
x-amzn-RequestId: dd1b0950-4283-413b-bced-f7d235d92d65

Traceback (most recent call last):
  File "/var/task/chalice/app.py", line 1135, in _get_view_function_response
    response = view_function(**function_args)
  File "/var/task/app.py", line 1206, in get_project_data
    return repository_search('projects', project_id)
  File "/var/task/app.py", line 947, in repository_search
    return service.get_data(catalog=catalog,
  File "/var/task/azul/service/index_query_service.py", line 67, in get_data
    response = self.transform_request(catalog=catalog,
  File "/var/task/azul/service/elasticsearch_service.py", line 611, in transform_request
    es_response = es_search.execute(ignore_cache=True)
  File "/opt/python/elasticsearch_dsl/search.py", line 702, in execute
    es.search(
  File "/opt/python/elasticsearch/client/utils.py", line 84, in _wrapped
    return func(*args, params=params, **kwargs)
  File "/opt/python/elasticsearch/client/__init__.py", line 851, in search
    return self.transport.perform_request(
  File "/opt/python/elasticsearch/transport.py", line 351, in perform_request
    status, headers_response, data = connection.perform_request(
  File "/opt/python/elasticsearch/connection/http_requests.py", line 161, in perform_request
    self._raise_error(response.status_code, raw_data)
  File "/opt/python/elasticsearch/connection/base.py", line 229, in _raise_error
    raise HTTP_EXCEPTIONS.get(status_code, TransportError)(
elasticsearch.exceptions.RequestError: RequestError(400, 'parsing_exception', 'No value specified for terms query')
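For context, the 400 appears to come from Elasticsearch itself. A `terms` query accepts only concrete values and rejects a null entry in its value list with `No value specified for terms query`. In Elasticsearch, documents that have no value for a field are normally matched with a `bool` / `must_not` / `exists` query instead. The following is a minimal sketch of how a filter like `{"is": [null]}` could be translated. `filter_to_es_query` is a hypothetical helper, not Azul's code. The field name `cellCount` is a placeholder for whatever index field backs that facet. The sketch also assumes that missing counts are stored as absent or null rather than as a placeholder value.

```python
import json
from typing import Any


def filter_to_es_query(field: str, values: list[Any]) -> dict:
    """Translate an `{"is": [...]}` filter on one field into an ES bool query.

    Hypothetical helper for illustration only. A `terms` query cannot hold
    null, so None is split out and matched with `must_not: exists`.
    """
    if not values:
        raise ValueError("filter needs at least one value")
    non_null = [v for v in values if v is not None]
    clauses = []
    if non_null:
        # Documents whose field equals any of the concrete values
        clauses.append({"terms": {field: non_null}})
    if len(non_null) < len(values):
        # Documents where the field is missing or null
        clauses.append({"bool": {"must_not": [{"exists": {"field": field}}]}})
    # OR the clauses together: a document must match at least one of them
    return {"bool": {"should": clauses, "minimum_should_match": 1}}


if __name__ == "__main__":
    print(json.dumps(filter_to_es_query("cellCount", [None]), indent=2))
```

With this split, `[0]` and `[null]` become different queries. A filter such as `[0, null]` can then still ask for both.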

Expected to be able to filter for projects that do not have cell counts. As a workaround, the filter value was changed to [0]. The results include projects whose totalCells is null:

http 'https://service.dev.singlecell.gi.ucsc.edu/index/projects?size=15&filters={"cellCount":{"is":[0]}}'  | jq '.hits[].cellSuspensions'
...
[
  {
    "organ": [
      "kidney"
    ],
    "organPart": [
      "cortex of kidney",
      "renal medulla",
      "renal pelvis",
      "ureter",
      null
    ],
    "selectedCellType": [
      "kidney cell"
    ],
    "totalCells": null
  }
]
...
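The jq pipeline above can also be reproduced in Python to make the null-versus-0 distinction explicit on the client side. This is a minimal sketch that assumes the response shape shown above (`hits` → `cellSuspensions` → `totalCells`). `split_by_cell_count` is a hypothetical helper, and the sample dict copies the kidney entry from the output.

```python
def split_by_cell_count(response: dict) -> dict:
    """Group cellSuspensions entries by whether totalCells is None, 0, or positive.

    Hypothetical helper; assumes the /index/projects response shape above.
    """
    groups = {"missing": [], "zero": [], "positive": []}
    for hit in response.get("hits", []):
        for suspension in hit.get("cellSuspensions", []):
            total = suspension.get("totalCells")
            if total is None:
                key = "missing"  # no cell count recorded at all
            elif total == 0:
                key = "zero"  # a recorded count of zero cells
            else:
                key = "positive"
            groups[key].append(suspension)
    return groups


# Sample copied from the kidney entry in the output above
sample = {
    "hits": [
        {
            "cellSuspensions": [
                {
                    "organ": ["kidney"],
                    "organPart": ["cortex of kidney", "renal medulla",
                                  "renal pelvis", "ureter", None],
                    "selectedCellType": ["kidney cell"],
                    "totalCells": None,
                }
            ]
        }
    ]
}
groups = split_by_cell_count(sample)
print({key: len(entries) for key, entries in groups.items()})
```

For the sample, the single entry lands in `missing`, not `zero`, even though it was returned by the `[0]` filter.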

There needs to be a distinction between:

hannes-ucsc commented 3 years ago

Great title and description. Ready for triage.

One thing to note is that cell counts are grouped by organ, at least on /index/projects. I'm not sure about the other /index/ endpoints.
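Because the counts are per organ, "project without cell counts" is ambiguous when only some organs have a count. This is a minimal sketch of one way to classify a project's `cellSuspensions` list as having no, partial, or complete counts. `counts_by_organ` and `cell_count_status` are hypothetical helpers. The heart entry in the example is made-up data for illustration, not taken from the issue.

```python
from typing import Optional


def counts_by_organ(cell_suspensions: list[dict]) -> dict[str, Optional[int]]:
    """Key each cellSuspensions entry by its organ(s); value is totalCells or None.

    Assumes one entry per organ group, as described above; a repeated key
    would overwrite the earlier entry.
    """
    return {
        ", ".join(entry.get("organ", [])): entry.get("totalCells")
        for entry in cell_suspensions
    }


def cell_count_status(cell_suspensions: list[dict]) -> str:
    """Classify a project as having no, partial, or complete per-organ cell counts."""
    totals = [entry.get("totalCells") for entry in cell_suspensions]
    if all(total is None for total in totals):  # also true for an empty list
        return "none"
    if any(total is None for total in totals):
        return "partial"
    return "complete"  # every organ has a recorded count, possibly 0


# Hypothetical project: kidney lacks a count, heart (made-up) has one
suspensions = [
    {"organ": ["kidney"], "totalCells": None},
    {"organ": ["heart"], "totalCells": 5000},
]
print(counts_by_organ(suspensions))
print(cell_count_status(suspensions))
```

Under this classification, a recorded count of 0 counts as present, which keeps it distinct from a missing count.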