DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

Inconsistent `null` count in tissue atlas term facet #6458

Closed nadove-ucsc closed 1 week ago

nadove-ucsc commented 1 month ago

In the `tissueAtlas` term facet, the count for the `null` term (520,065) equals the facet's `total`, even though the non-null terms account for 8,064 entries. We don't expect to observe null and non-null values in the same project, so this looks like a problem with how the null count is calculated.

$ curl 'https://service.azul.data.humancellatlas.org/index/files?catalog=dcp39' | jq '.termFacets.tissueAtlas'
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  553k  100  553k    0     0  63921      0  0:00:08  0:00:08 --:--:--  127k
{
  "terms": [
    {
      "term": "Lung",
      "count": 6264
    },
    {
      "term": "Retina",
      "count": 1050
    },
    {
      "term": "Blood",
      "count": 323
    },
    {
      "term": "Kidney",
      "count": 237
    },
    {
      "term": "Gut",
      "count": 190
    },
    {
      "term": null,
      "count": 520065
    }
  ],
  "total": 520065,
  "type": "terms"
}
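
The discrepancy can be checked mechanically. The following is a minimal sketch, not Azul code: the helper names `summarize` and `is_consistent` are hypothetical, and it assumes that each file is counted under exactly one term, so the term counts should add up to `total`. If the field can hold multiple values per file, that assumption does not hold and the check would need to be relaxed.

```python
# Facet values copied from the response above.
facet = {
    "terms": [
        {"term": "Lung", "count": 6264},
        {"term": "Retina", "count": 1050},
        {"term": "Blood", "count": 323},
        {"term": "Kidney", "count": 237},
        {"term": "Gut", "count": 190},
        {"term": None, "count": 520065},
    ],
    "total": 520065,
    "type": "terms",
}


def summarize(facet):
    """Return (sum of non-null counts, null count, total) for a term facet."""
    non_null = sum(t["count"] for t in facet["terms"] if t["term"] is not None)
    null = sum(t["count"] for t in facet["terms"] if t["term"] is None)
    return non_null, null, facet["total"]


def is_consistent(facet):
    """Assumption: every file falls under exactly one term (null included)."""
    non_null, null, total = summarize(facet)
    return non_null + null == total


non_null, null, total = summarize(facet)
# Null count we would expect under the one-term-per-file assumption.
expected_null = total - non_null

print(f"non-null: {non_null}, null: {null}, total: {total}")
print(f"consistent: {is_consistent(facet)}, expected null: {expected_null}")
```

Under that assumption, the null count would be expected to be 512,001 rather than 520,065.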

hannes-ucsc commented 1 month ago

For demo, attempt to reproduce in prod.

nadove-ucsc commented 1 week ago

Demo revealed additional problems: https://github.com/DataBiosphere/azul/issues/6541