DataBiosphere / azul

Metadata indexer and query service used for AnVIL, HCA, LungMAP, and CGP
Apache License 2.0

No replicas for donors in HCA #6582

Open: nadove-ucsc opened this issue 2 months ago

nadove-ucsc commented 2 months ago

Spike to:

  1. Verify the titular claim
  2. Draft a solution

nadove-ucsc commented 2 months ago

The claim is correct. A terms aggregation over replica_type on dev returns no donor_organism bucket, i.e. there are no donor replicas:

GET /azul_v2_dev_dcp3_replica/_search
{
  "aggs": {
    "replica_type": {
      "terms": {
        "field": "replica_type.keyword"
      }
    }
  }
}

{
  ...
  "aggregations" : {
    "replica_type" : {
      "doc_count_error_upper_bound" : 0,
      "sum_other_doc_count" : 0,
      "buckets" : [
        {
          "key" : "sequence_file",
          "doc_count" : 114546
        },
        {
          "key" : "cell_suspension",
          "doc_count" : 62175
        },
        {
          "key" : "links",
          "doc_count" : 57011
        },
        {
          "key" : "specimen_from_organism",
          "doc_count" : 7231
        },
        {
          "key" : "analysis_file",
          "doc_count" : 1661
        },
        {
          "key" : "supplementary_file",
          "doc_count" : 185
        },
        {
          "key" : "project",
          "doc_count" : 105
        },
        {
          "key" : "organoid",
          "doc_count" : 94
        },
        {
          "key" : "cell_line",
          "doc_count" : 45
        }
      ]
    }
  }
}
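The aggregation response can be checked against a list of expected entity types, which turns the bucket list into a direct answer to the titular claim. This is a minimal sketch that assumes the response has already been parsed into a Python dict. The `missing_replica_types` helper and the `EXPECTED_TYPES` set are hypothetical and not part of Azul. The expected set simply mirrors the "New replicas" keys reported for the canned bundle further down this thread.

```python
# Hypothetical helper: report which expected replica types have no documents
# in a terms aggregation over replica_type.keyword.

# Bucket counts copied from the dev aggregation response above.
response = {
    "aggregations": {
        "replica_type": {
            "doc_count_error_upper_bound": 0,
            "sum_other_doc_count": 0,
            "buckets": [
                {"key": "sequence_file", "doc_count": 114546},
                {"key": "cell_suspension", "doc_count": 62175},
                {"key": "links", "doc_count": 57011},
                {"key": "specimen_from_organism", "doc_count": 7231},
                {"key": "analysis_file", "doc_count": 1661},
                {"key": "supplementary_file", "doc_count": 185},
                {"key": "project", "doc_count": 105},
                {"key": "organoid", "doc_count": 94},
                {"key": "cell_line", "doc_count": 45},
            ],
        }
    }
}

# Assumption: replica types we expect to see at least once. This set mirrors
# the "New replicas" keys from the canned-bundle test, not Azul's configuration.
EXPECTED_TYPES = {
    "sequence_file", "specimen_from_organism", "project", "process",
    "donor_organism", "library_preparation_protocol", "cell_suspension",
    "sequencing_protocol", "links",
}


def missing_replica_types(response: dict, expected: set) -> set:
    agg = response["aggregations"]["replica_type"]
    # A non-zero sum_other_doc_count means some terms were cut off by the
    # aggregation's size limit, so absence from the buckets would be inconclusive.
    if agg["sum_other_doc_count"] != 0:
        raise ValueError("terms aggregation truncated; increase its size")
    present = {b["key"] for b in agg["buckets"] if b["doc_count"] > 0}
    return expected - present


print(sorted(missing_replica_types(response, EXPECTED_TYPES)))
# ['donor_organism', 'library_preparation_protocol', 'process', 'sequencing_protocol']
```

The `sum_other_doc_count` check guards against mistaking a truncated bucket list for a missing type. The dev response reports 0 there, so the absent types are not just hidden by the aggregation's bucket limit.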
nadove-ucsc commented 2 months ago

Donors are not the only affected entities: processes and protocols are missing as well. Here are the results of indexing the canned bundle aaa96233-bf27-44c7-82df-b4dc15ad4d9d:

Old replicas: {'sequence_file': 1, 'specimen_from_organism': 1, 'project': 1, 'cell_suspension': 1, 'links': 1}
New replicas: {'sequence_file': 1, 'specimen_from_organism': 1, 'project': 1, 'process': 1, 'donor_organism': 1, 'library_preparation_protocol': 1, 'cell_suspension': 1, 'sequencing_protocol': 1, 'links': 1}
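The difference between the two runs can be computed mechanically rather than by eye. This is a minimal sketch with the two dicts copied from the output above. `replica_diff` is a hypothetical helper, not part of Azul. It uses `collections.Counter` subtraction, which keeps only positive counts.

```python
from collections import Counter

# Replica counts per type, copied from the canned-bundle output above.
old = Counter({'sequence_file': 1, 'specimen_from_organism': 1, 'project': 1,
               'cell_suspension': 1, 'links': 1})
new = Counter({'sequence_file': 1, 'specimen_from_organism': 1, 'project': 1,
               'process': 1, 'donor_organism': 1,
               'library_preparation_protocol': 1, 'cell_suspension': 1,
               'sequencing_protocol': 1, 'links': 1})


def replica_diff(old: Counter, new: Counter) -> tuple:
    """Return (counts only the new run emits, counts only the old run emits)."""
    # Counter subtraction drops zero and negative results, so each side
    # contains only the replica types (or surplus counts) unique to that run.
    return dict(new - old), dict(old - new)


added, removed = replica_diff(old, new)
print(sorted(added))  # ['donor_organism', 'library_preparation_protocol', 'process', 'sequencing_protocol']
print(removed)        # {}
```

An empty `removed` confirms that the new indexing only adds replica types and does not drop any of the old ones.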

hannes-ucsc commented 2 months ago

For the demo, attempt to reproduce this.

hannes-ucsc commented 4 weeks ago

I'd like this to be demoed in prod, but replicas had to be disabled there due to #6648. We need to fix that before we can demo this.