clingen-data-model / clinvar-ingest

Apache License 2.0

Create a single BQ Dataset for processing history #201

Closed: toneillbroad closed this issue 1 month ago

toneillbroad commented 1 month ago

Create a single BQ Dataset to contain the VCV/RCV workflow processing_history table.

This table consolidates the VCV and RCV processing history in a single BQ dataset, per the specification here: https://docs.google.com/spreadsheets/d/1Ny3I8Eg_cwTalWWG1QryGoaLPpFg7ScqIyG_r86wF_Y/edit?gid=0#gid=0

Other fields to consider in the processing_history table:

theferrit32 commented 1 month ago

Here's the current processing_history schema which I originally just scraped from the DSP tables.

https://github.com/clingen-data-model/clinvar-ingest/blob/v1_1_0_beta3/clinvar_ingest/cloud/bigquery/bq_json_schemas/processing_history.bq.json

[
  {
    "name": "release_date",
    "type": "DATE"
  },
  {
    "name": "processing_date",
    "type": "DATE"
  },
  {
    "name": "pipeline_version",
    "type": "STRING"
  }
]

That reminded me: we have an optional environment variable, CLINVAR_INGEST_RELEASE_TAG, which deploy-job.sh sets on the Cloud Run job instance and config.py reads. We could use this value to populate a column analogous to pipeline_version in the DSP tables.
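
A rough sketch of how the release tag could feed the pipeline_version column. This is only an illustration: the function name build_processing_history_row is hypothetical, and it reads the environment variable directly rather than going through config.py, whose actual interface isn't shown in this thread. Since the variable is optional, the sketch leaves pipeline_version as None (NULL in BigQuery) when it is unset.

```python
import os
from datetime import date


def build_processing_history_row(release_date, processing_date=None, env=None):
    """Build one row matching the three-column processing_history schema.

    Hypothetical helper: the real pipeline reads the tag via config.py.
    """
    # Allow a dict to be passed in for testing; default to the real environment.
    env = os.environ if env is None else env

    # Optional variable: .get() returns None when it is not set.
    release_tag = env.get("CLINVAR_INGEST_RELEASE_TAG")

    if processing_date is None:
        processing_date = date.today()

    return {
        # BigQuery DATE columns accept ISO 8601 strings (YYYY-MM-DD).
        "release_date": release_date.isoformat(),
        "processing_date": processing_date.isoformat(),
        "pipeline_version": release_tag,
    }


if __name__ == "__main__":
    row = build_processing_history_row(
        date(2024, 1, 7),
        date(2024, 1, 10),
        env={"CLINVAR_INGEST_RELEASE_TAG": "v1_1_0_beta3"},
    )
    print(row)
```

Passing env explicitly keeps the helper testable without touching the process environment; in the job itself the default os.environ lookup would pick up whatever deploy-job.sh set.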