clingen-data-model / clinvar-streams


Create copy of snapshot views in central bigquery project when release message is received #37

Closed: theferrit32 closed this issue 2 years ago

theferrit32 commented 2 years ago

A recent change in the Jade ingest pipeline creates a separate project for each snapshot, with a randomly generated project name, which complicates writing BigQuery scripts against the snapshots. The project and dataset names can be queried from the Jade API (https://jade-terra.datarepo-prod.broadinstitute.org/api). We can then enumerate the views in each snapshot dataset and create a corresponding view (one that simply selects from the snapshot view) in a single central project, such as the broad+terra+cgen project we have been using, under a dataset named for the release date.
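
A minimal sketch of the enumeration step, assuming the snapshot's project and dataset names have already been looked up from the Jade API. It builds a query against BigQuery's dataset-scoped INFORMATION_SCHEMA.VIEWS view. The function name and the simplified identifier checks are illustrative and are not part of clinvar-raw.

```python
import re

# Simplified BigQuery project ID check: lowercase letters, digits, hyphens,
# 6-30 characters, starting with a letter (illustrative, not exhaustive).
_PROJECT_RE = re.compile(r"[a-z][a-z0-9-]{4,28}[a-z0-9]")
# BigQuery dataset IDs: letters, digits and underscores.
_DATASET_RE = re.compile(r"[A-Za-z0-9_]{1,1024}")


def list_views_query(project: str, dataset: str) -> str:
    """Return SQL listing the views in one snapshot dataset (hypothetical helper)."""
    # The names are interpolated inside backticks, so reject anything unexpected.
    if not _PROJECT_RE.fullmatch(project):
        raise ValueError(f"unexpected project id: {project!r}")
    if not _DATASET_RE.fullmatch(dataset):
        raise ValueError(f"unexpected dataset id: {dataset!r}")
    return (
        "SELECT table_name "
        f"FROM `{project}.{dataset}.INFORMATION_SCHEMA.VIEWS` "
        "ORDER BY table_name"
    )
```

The resulting SQL could be run with the google-cloud-bigquery client (for example `bigquery.Client().query(sql).result()`), collecting each row's table_name.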

This will be done in clinvar-raw when it receives a new dataset release message from the broad-dsp-clinvar topic. We should consider whether it is a problem to run clinvar-raw in multiple GKE clusters if each one does this; really, we only want it to happen in prod. The functionality can be put behind an environment-variable toggle.
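
One way the toggle could look, sketched in Python. The variable name CLINVAR_RAW_CREATE_CENTRAL_VIEWS is hypothetical; the point is that the feature defaults to off, so only a deployment that explicitly sets it (for example prod) touches the central project.

```python
import os
from typing import Mapping, Optional

# Hypothetical variable name, not an existing clinvar-raw setting.
TOGGLE_VAR = "CLINVAR_RAW_CREATE_CENTRAL_VIEWS"

_TRUTHY = {"1", "true", "yes", "on"}


def central_views_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True only when the deployment explicitly opts in.

    Defaults to False, so clinvar-raw instances in other GKE clusters skip
    central view creation unless they are configured to do it.
    """
    if env is None:
        env = os.environ
    return env.get(TOGGLE_VAR, "").strip().lower() in _TRUTHY
```

The release-message handler would check this before doing any BigQuery work, and only the prod deployment's environment would set the variable.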

View creation example:

create or replace view `clingen-yyyyyy.clinvar_2021_07_10_v1_3_9.clinical_assertion` as
select * from `snapshot-xxxxxx.clinvar_2021_07_10_v1_3_9.clinical_assertion`
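
The view creation example generalizes to every view in a snapshot dataset. A sketch, assuming the view names were already enumerated; central_view_statements is a hypothetical helper. It also emits a `create schema if not exists` statement, since the dated dataset has to exist in the central project before views can be created in it.

```python
from typing import Iterable, List


def central_view_statements(
    source_project: str,
    dataset: str,
    view_names: Iterable[str],
    central_project: str,
) -> List[str]:
    """Build BigQuery DDL mirroring a snapshot dataset's views (hypothetical helper).

    The central dataset keeps the snapshot's dataset name (e.g.
    clinvar_2021_07_10_v1_3_9); each central view selects from the snapshot view.
    """
    # Ensure the dated dataset exists in the central project first.
    statements = [f"create schema if not exists `{central_project}.{dataset}`"]
    for view in sorted(view_names):
        statements.append(
            f"create or replace view `{central_project}.{dataset}.{view}` as\n"
            f"select * from `{source_project}.{dataset}.{view}`"
        )
    return statements
```

Using `create or replace` means reprocessing the same release message rewrites the views rather than failing. The central dataset probably also needs to be in the same BigQuery location as the snapshot dataset, which `create schema` can set with `options(location=...)`.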

Jade API docs: https://jade-terra.datarepo-prod.broadinstitute.org/swagger-ui.html#/snapshots

Jade snapshot listing:

curl -H "Authorization: Bearer $GOOGLE_TOKEN" "https://jade-terra.datarepo-prod.broadinstitute.org/api/repository/v1/snapshots?limit=1000"
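
The same listing call from Python, using only the standard library. The response shape assumed here (an "items" array, a "dataProject" field on each snapshot, and a BigQuery dataset named after the snapshot's "name") should be checked against the Jade swagger docs; "dataProject" may only be returned by the per-snapshot endpoint rather than the listing.

```python
import json
import urllib.request
from typing import Any, Dict, List, Tuple

JADE_API = "https://jade-terra.datarepo-prod.broadinstitute.org/api/repository/v1"


def _get_json(url: str, token: str) -> Any:
    """GET a Jade endpoint with a Google OAuth2 bearer token."""
    req = urllib.request.Request(url, headers={"Authorization": f"Bearer {token}"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def list_snapshots(token: str, limit: int = 1000) -> List[Dict[str, Any]]:
    """Python equivalent of the curl listing call; returns the snapshot records."""
    # ASSUMPTION: the listing wraps snapshot records in an "items" array.
    return _get_json(f"{JADE_API}/snapshots?limit={limit}", token).get("items", [])


def snapshot_location(snapshot: Dict[str, Any]) -> Tuple[str, str]:
    """Return (project, dataset) for a snapshot record.

    ASSUMPTION: the generated project is in "dataProject" and the BigQuery
    dataset is named after the snapshot ("name").
    """
    return snapshot["dataProject"], snapshot["name"]
```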

Getting a Google OAuth2 access token:
Java: Credential.getAccessToken(), https://googleapis.dev/java/google-oauth-client/1.31.5/com/google/api/client/auth/oauth2/Credential.html?is-external=true#getAccessToken--
Shell: gcloud auth print-access-token
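
For scripts, the shell option can be wrapped from Python as below. This is a sketch: the run parameter is injectable only so the function can be exercised without gcloud installed. Tokens from gcloud are short-lived, so a long-running service like clinvar-raw is better served by a credential object that refreshes itself, as in the Java Credential class.

```python
import subprocess
from typing import Callable


def gcloud_access_token(run: Callable = subprocess.run) -> str:
    """Return the output of `gcloud auth print-access-token`, stripped."""
    result = run(
        ["gcloud", "auth", "print-access-token"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if gcloud fails
    )
    return result.stdout.strip()
```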