Open larrybabb opened 7 months ago
Some hints....
>gcloud auth login --enable-gdrive-access
>gcloud config set project clingen-stage
if you want to use the autodetect
feature with a google sheet to generate an external table definition you can try with the following (see here)
>bq mkdef \
--autodetect \
--source_format=GOOGLE_SHEETS \
"https://docs.google.com/spreadsheets/d/1bADskBcobHTmmXungY09beWPDEa1nqM-PP__86yGVj0/edit\#gid\=2067576827" > report_submitters.def
but if you want to point to a range other than the first sheet you will need to open the file and add the range
>more report_submitters.def
{
"autodetect": true,
"sourceFormat": "GOOGLE_SHEETS",
"sourceUris": [
"https://docs.google.com/spreadsheets/d/1bADskBcobHTmmXungY09beWPDEa1nqM-PP__86yGVj0/edit\\#gid\\=2067576827"
],
"range": "'Report Submitter List'!A:G"
}
A word of caution...if using the autodetect: true
please reference the data type conversion and naming rules (headings or no headings) here.
bq mk \
--external_table_definition=report_submitters.def \
clinvar_curator.report_submitter
this will create a table called report_submitter
in the clingen-stage.clinvar_curator
dataset based on the google sheet link and range from the report_submitters.def
file definition.
I think there are 2 reasonable options
A precursor to getting all the artifacts related to the clinvar google sheets is to first define the bq client scripting steps needed to maintain and create external tables from google sheets.
For example in the ClinVar Report BQ Support Tables sheet, almost every tab is used to back and external table in the clingen-stage.clinvar_curator dataset. These tables are essential to the weekly processing of clinvar datatsets.
Once we have a good process for how to script, change, commit, run, then we should get all the external tables for this sheet into this github repo and versioned.
NOTE: we will also need the means to backup and restore the sheet itself so it does not get corrupted over time. This should be a separate but related ticket. They can be done independently, but the entire process makes the most sense when they are combined to work interchangeably (i.e. modify a column in the sheet or add a new tab/table, and auto run the google-sheet external table replacement/creation script).