Matt requested some changes to make the experience smoother for AMP PD researchers.
Don't export samples and sample sets
Samples are confusing: why are we exporting all samples when I selected a cohort? Sample sets are confusing: why are cohorts not materialized (a cohort is stored as a SQL query) but sample sets are?
We materialized sample sets so that users can run workflows in Saturn. Initially, there will not be an emphasis on running workflows for AMP PD. Later on, if AMP PD researchers want to run workflows, we can revisit. We could add a checkbox to the cohort name dialog like "Export to a sample set for running workflows?"
Continue to export BigQuery tables. Shorten BigQuery_table_id
The long-term plan for cohorts is: A cohort represents a table with specific rows (WHERE clauses) and columns (FROM clauses). The SQL query returns these rows and columns. A cohort is associated with a set of BigQuery tables in the FROM clauses. For example, a cohort might contain only columns from a Demographics table.
For now, a cohort is simply a set of participant ids -- WHERE clauses with no FROM clauses. The SQL query returns only a list of participants, with no other columns. In a notebook, users will join the "set of participant ids" with whatever tables and columns they're interested in. The BigQuery table entities list the tables available to join against.
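To make the notebook step concrete, here is a minimal sketch of joining a cohort's participant ids against one of the exported tables. It only builds the SQL string; running it is left to whatever BigQuery client the notebook uses. The helper name join_cohort_with_table, the participant_id column, and the project, dataset, and table names are all assumptions made for illustration, not names from the actual export.

```python
from typing import Sequence


def join_cohort_with_table(cohort_sql: str, table_name: str,
                           columns: Sequence[str],
                           id_column: str = "participant_id") -> str:
    """Build a query returning `columns` of `table_name` for cohort members only.

    `cohort_sql` is the cohort's saved query, assumed to return a single
    column of participant ids named `id_column` (hypothetical name).
    """
    # Drop a trailing semicolon so the cohort query can be used as a subquery.
    subquery = cohort_sql.strip().rstrip(";")
    selected = ", ".join(f"t.{column}" for column in columns)
    return (
        f"SELECT {selected}\n"
        f"FROM `{table_name}` AS t\n"
        f"JOIN ({subquery}) AS cohort\n"
        f"ON t.{id_column} = cohort.{id_column}"
    )


# Hypothetical cohort query and table, for illustration only.
cohort_sql = ("SELECT participant_id "
              "FROM `my-project.my_dataset.demographics` WHERE age > 60;")
query = join_cohort_with_table(
    cohort_sql, "my-project.my_dataset.demographics", ["participant_id", "age"])
print(query)
```

The table_name passed in would come from the table_name attribute of a BigQuery table entity, since that attribute holds the qualified table name.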
Change BigQuery_table_id from verily-public-data_human_genome_variants_1000_genomes_participant_info to 1000_genomes_participant_info.
Background: Currently we export:
The two columns are redundant. Ideally there would just be one column:
But that wasn't possible because entity names can't have ".". (I believe with the new entity service, this will be possible.) Entity attributes can have ".", so the table_name attribute has the correct qualified BigQuery table name.
Matt suggested this, which looks cleaner:
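As a rough sketch of the ID change, the snippet below derives both the current long ID and the proposed short ID from a qualified table name. It assumes the qualified name has the dotted form project.dataset.table and that the current ID is produced by replacing each "." with "_"; both are inferred from the example ID in this section rather than taken from the export code, and the function names are hypothetical.

```python
def legacy_entity_id(qualified_table_name: str) -> str:
    """Current scheme (assumed): replace "." because entity names can't contain it."""
    return qualified_table_name.replace(".", "_")


def short_entity_id(qualified_table_name: str) -> str:
    """Proposed scheme: keep only the table part, after the last "."."""
    return qualified_table_name.rsplit(".", 1)[-1]


def bigquery_table_entity(qualified_table_name: str) -> dict:
    """Entity with a short ID; table_name keeps the qualified name for queries."""
    return {
        "BigQuery_table_id": short_entity_id(qualified_table_name),
        "table_name": qualified_table_name,
    }


# Assumed dotted form of the table behind the example ID in this section.
qualified = "verily-public-data.human_genome_variants.1000_genomes_participant_info"
entity = bigquery_table_entity(qualified)
print(legacy_entity_id(qualified))
print(entity["BigQuery_table_id"])
```

One thing to watch with the short form: two tables with the same name in different BigQuery datasets would map to the same ID.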
Add dataset_name column to BigQuery table and cohort entities
In the future, one could export multiple datasets into the same workspace. It would be nice to have a dataset_name column so that, for example, one would know which BigQuery tables are the AMP PD tables to join with an AMP PD cohort. Let's use the dataset name from dataset.json.
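A minimal sketch of how the dataset_name column could be filled in and then used to find the tables that belong with a cohort. The "name" key in dataset.json, the entity dictionaries, and the helper names are assumptions for illustration; the real dataset.json layout and entity format may differ.

```python
import json


def add_dataset_name(entities, dataset_config):
    """Return copies of `entities` with a dataset_name attribute added."""
    dataset_name = dataset_config["name"]  # assumed key in dataset.json
    return [{**entity, "dataset_name": dataset_name} for entity in entities]


def tables_for_dataset(table_entities, dataset_name):
    """Pick the BigQuery table entities exported from the given dataset."""
    return [t for t in table_entities if t.get("dataset_name") == dataset_name]


# Stand-in for the parsed contents of dataset.json (hypothetical).
dataset_config = json.loads('{"name": "AMP PD"}')
tables = [{"BigQuery_table_id": "demographics"}]
tagged_tables = add_dataset_name(tables, dataset_config)

# A table exported from some other dataset into the same workspace.
other_table = {"BigQuery_table_id": "participant_info",
               "dataset_name": "Other dataset"}
matches = tables_for_dataset(tagged_tables + [other_table], "AMP PD")
print(matches)
```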
Always show cohort name dialog
Current: We only show the cohort name dialog if a cohort was selected.
New: If no cohort is selected, show the cohort name dialog with the name "all participants". The user can edit this name if they want. This cohort's SQL query returns all participants.
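The new behavior could look roughly like the sketch below: the dialog is always pre-filled with a name, and a cohort with no filters produces a query over all participants. The function names, the participant_id column, and the table name are hypothetical and chosen only for this example.

```python
DEFAULT_COHORT_NAME = "all participants"


def initial_cohort_name(selected_cohort_name=None):
    """Name to pre-fill in the cohort name dialog; the user may still edit it."""
    return selected_cohort_name or DEFAULT_COHORT_NAME


def cohort_query(participant_table, where_clauses=(), id_column="participant_id"):
    """Cohort SQL: with no WHERE clauses it returns every participant."""
    query = f"SELECT DISTINCT {id_column} FROM `{participant_table}`"
    if where_clauses:
        query += " WHERE " + " AND ".join(f"({clause})" for clause in where_clauses)
    return query


# No cohort selected: default name, unfiltered query.
name = initial_cohort_name()
all_sql = cohort_query("my-project.my_dataset.participants")

# A cohort was selected: keep its name and apply its filters.
filtered_sql = cohort_query("my-project.my_dataset.participants", ["age > 60"])
print(name)
print(all_sql)
print(filtered_sql)
```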