broadinstitute / viral-pipelines

viral-ngs: complete pipelines

percolate biosample attributes to sample table in Terra #532

Closed dpark01 closed 2 months ago

dpark01 commented 6 months ago

It's becoming quite limiting that we can't easily access the biosample metadata from the sample or assembly tables in Terra under the current data model that our WDLs create. We should pursue one of the following solutions (or something like it):

  1. demux_deplete populates the sample table with columns from biosample_attributes_tsv
  2. demux_deplete populates the sample table with a JSON object containing all the biosample attributes from only the row of the tsv that corresponds to this sample
  3. demux_deplete emits a tsv output file that is a slightly transformed version of the biosample_attributes_tsv. The main difference is one additional column holding the sample_id used in the sample table (i.e., the "sanitized" sample name, with dashes and underscores, and with any slashes or spaces removed from the original sample name). Currently the original biosample_attributes_tsv only has the unsanitized, external-facing sample id. The user can then simply use terra_tsv_to_table to update the sample table themselves. This would require updating terra_tsv_to_table to accept an arbitrary column as the index column by rewriting the column header with the requisite entity: prefix on the fly.
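
To make options 2 and 3 a bit more concrete, here is a rough Python sketch of the tsv transformation. It is not taken from viral-pipelines: the function names, the `sample_name` column, and the sanitization rule (replace anything other than letters, digits, `_`, `.`, `-` with an underscore) are all assumptions for illustration, and the real sanitization used by demux_deplete may differ. The `entity:sample_id` header follows what I understand to be the Terra load-file convention for the index column.

```python
import csv
import io
import json
import re


def sanitize_sample_name(name):
    # Hypothetical rule, NOT the actual viral-ngs sanitization:
    # collapse each run of disallowed characters into one underscore.
    return re.sub(r"[^A-Za-z0-9_.-]+", "_", name.strip())


def add_sample_id_column(tsv_text, name_col="sample_name", entity_type="sample"):
    """Option 3: prepend an index column holding the sanitized sample id.

    `name_col` is an assumed column name for the original sample name.
    """
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    id_col = f"entity:{entity_type}_id"
    out = io.StringIO()
    writer = csv.DictWriter(
        out,
        fieldnames=[id_col] + list(reader.fieldnames),
        delimiter="\t",
        lineterminator="\n",
    )
    writer.writeheader()
    for row in reader:
        writer.writerow({id_col: sanitize_sample_name(row[name_col]), **row})
    return out.getvalue()


def attributes_json_for(tsv_text, sample_id, name_col="sample_name"):
    """Option 2: JSON object of all attributes for the one matching row."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        if sanitize_sample_name(row[name_col]) == sample_id:
            return json.dumps(row, sort_keys=True)
    return None  # no row matches this sample_id


example_tsv = "sample_name\tcollection_date\nHu/USA 01\t2020-03-01\n"
transformed = add_sample_id_column(example_tsv)
sample_json = attributes_json_for(example_tsv, "Hu_USA_01")
```

With something like this, the option 3 output could be loaded into the sample table as-is, while option 2 would store `sample_json` in a single column of the matching row.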