noblem closed 6 years ago
David, I gave this to you because I vaguely recall you having spelunked in this part of the code recently. But if you're not comfortable doing this, then I will.
I heavily refactored this part of the code (consolidated all counting and count file generation to the dicer, and fixed bugs in aggregate counting). I'm about to test my update.
Presently the sample counts files such as
/xchip/gdac_data/gdc/dice/TCGA/metadata/sample_counts.2017_08_09.tsv
do not indicate which project they originate from. So it would be helpful to have the cohort names (first column) in each of these files reflect the project, too (e.g. COAD-TP becomes TCGA-COAD-TP). This is how cohorts are loaded into our workspaces, operated upon by our tasks, and reported in our dashboards, and it also globally disambiguates each cohort from all others.
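To make the request concrete, here is a rough sketch of the renaming, not the dicer's actual code. It assumes the counts file is tab-separated with a single header row and the cohort name in the first column. The function name `prefix_cohorts` and the column headers in the usage note are hypothetical, and the project name is passed in explicitly rather than inferred from the path.

```python
import csv
import io


def prefix_cohorts(tsv_text, project):
    """Return tsv_text with 'project-' prepended to each cohort name.

    Assumptions (not confirmed against the dicer): tab-separated input,
    one header row, cohort name in the first column.
    """
    prefix = project + "-"
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t")
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for i, row in enumerate(reader):
        # Skip the header row and blank lines; also skip cohorts that
        # already carry the prefix, so rerunning is harmless.
        if i > 0 and row and not row[0].startswith(prefix):
            row[0] = prefix + row[0]
        writer.writerow(row)
    return out.getvalue()
```

For example, with made-up column headers, `prefix_cohorts("Cohort\tBCR\nCOAD-TP\t10\n", "TCGA")` turns the `COAD-TP` row into `TCGA-COAD-TP` and leaves the header alone. Checking for an existing prefix keeps the step idempotent, which matters if count files get regenerated or reprocessed.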