Perhaps the first five characters of the flow cell ID are truly unique? See here
This post from the Broad uses the first five characters
UPDATE: Confirmed, ~the starting letter and~ the four characters following the first are unique. The first character and the last four characters define the flow cell type (e.g. NovaSeq S2). See here
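To make the split concrete, here is a minimal sketch of how a flow cell ID could be broken into its parts. It assumes a 9-character ID (the length is an assumption, not something stated in this thread), and the function name `split_flowcell_id` and the example ID `HABCDDMXX` are made up for illustration.

```python
def split_flowcell_id(flowcell_id: str) -> dict:
    """Split a flow cell ID into its unique part and its type-defining part.

    Assumes a 9-character ID: one leading character, four unique
    characters, then four characters describing the flow cell type.
    """
    fc = flowcell_id.strip().upper()
    if len(fc) != 9:
        # Other ID layouts exist; this sketch does not try to handle them.
        raise ValueError(f"expected a 9-character flow cell ID, got {fc!r}")
    return {
        # Four characters following the first: unique per flow cell
        "unique": fc[1:5],
        # First character plus last four: define the flow cell type
        "type_key": fc[0] + fc[-4:],
        # First five characters, as used in the post referenced above
        "first_five": fc[:5],
    }


if __name__ == "__main__":
    # Hypothetical ID, for illustration only
    print(split_flowcell_id("HABCDDMXX"))
```

Using the four characters after the first as the key, rather than the first five, only matters if two flow cells share those four characters but differ in their first character.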
Following up on a single point from the list above: sample TCGA-06-0152 is greylisted in the PCAWG data set. From the PCAWG quality exclusion list:
Greylisted donors will also have consensus variant calls, however in principle these donors are not recommended to be used by downstream analyses. Some specific type of analyses may choose to use some of the greylisted donors given that the known QC issue is unlikely affecting the analysis result and the researcher provides clear justification.
There are 10 remaining samples that overlap between PCAWG (2016 data freeze) and GLASS-WG. All of these remaining samples are whitelisted (good quality): 9 GBMs and 1 LGG.
As we have begun to preprocess and align samples, the need to revise some of the JSON fields has arisen. We intend to re-run all samples through the pre-processing and alignment pipelines while making the following changes to the JSON files:
Update 9/10/18: The JSON update is finished for TCGA and will be used as a model for other cohorts