fpbarthel / GLASS

GLASS consortium
MIT License

JSON revisions before preprocessing freeze #43

Closed: Kcjohnson closed this issue 6 years ago

Kcjohnson commented 6 years ago

As we have begun preprocessing and alignment, a need has arisen to revise some of the JSON fields. We intend to re-run all samples through the preprocessing and alignment pipelines while making the following changes to the JSON files:

Update 9/10/18: The JSON update is finished for TCGA and will be used as a model for the other cohorts.

fpbarthel commented 6 years ago

Perhaps the first five characters of the flow-cell ID are truly unique? See here

This post from the Broad uses the first five.

UPDATE: Confirmed, the four characters following the first character are unique. The first character and the last four characters define the flow-cell type (e.g. NovaSeq S2). See here
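A minimal Python sketch of the split described above, assuming a 9-character Illumina flow-cell ID so that "the four characters following the first" and "the first character and the last four characters" together cover the whole ID. The function name `split_flowcell_id` and the example ID are hypothetical.

```python
def split_flowcell_id(fcid: str) -> tuple[str, str]:
    """Split a flow-cell ID into (unique part, flow-cell type part).

    Assumes a 9-character ID: characters 2-5 are treated as the unique
    part, and the first character plus the last four as the type part.
    """
    if len(fcid) != 9:
        raise ValueError(f"expected a 9-character flow-cell ID, got {fcid!r}")
    unique_part = fcid[1:5]           # the four characters after the first
    type_part = fcid[0] + fcid[-4:]   # first character + last four characters
    return unique_part, type_part


# Hypothetical ID, for illustration only.
print(split_flowcell_id("HABCDDSXX"))  # ('ABCD', 'HDSXX')
```

Under this assumption the first five characters are the first character plus the unique part, so they would also distinguish flow cells from one another.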

Kcjohnson commented 6 years ago

Following up on a single point from the list above: sample TCGA-06-0152 is greylisted in the PCAWG data set. From the PCAWG quality exclusion list:

Greylisted donors will also have consensus variant calls, however in principle these donors are not recommended to be used by downstream analyses. Some specific type of analyses may choose to use some of the greylisted donors given that the known QC issue is unlikely affecting the analysis result and the researcher provides clear justification.

Ten other samples overlap between PCAWG (2016 data freeze) and GLASS-WG: 9 GBMs and 1 LGG. All of them are whitelisted (good quality).
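One way to tabulate such an overlap is sketched below: GLASS sample IDs are checked against a mapping of PCAWG sample IDs to QC status, and the overlapping samples are grouped by status. This is a sketch under stated assumptions, not the procedure used in the thread. The function name `pcawg_overlap`, the status labels, and every sample ID other than TCGA-06-0152 are hypothetical, and the real PCAWG exclusion list may use a different format.

```python
from collections.abc import Iterable


def pcawg_overlap(glass_samples: Iterable[str],
                  pcawg_status: dict[str, str]) -> dict[str, list[str]]:
    """Group GLASS samples that also appear in PCAWG by their PCAWG QC status."""
    groups: dict[str, list[str]] = {}
    for sample in sorted(set(glass_samples)):
        status = pcawg_status.get(sample)
        if status is not None:  # sample is present in both data sets
            groups.setdefault(status, []).append(sample)
    return groups


# Hypothetical inputs: only TCGA-06-0152 and its greylist status come from the thread.
glass = ["TCGA-06-0152", "SAMPLE-A", "SAMPLE-B", "SAMPLE-C"]
pcawg = {
    "TCGA-06-0152": "greylist",
    "SAMPLE-A": "whitelist",
    "SAMPLE-B": "whitelist",
    "SAMPLE-X": "whitelist",
}
print(pcawg_overlap(glass, pcawg))
# {'whitelist': ['SAMPLE-A', 'SAMPLE-B'], 'greylist': ['TCGA-06-0152']}
```

Samples present in only one data set (SAMPLE-C, SAMPLE-X) are dropped, so the returned groups contain only the overlap.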