gsaksena closed 6 years ago
David, can you look into this tomorrow (Wed, 10/11)?
I think I found what happened: adding TCGA- to all the sample IDs broke FFPE detection in loadfile generation. It should be an easy fix to test.
Closed via #66.
David, I appreciate that the work is done, but I'd like to keep this open until I regenerate the production(ish) loadfiles on the network file system.
This appears to have been corrected, in both the legacy Firehose- and FireCloud/Google-style loadfiles, to wit:
pwd
/xchip/gdac_data/gdc/loadfiles/TCGA
/gdc/loadfiles/TCGA(vgdac1) grep FFPE 2017_09_19/TCGA-BLCA.Sample_Set.loadfile.txt
TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A13J-TP
TCGA-BLCA TCGA-BLCAFFPE-BL-A13J-TP
/gdc/loadfiles/TCGA(vgdac1) grep FFPE 2017_10_17/TCGA-BLCA.Sample_Set.loadfile.txt
TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A13J-TP
/gdc/loadfiles/TCGA(vgdac1) cd ../google/TCGA/
/loadfiles/google/TCGA(vgdac1) grep FFPE 2017_09_19/TCGA-BLCA.Sample_Set.loadfile.txt
TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A13J-TP
TCGA-BLCA TCGA-BLCAFFPE-BL-A13J-TP
/loadfiles/google/TCGA(vgdac1) grep FFPE 2017_10_17/TCGA-BLCA.Sample_Set.loadfile.txt
TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A13J-TP
In the BLCA cohort, there are three individuals that have FFPE samples:
Methylation:
fresh-frozen: TCGA-BL-A0C8-01A-11D-A276-05 TCGA-BL-A13I-01A-11D-A276-05 TCGA-BL-A13J-01A-11D-A276-05
FFPE: TCGA-BL-A0C8-01B-04D-A276-05 TCGA-BL-A13I-01B-04D-A276-05 TCGA-BL-A13J-01B-04D-A276-05
Expression:
fresh-frozen: TCGA.BL.A0C8.01A.11R.A277.07 TCGA.BL.A13I.01A.11R.A277.07 TCGA.BL.A13J.01A.11R.A277.07
FFPE: TCGA.BL.A0C8.01B.04R.A277.07 TCGA.BL.A13I.01B.04R.A277.07 TCGA.BL.A13J.01B.04R.A277.07
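To make the duplication concrete, here is a minimal sketch (not part of gdc_loadfile) that groups the barcodes listed above by participant. It assumes the standard TCGA barcode layout, where the first three fields (project, tissue source site, participant) identify the individual, and it treats the dot-separated expression barcodes as equivalent to the hyphenated form. The function name `participants_with_multiple_aliquots` is hypothetical.

```python
from collections import defaultdict

def participants_with_multiple_aliquots(barcodes):
    """Group aliquot barcodes by participant; keep participants seen more than once."""
    groups = defaultdict(list)
    for barcode in barcodes:
        # Expression barcodes above use dots; normalize to hyphens.
        normalized = barcode.replace(".", "-")
        # Assumption: the first three fields identify the participant.
        participant = "-".join(normalized.split("-")[:3])
        groups[participant].append(normalized)
    return {p: b for p, b in groups.items() if len(b) > 1}

# Methylation barcodes copied from the list above.
methylation = [
    "TCGA-BL-A0C8-01A-11D-A276-05",
    "TCGA-BL-A13I-01A-11D-A276-05",
    "TCGA-BL-A13J-01A-11D-A276-05",
    "TCGA-BL-A0C8-01B-04D-A276-05",
    "TCGA-BL-A13I-01B-04D-A276-05",
    "TCGA-BL-A13J-01B-04D-A276-05",
]

duplicated = participants_with_multiple_aliquots(methylation)
for participant, aliquots in sorted(duplicated.items()):
    print(participant, aliquots)
```

Each of the three participants comes back with two aliquots (one fresh-frozen, one FFPE), which is the situation the downstream algorithms are not prepared for.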
The loadfile for the TCGA-BLCA-TP sample set should not contain the FFPE samples, but it does. This causes problems downstream, as the algorithms do not check for or correctly handle more than one sample from the same participant; they trust that gdc_loadfile has de-duplicated the participants upstream.
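A quick regression check for this could look like the sketch below. It is only an illustration, not the gdc_loadfile implementation: it assumes each Sample_Set loadfile row has two whitespace-separated columns (sample set ID, then sample ID), that FFPE samples carry "FFPE" in their sample ID, and that the dedicated set name ends in "-FFPE", as in the grep output in this thread. The function name `misplaced_ffpe_rows` is hypothetical.

```python
def misplaced_ffpe_rows(lines):
    """Return (set_id, sample_id) pairs where an FFPE-labelled sample is in a non-FFPE set."""
    misplaced = []
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or unexpected rows
        set_id, sample_id = fields
        if "FFPE" in sample_id and not set_id.endswith("-FFPE"):
            misplaced.append((set_id, sample_id))
    return misplaced

# Rows copied from the 2017_09_19 and 2017_10_17 grep output in this thread.
before_fix = [
    "TCGA-BLCA-TP TCGA-BLCAFFPE-BL-A0C8-TP",
    "TCGA-BLCA TCGA-BLCAFFPE-BL-A0C8-TP",
]
after_fix = [
    "TCGA-BLCA-FFPE TCGA-BLCAFFPE-BL-A0C8-TP",
]

print(misplaced_ffpe_rows(before_fix))  # both rows are flagged
print(misplaced_ffpe_rows(after_fix))   # nothing is flagged
```

To run it against a real loadfile, pass the open file object instead of the lists, since iterating over a file yields its lines.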