broadinstitute / gdctools

Python and UNIX CLI utilities to simplify interaction with the NIH/NCI Genomics Data Commons

gdac_loadfile allowed FFPE samples into a -TP sampleset #65

Closed gsaksena closed 6 years ago

gsaksena commented 6 years ago

In the BLCA cohort, there are three individuals who have FFPE samples:

Methylation: fresh-frozen: TCGA-BL-A0C8-01A-11D-A276-05 TCGA-BL-A13I-01A-11D-A276-05 TCGA-BL-A13J-01A-11D-A276-05
FFPE: TCGA-BL-A0C8-01B-04D-A276-05 TCGA-BL-A13I-01B-04D-A276-05 TCGA-BL-A13J-01B-04D-A276-05

Expression: fresh-frozen: TCGA.BL.A0C8.01A.11R.A277.07 TCGA.BL.A13I.01A.11R.A277.07 TCGA.BL.A13J.01A.11R.A277.07
FFPE: TCGA.BL.A0C8.01B.04R.A277.07 TCGA.BL.A13I.01B.04R.A277.07 TCGA.BL.A13J.01B.04R.A277.07

The loadfile for the TCGA-BLCA-TP sampleset should not contain the FFPE samples, but it does. This causes problems downstream: the algorithms neither check for nor correctly handle more than one sample from the same participant. They trust that gdc_loadfile has de-duplicated the participants upstream.
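For anyone who wants to guard against this downstream, here is a minimal sketch of a duplicate check. It assumes standard TCGA aliquot barcodes, where the first three fields identify the participant and the first two characters of the fourth field are the sample-type code (01 for primary tumor). The function names are hypothetical and not part of gdctools.

```python
import re
from collections import defaultdict


def participant_and_type(barcode):
    """Return (participant, sample-type code) for a TCGA aliquot barcode.

    Accepts both '-' and '.' as field separators, since the expression
    IDs in this issue use dots.
    """
    fields = re.split(r"[-.]", barcode)
    participant = "-".join(fields[:3])  # e.g. TCGA-BL-A0C8
    sample_type = fields[3][:2]         # e.g. "01" from "01A" or "01B"
    return participant, sample_type


def find_duplicates(barcodes):
    """Group barcodes by (participant, sample type); keep groups of size > 1."""
    groups = defaultdict(list)
    for barcode in barcodes:
        groups[participant_and_type(barcode)].append(barcode)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    methylation = [
        "TCGA-BL-A0C8-01A-11D-A276-05",
        "TCGA-BL-A0C8-01B-04D-A276-05",
        "TCGA-BL-A13I-01A-11D-A276-05",
        "TCGA-BL-A13I-01B-04D-A276-05",
    ]
    for key, ids in find_duplicates(methylation).items():
        print(key, ids)
```

Run on the methylation barcodes above, this flags each participant whose fresh-frozen (01A) and FFPE (01B) aliquots both land under the same primary-tumor sample type.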

noblem commented 6 years ago

David, can you look into this tomorrow (Wed, 10/11)?

dheiman commented 6 years ago

I think I found what happened: adding the TCGA- prefix to all the sample IDs broke FFPE detection in loadfile generation. Should be an easy fix to test.

dheiman commented 6 years ago

Closed via #66.

noblem commented 6 years ago

David, I appreciate that the work is done, but I'd like to keep this open until I regenerate the production(ish) loadfiles on the network file system.

noblem commented 6 years ago

This appears to have been corrected, in both the legacy Firehose- and FireCloud/Google-style loadfiles, to wit:


pwd
/xchip/gdac_data/gdc/loadfiles/TCGA

/gdc/loadfiles/TCGA(vgdac1) grep FFPE 2017_09_19/TCGA-BLCA.Sample_Set.loadfile.txt 

TCGA-BLCA-TP    TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA       TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-TP    TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA       TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-TP    TCGA-BLCAFFPE-BL-A13J-TP
TCGA-BLCA       TCGA-BLCAFFPE-BL-A13J-TP

/gdc/loadfiles/TCGA(vgdac1) grep FFPE 2017_10_17/TCGA-BLCA.Sample_Set.loadfile.txt

TCGA-BLCA-FFPE  TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-FFPE  TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-FFPE  TCGA-BLCAFFPE-BL-A13J-TP

/gdc/loadfiles/TCGA(vgdac1) cd ../google/TCGA/

/loadfiles/google/TCGA(vgdac1) grep FFPE 2017_09_19/TCGA-BLCA.Sample_Set.loadfile.txt 
TCGA-BLCA-TP    TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA       TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-TP    TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA       TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-TP    TCGA-BLCAFFPE-BL-A13J-TP
TCGA-BLCA       TCGA-BLCAFFPE-BL-A13J-TP

/loadfiles/google/TCGA(vgdac1) grep FFPE 2017_10_17/TCGA-BLCA.Sample_Set.loadfile.txt 

TCGA-BLCA-FFPE  TCGA-BLCAFFPE-BL-A0C8-TP
TCGA-BLCA-FFPE  TCGA-BLCAFFPE-BL-A13I-TP
TCGA-BLCA-FFPE  TCGA-BLCAFFPE-BL-A13J-TP
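
The grep comparison above could also be scripted as a regression check. This is only a sketch: it assumes each Sample_Set loadfile row is two whitespace-separated columns (sample set, then sample ID), as in the grep output, and that FFPE sample IDs contain the string FFPE. The function name is hypothetical and not part of gdctools.

```python
def ffpe_leaks(lines):
    """Return (sample_set, sample_id) rows where an FFPE sample sits in a
    sample set other than the dedicated -FFPE set."""
    leaks = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or unexpected rows
        sample_set, sample_id = parts
        if "FFPE" in sample_id and not sample_set.endswith("-FFPE"):
            leaks.append((sample_set, sample_id))
    return leaks


if __name__ == "__main__":
    import sys

    # Usage (path is whatever loadfile you want to check):
    #   python check_ffpe.py TCGA-BLCA.Sample_Set.loadfile.txt
    with open(sys.argv[1]) as handle:
        found = ffpe_leaks(handle)
    for sample_set, sample_id in found:
        print(sample_set, sample_id, sep="\t")
    sys.exit(1 if found else 0)
```

Against the rows shown above, the 2017_09_19 loadfile would report six leaks (three each in TCGA-BLCA-TP and TCGA-BLCA), and the 2017_10_17 loadfile none.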