Open noblem opened 6 years ago
For completeness, the other two UCEC samples that exhibited this issue are listed below, alongside C3L-00084:
cut -f1,7 loadfiles/google/CPTAC3/latest/CPTAC3.Sample.loadfile.txt | grep NULL
CPTAC3-UCEC-C3L-00084-TP gs://broad-institute-gdac/GDAC_FC_NULL
CPTAC3-UCEC-C3L-00930-TP gs://broad-institute-gdac/GDAC_FC_NULL
CPTAC3-UCEC-C3L-01284-TP gs://broad-institute-gdac/GDAC_FC_NULL
Today, while exploring UCEC CPTAC3 genomic data, I noticed that the GDCtools-generated sample reports listed fewer clinical samples than the number of samples with at least one annotation in the loadfiles (177 clinical samples versus 180 lines in the loadfile). This led me to discover that some samples have molecular data but no clinical data. For example, the patient case C3L-00084 in UCEC was "stopped" (in CPTAC terminology), which AFAIK is equivalent to being redacted, but its molecular data were not removed from the DCC.
GDCtools can easily flag this situation and raise awareness.
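As a rough illustration of what such a check might look like, here is a minimal Python sketch that mirrors the cut/grep one-liner. It is not existing GDCtools code: the function name find_samples_missing_clinical is hypothetical, and it assumes the loadfile is tab-separated with the sample ID in column 1 and the clinical file path in column 7, with a path containing NULL marking missing clinical data (as in the GDAC_FC_NULL entries seen here).

```python
import sys


def find_samples_missing_clinical(lines, id_col=0, clinical_col=6, sentinel="NULL"):
    """Return sample IDs whose clinical column contains the sentinel.

    Hypothetical helper, not part of GDCtools. Column positions and the
    sentinel are assumptions taken from the cut/grep command in this thread.
    """
    missing = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        # Skip blank or short lines that lack the clinical column.
        if len(fields) <= max(id_col, clinical_col):
            continue
        if sentinel in fields[clinical_col]:
            missing.append(fields[id_col])
    return missing


if __name__ == "__main__":
    # Usage: python check_clinical.py path/to/Sample.loadfile.txt
    with open(sys.argv[1]) as handle:
        flagged = find_samples_missing_clinical(handle)
    for sample_id in flagged:
        print("WARNING: molecular data but no clinical data:", sample_id)
```

Unlike the grep, this only inspects the clinical column, so a NULL appearing in a sample ID would not trigger a false positive. Whether the warning belongs in the loadfile step or in the sample report is a separate design question.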