fpbarthel / GLASS

GLASS consortium
MIT License

Purge faulty samples from the dataset #122

Closed: fpbarthel closed this issue 5 years ago

fpbarthel commented 5 years ago

Several patients are consistently failing M2 calling. Of note, as you can see in the table below, these patients are also blocklisted to varying degrees.

------------------------------------------------------------------------------------------------------------
M2-error log                                 error       blocklist status
------------------------------------------------------------------------------------------------------------
logs/mutect2/filtermutect/GLSS-MD-LP01.log   log10p      coverage excl in WGS *SS*
logs/mutect2/filtermutect/GLSS-MD-LP02.log   log10p      coverage excl in WGS *SS*
logs/mutect2/filtermutect/GLSS-MD-LP03.log   log10p      fingerprint excl, coverage excl in WGS + WXS
logs/mutect2/filtermutect/GLSS-MD-LP04.log   log10p      coverage excl in WGS *SS*
logs/mutect2/filtermutect/GLSS-MD-LP07.log   log10p      coverage excl in both WGS + WXS
logs/mutect2/filtermutect/GLSS-MD-LP08.log   log10p      coverage excl in WGS, no R in WXS
logs/mutect2/filtermutect/GLSS-MD-LP10.log   errorRate   coverage excl in WGS *SS*
logs/mutect2/filtermutect/GLSS-MD-0019.log   errorRate   fingerprint excl
logs/mutect2/filtermutect/GLSS-MD-0084.log   errorRate   fingerprint excl
logs/mutect2/filtermutect/GLSS-MD-0085.log   errorRate   fingerprint excl
logs/mutect2/filtermutect/GLSS-MD-0137.log   log10p      coverage excl in both WGS + WXS
------------------------------------------------------------------------------------------------------------
coverage excl = samples excluded due to low coverage
fingerprint excl = patient excluded entirely due to mismatches

For many of the above patients, the quality is so poor that the data are not usable. However, 4 of the 11 patients may be salvageable by dropping only the faulty portion of their data.
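As a rough cross-check of the 4/11 split, here is a minimal Python sketch that tallies the failing patients by the action taken. The patient IDs and error types are copied from the table above, and the two groups are copied from the UPDATE statements later in this thread. The `triage` helper and the action labels are hypothetical names used only for illustration.

```python
# Error type per failing patient, copied from the M2-error table above.
FAILED_M2 = {
    "GLSS-MD-LP01": "log10p",
    "GLSS-MD-LP02": "log10p",
    "GLSS-MD-LP03": "log10p",
    "GLSS-MD-LP04": "log10p",
    "GLSS-MD-LP07": "log10p",
    "GLSS-MD-LP08": "log10p",
    "GLSS-MD-LP10": "errorRate",
    "GLSS-MD-0019": "errorRate",
    "GLSS-MD-0084": "errorRate",
    "GLSS-MD-0085": "errorRate",
    "GLSS-MD-0137": "log10p",
}

# Groups copied from the case_barcode lists in the two UPDATE statements.
FULLY_DEPRECATED = {
    "GLSS-MD-LP03", "GLSS-MD-LP07", "GLSS-MD-LP08", "GLSS-MD-0019",
    "GLSS-MD-0084", "GLSS-MD-0085", "GLSS-MD-0137",
}
WGS_ONLY_DEPRECATED = {
    "GLSS-MD-LP01", "GLSS-MD-LP02", "GLSS-MD-LP04", "GLSS-MD-LP10",
}


def triage(patient):
    """Return the action taken for a patient (hypothetical helper)."""
    if patient in FULLY_DEPRECATED:
        return "deprecate all aliquots"
    if patient in WGS_ONLY_DEPRECATED:
        return "deprecate WGS BAMs only"
    return "no action"


# Map every failing patient to its action and print a short report.
summary = {patient: triage(patient) for patient in sorted(FAILED_M2)}
for patient, action in summary.items():
    print(f"{patient}\t{FAILED_M2[patient]}\t{action}")
```

Every patient in the table falls into exactly one of the two groups, so nothing with a failing log is left without an action.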

@roelverhaak please sign off on the to-do list below

To-Do:

We chose to leave the data in place for now and instead mark the affected files as deprecated so they will not be used in any analyses.

fpbarthel commented 5 years ago

Just to confirm in case you were wondering @roelverhaak, none of the 7 patients to be dropped were used in the paper, whereas the 4 to be kept were.

fpbarthel commented 5 years ago

Also LP03 failed CNV.

CNV-error log                                                             error            blocklist status
logs/cnv/denoisereadcounts/GLSS-MD-LP03-R1-01D-WGS-4NCFXD.allosomal.log   non-neg median   fingerprint excl
logs/cnv/denoisereadcounts/GLSS-MD-LP03-R1-01D-WXS-UWJG6Q.allosomal.log   non-neg median   fingerprint excl
logs/cnv/denoisereadcounts/GLSS-MD-LP03-R1-01D-WXS-UWJG6Q.autosomal.log   non-neg median   fingerprint excl
logs/cnv/denoisereadcounts/GLSS-MD-LP03-R1-01D-WGS-4NCFXD.autosomal.log   non-neg median   fingerprint excl

fpbarthel commented 5 years ago

Seven patients had all of their aliquots marked deprecated:

UPDATE analysis.files
SET file_format = 'aligned BAM (depricated)'
FROM (SELECT al.aliquot_barcode
    FROM biospecimen.samples s
    RIGHT JOIN biospecimen.aliquots al ON s.sample_barcode = al.sample_barcode
    RIGHT JOIN analysis.files f ON f.aliquot_barcode = al.aliquot_barcode
    WHERE case_barcode IN ('GLSS-MD-LP03', 'GLSS-MD-LP07', 'GLSS-MD-LP08', 'GLSS-MD-0019', 'GLSS-MD-0084', 'GLSS-MD-0085', 'GLSS-MD-0137') AND file_format = 'aligned BAM') subq
WHERE files.aliquot_barcode = subq.aliquot_barcode AND files.file_format = 'aligned BAM';

Four patients had their WGS BAMs (but not WXS) marked deprecated:

UPDATE analysis.files
SET file_format = 'aligned BAM (depricated)'
FROM (SELECT al.aliquot_barcode, file_format
    FROM biospecimen.samples s
    RIGHT JOIN biospecimen.aliquots al ON s.sample_barcode = al.sample_barcode
    RIGHT JOIN analysis.files f ON f.aliquot_barcode = al.aliquot_barcode
    WHERE case_barcode IN ('GLSS-MD-LP01','GLSS-MD-LP02','GLSS-MD-LP04', 'GLSS-MD-LP10') AND file_format = 'aligned BAM' AND aliquot_analysis_type = 'WGS') subq
WHERE files.aliquot_barcode = subq.aliquot_barcode AND files.file_format = 'aligned BAM';

Update: made a spelling correction, depricated > deprecated
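The statement used for the spelling correction is not shown in this thread. If the misspelled label also needs fixing in stored rows, a minimal sketch could look like the following. It uses Python's built-in `sqlite3` with a toy `files` table standing in for `analysis.files`. The aliquot barcodes are made up, and the idea that downstream analyses select on `file_format = 'aligned BAM'` is an assumption rather than something confirmed here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Toy stand-in for analysis.files; the real table has more columns.
conn.execute("CREATE TABLE files (aliquot_barcode TEXT, file_format TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("aliquot-A", "aligned BAM"),               # hypothetical barcodes
        ("aliquot-B", "aligned BAM (depricated)"),
        ("aliquot-C", "aligned BAM (depricated)"),
    ],
)

# Rewrite the misspelled label in place; rows with other labels are untouched.
cur = conn.execute(
    "UPDATE files SET file_format = 'aligned BAM (deprecated)' "
    "WHERE file_format = 'aligned BAM (depricated)'"
)
corrected = cur.rowcount  # number of rows whose label was fixed

# Assumed downstream filter: only rows still labelled 'aligned BAM' are used.
active = [
    row[0]
    for row in conn.execute(
        "SELECT aliquot_barcode FROM files "
        "WHERE file_format = 'aligned BAM' ORDER BY aliquot_barcode"
    )
]
print(corrected, active)
```

Matching on the exact misspelled string keeps the fix idempotent: running it a second time changes no rows.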

fpbarthel commented 5 years ago

Relevant aliquots marked as deprecated