fpbarthel / GLASS

GLASS consortium
MIT License

Mismatches #102

Closed fpbarthel closed 5 years ago

fpbarthel commented 5 years ago

Ongoing effort to identify mismatches.

To take a look:

cd /projects/verhaak-lab/GLASS-analysis/results/fingerprinting/case
cat * | grep -w "EXPECTED_MISMATCH" | less -S

Note that not all reported mismatches are necessarily true mismatches; some could be due to quality issues.
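For anyone who prefers to do the same filtering in Python, here is a minimal sketch. It assumes the fingerprinting files are tab-delimited, may start with `#` comment lines, and have a `RESULT` column (as in Picard CrosscheckFingerprints output); the function names and the tiny example table are made up for illustration, and the barcodes and LOD scores in it are placeholders, not GLASS data.

```python
import csv
import io
from collections import Counter


def read_crosscheck(handle):
    """Yield one dict per comparison row, skipping '#' comment and blank lines."""
    lines = (ln for ln in handle if ln.strip() and not ln.startswith("#"))
    yield from csv.DictReader(lines, delimiter="\t")


def rows_with_result(handle, result="EXPECTED_MISMATCH"):
    """Return the rows whose RESULT column equals `result` exactly."""
    return [row for row in read_crosscheck(handle) if row["RESULT"] == result]


# Placeholder table (assumption): column names follow the assumed file layout,
# and every value below is invented for the example.
example = (
    "# header comment\n"
    "\n"
    "LEFT_GROUP_VALUE\tRIGHT_GROUP_VALUE\tRESULT\tLOD_SCORE\n"
    "A-TP\tA-NB\tEXPECTED_MATCH\t50.0\n"
    "A-TP\tB-NB\tEXPECTED_MISMATCH\t-40.0\n"
    "A-R1\tA-NB\tUNEXPECTED_MISMATCH\t-35.0\n"
)

hits = rows_with_result(io.StringIO(example))
counts = Counter(row["RESULT"] for row in read_crosscheck(io.StringIO(example)))

for row in hits:
    print(row["LEFT_GROUP_VALUE"], row["RIGHT_GROUP_VALUE"], row["LOD_SCORE"])
print(dict(counts))
```

The exact string comparison behaves like `grep -w`: rows labelled `UNEXPECTED_MISMATCH` are not returned unless that value is passed as `result`. On real data, replace the `io.StringIO` handle with an open file.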

Kcjohnson commented 5 years ago

The full GLASS.crosscheck_metrics represents an all-against-all comparison, so we now have a better understanding of whether the identified mismatches might belong to other samples in the GLASS study.

By looking at the unexpected mismatches, we found several samples in the MD Anderson cohort with strong evidence of a mismatch. Some of these samples did not match any other sample in the GLASS cohort, while others appear to have been mislabelled at some point. Note that GLSS-MD-LP03-R1 was a very low-quality sample that may not actually be mislabelled.

Unmatched samples

aliquot_barcode aliquot_analysis_type aliquot_batch unmatched_sample_barcode
GLSS-MD-0011-R1-01D-WGS-FUC4DM WGS GLSS-MD-WGS GLSS-MD-0011-R1
GLSS-MD-0016-R1-01D-WGS-P52RM6 WGS GLSS-MD-WGS GLSS-MD-0016-R1
GLSS-MD-LP03-R1-01D-WGS-4NCFXD WGS GLSS-LP-WGS GLSS-MD-LP03-R1

Separately, there were samples in the MD Anderson cohort that unexpectedly mapped to a different case (also in the MD Anderson studies).

Mismatched samples

Note: samples repeat for -0027- and -0095-.

aliquot_barcode sample_mismatched matching_case
GLSS-MD-0002-TP-01D-WGS-XLZQCF GLSS-MD-0002-TP GLSS-MD-0006
GLSS-MD-0019-NB-01D-WGS-CBD01V GLSS-MD-0019-NB GLSS-MD-0041
GLSS-MD-0027-NB-01D-WGS-RVOO5S GLSS-MD-0027-NB GLSS-MD-0095
GLSS-MD-0027-R1-01D-WGS-88ST4Y GLSS-MD-0027-R1 GLSS-MD-0095
GLSS-MD-0027-R2-01D-WGS-55XI5C GLSS-MD-0027-R2 GLSS-MD-0095
GLSS-MD-0027-TP-01D-WGS-6HFKNZ GLSS-MD-0027-TP GLSS-MD-0095
GLSS-MD-0084-NB-01D-WXS-6X5GYR GLSS-MD-0084-NB GLSS-MD-0085
GLSS-MD-0085-NB-01D-WXS-W41LMF GLSS-MD-0085-NB GLSS-MD-0084
GLSS-MD-0095-NB-01D-WXS-7SK6Q5 GLSS-MD-0095-NB GLSS-MD-0027
GLSS-MD-0095-R1-01D-WXS-HX8R83 GLSS-MD-0095-R1 GLSS-MD-0027
GLSS-MD-0095-TP-01D-WXS-OAQEYZ GLSS-MD-0095-TP GLSS-MD-0027
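As a rough sketch of how a table like the one above can be derived: going by the barcodes in this thread, the case is the first three hyphen-separated fields of an aliquot barcode and the sample is the first four. Given a list of aliquot pairs that fingerprinting called a match, any pair whose cases differ is a cross-case match. The function names and the `matched_pairs` input format are hypothetical, not the code actually used for GLASS.

```python
from collections import defaultdict


def case_of(barcode):
    """Case barcode: the first three hyphen-separated fields."""
    return "-".join(barcode.split("-")[:3])


def sample_of(barcode):
    """Sample barcode: the first four hyphen-separated fields."""
    return "-".join(barcode.split("-")[:4])


def cross_case_matches(matched_pairs):
    """Map each sample to the set of *other* cases it matched.

    `matched_pairs` is an iterable of (aliquot_a, aliquot_b) tuples that the
    fingerprinting step called a genotype match (hypothetical input format).
    """
    hits = defaultdict(set)
    for a, b in matched_pairs:
        if case_of(a) != case_of(b):
            hits[sample_of(a)].add(case_of(b))
            hits[sample_of(b)].add(case_of(a))
    return dict(hits)


# Two matches described in this thread: a within-case match (ignored)
# and the -0084-/-0085- normal swap (reported for both samples).
pairs = [
    ("GLSS-MD-0027-TP-01D-WGS-6HFKNZ", "GLSS-MD-0027-NB-01D-WGS-RVOO5S"),
    ("GLSS-MD-0084-NB-01D-WXS-6X5GYR", "GLSS-MD-0085-NB-01D-WXS-W41LMF"),
]

result = cross_case_matches(pairs)
for sample, cases in sorted(result.items()):
    print(sample, sorted(cases))
```

A sample that appears in no matched pair at all would fall into the "unmatched" group instead; that check needs the full sample list and is not shown here.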

Recommendations

  1. It seems reasonable to exclude the cases for the unmatched samples above, because the only remaining high-quality data for those cases belong to the NB and TP samples.

  2. For cases where a sample matched a different case (i.e., GLSS-MD-0002-TP), we can ask our colleagues whether the clinical information for the sample can be corrected.

  3. For the case where the NB samples match but the tumors do not (i.e., -0019- and -0041-), it would make sense to exclude case GLSS-MD-0019 from further analyses. Using the code in the comment above, it is clear that GLSS-MD-0019-NB does not match its named tumor samples (while -0041- does not have a case mismatch).

  4. GLSS-MD-0084-NB and GLSS-MD-0085-NB match each other. However, GLSS-MD-0084-NB doesn't match its -TP- or -R1- while GLSS-MD-0085-NB doesn't match its -TP-. Due to the confusion and limited utility of only a single -R1- sample we should probably exclude both cases.

  5. GLSS-MD-0027 and GLSS-MD-0095 represent the same patient, and neither case has a mismatch within the case (i.e., normals match tumors). Inspecting the original clinical sheets revealed that the two IDs were very similar and contained typos. It is advised to keep these samples because they offer complementary WGS and WXS information.

fpbarthel commented 5 years ago

Closed related issue https://github.com/TheJacksonLaboratory/GLASS/issues/15

fpbarthel commented 5 years ago

@Kcjohnson can you close this issue once we have used these data to create a blocklist/reviewlist?

I will NOT be performing any consolidation between GLSS-MD-0027 and GLSS-MD-0095 at this time. Please select the highest-quality patient (the one with the highest coverage and the fewest samples on any mutation/cnv blocklists/graylist) for use, and add the other one to the fingerprinting blocklist.

Kcjohnson commented 5 years ago

We are choosing to block GLSS-MD-0095 at the patient level because the WGS data for GLSS-MD-0027 cover more samples (TP, R1, and R2) and the data quality looks pretty good.

Kcjohnson commented 5 years ago

Here is the sheet of "block", "review", and "allow" samples. I have also attempted to provide a dictionary of the exclusion reasons; that dictionary has been moved to #104.

block_review_allow_quality_lists_20181206.txt

I imagine that some samples will be promoted or demoted to different categories as more in-depth analyses take place. Any further changes will be recorded in the database.