DiltheyLab / HLA-LA

Fast HLA type inference from whole-genome data
GNU General Public License v3.0
Discordant results from the same person. #37

Open kimjr89 opened 4 years ago

kimjr89 commented 4 years ago

Hi.

I've got discordant results from different tissues of the same person using HLA-LA. The results are identical except at DPA1: one is DPA1*02:02:02 and the other is DPA1*01:11. Although average coverage is lower at DPA1 than at the other loci, it is still above 30X.

I attached the result files. AA1_P3_R1_bestguess_G.txt AA1_BM_R1_bestguess_G.txt

I investigated the intermediate files and found that there are summary stats prioritizing likely pairs in "R1_pileup_DPA1.txt" file.

ClusterID P LL Mismatches_avg
DPA1*01:03:01:01;DPA1*01:03:01:02;DPA1*01:03:01:03;DPA1*01:03:01:04;DPA1*01:03:01:05/DPA1*02:02:02 1 -97.8373 143
DPA1*01:03:01:01;DPA1*01:03:01:02;DPA1*01:03:01:03;DPA1*01:03:01:04;DPA1*01:03:01:05/DPA1*02:05 1.88852e-21 -145.556 151
DPA1*01:03:01:01;DPA1*01:03:01:02;DPA1*01:03:01:03;DPA1*01:03:01:04;DPA1*01:03:01:05/DPA1*02:02:06 4.64625e-24 -151.563 151.5
DPA1*01:03:01:01;DPA1*01:03:01:02;DPA1*01:03:01:03;DPA1*01:03:01:04;DPA1*01:03:01:05/DPA1*02:04 4.61435e-25 -153.873 152
DPA1*01:03:04/DPA1*02:02:02 1.73353e-32 -170.97 155.5
DPA1*01:12/DPA1*02:02:02 7.28223e-38 -183.35 153
DPA1*01:11/DPA1*02:02:02 1.9045e-42 -193.902 141
DPA1*01:10/DPA1*02:02:02 4.47384e-52 -216.073 159
DPA1*01:03:04/DPA1*02:05 3.27271e-53 -218.689 163.5
DPA1*01:03:04/DPA1*02:02:06 8.0459e-56 -224.697 164
DPA1*01:03:04/DPA1*02:04 7.99056e-57 -227.006 164.5
......

I also attached "R1_pileup_DPA1.txt" files. AA1_BM_R1_PP_DPA1_pairs.txt AA1_P3_R1_PP_DPA1_pairs.txt
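One way to compare the two attached pairs files is to look up where the best pair of one sample ranks in the other sample. Below is a minimal Python sketch of that check. It assumes each file is whitespace-separated with the four columns shown above (ClusterID, P, LL, Mismatches_avg) and one pair per line; the helper names `parse_pairs` and `cross_check` are made up for illustration and are not part of HLA-LA.

```python
import io


def parse_pairs(text):
    """Parse rows of the assumed form: ClusterID P LL Mismatches_avg."""
    rows = []
    for line in io.StringIO(text):
        fields = line.split()
        # Skip the header, blank lines, and elisions such as "......".
        if len(fields) != 4 or fields[0] == "ClusterID":
            continue
        cluster, p, ll, mismatches = fields
        rows.append({
            "cluster": cluster,
            "P": float(p),
            "LL": float(ll),
            "mismatches_avg": float(mismatches),
        })
    return rows


def cross_check(rows_a, rows_b):
    """Find the best pair of sample A and report its rank and LL gap in sample B."""
    best_a = max(rows_a, key=lambda r: r["LL"])
    ranked_b = sorted(rows_b, key=lambda r: r["LL"], reverse=True)
    for rank, row in enumerate(ranked_b, start=1):
        if row["cluster"] == best_a["cluster"]:
            return {
                "cluster": best_a["cluster"],
                "rank_in_other": rank,
                # How much worse (in log-likelihood units) than B's own best pair.
                "ll_gap": ranked_b[0]["LL"] - row["LL"],
            }
    # The pair was not listed for sample B at all.
    return {"cluster": best_a["cluster"], "rank_in_other": None, "ll_gap": None}
```

With the contents of AA1_P3_R1_PP_DPA1_pairs.txt and AA1_BM_R1_PP_DPA1_pairs.txt read into strings, `cross_check(parse_pairs(p3_text), parse_pairs(bm_text))` would show how far down the P3 best pair sits in the BM ranking, and swapping the arguments gives the reverse view. A small `ll_gap` would suggest the two calls are nearly tied in that sample rather than clearly different.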

Can you explain the meaning of the column names (ClusterID, P, LL, and Mismatches_avg) and suggest possible explanations for the discordant results?

Thank you!

AlexanderDilthey commented 3 years ago

Hi @kimjr89,

Interesting result - I have no immediate explanation for this.

Is this DNA sequencing data from normal tissue, i.e. not from tumor tissue?

Regarding the column names:

ClusterID = Diploid G group allele cluster
P = normalized posterior probability
LL = log likelihood (which P is calculated from)
Mismatches_avg = A measure of the read mismatches observed for the allele group cluster. Smaller is better in principle, but unclear whether this metric is really informative.
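To make the relation between LL and P concrete, here is a small Python sketch that normalizes log likelihoods into probabilities. It assumes natural-log likelihoods, a flat prior over allele pairs, and normalization over the listed pairs only; these are assumptions for illustration, not a description of the HLA-LA source code. The function name `normalized_posteriors` is hypothetical. The LL values are the first four rows quoted in the question.

```python
import math


def normalized_posteriors(log_likelihoods):
    """Turn log likelihoods into probabilities that sum to 1 (flat prior assumed)."""
    # Subtract the maximum before exponentiating to avoid underflow.
    max_ll = max(log_likelihoods)
    weights = [math.exp(ll - max_ll) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]


# LL values of the first four rows quoted in this issue.
lls = [-97.8373, -145.556, -151.563, -153.873]
for ll, p in zip(lls, normalized_posteriors(lls)):
    print(f"LL={ll:>9}  P={p:.5e}")
```

Under these assumptions the computed values agree with the P column quoted above to about three significant figures (the LL values in the file are rounded), which is consistent with P being exp(LL) rescaled so that all pairs sum to 1. It also shows why a gap of roughly 48 log-likelihood units between the top two pairs translates into P of about 1e-21 for the runner-up.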