Interpreting the output of `conc-results`

DaehwanKimLab / hisat-genotype

GNU General Public License v3.0


Interpreting the output of `conc-results` #19

Status: Closed (tinyheero closed this issue 4 years ago)

tinyheero commented 4 years ago

Hi there,

I wanted to ask how one should interpret the results from the conc-results script. I get these columns:

File
EM: A
EM: B
EM: C
EM: DPA1
EM: DPB1
EM: DRB1
EM: DQA1
EM: DQB1
Allele splitting: A
Allele splitting: B
Allele splitting: C
Allele splitting: DPA1
Allele splitting: DPB1
Allele splitting: DRB1
Allele splitting: DQA1
Allele splitting: DQB1

What is the difference between the EM and Allele splitting columns? I am specifically interested in just getting the two most likely alleles. Based on https://daehwankimlab.github.io/hisat-genotype/tutorials/#typing-output, is it correct to assume that I should be using the information from the EM column for this?

chbe-helix commented 4 years ago

Hi Fong,

It depends on what your results look like and which fields of the star alleles you want (e.g. 01:01:01 of 01:01:01:02, etc.). The EM columns are the raw result of HISATgenotype's EM algorithm. Allele Splitting is a new experimental method that lets users clean up some results and get the fields they are after.

For example, suppose the EM results for HLA-A are:

A01:01:05:01 (25%)
A01:01:05:04 (25%)
A02:01:01:01 (25%)
A02:01:01:04 (25%)

If one does not need the 4th field, these will collapse to:

A01:01:05 (50%)
A02:01:01 (50%)

I'm working on improving this method for the next release and documenting the functionality. Hope this helps explain what is happening!

Thanks, Chris

tinyheero commented 4 years ago

Oh, I see, Chris. In my situation, 4-digit resolution (the first two fields) is what I need, and it is what most neoantigen binding affinity tools require. Is there a way to get it to collapse/summarize to 4 digits?

If I understand correctly, it should collapse to:

A01:01 (50%)
A02:01 (50%)
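
For anyone who wants to do this collapse by hand in the meantime, here is a minimal sketch in Python. It is not part of HISAT-genotype or conc-results: the function name `collapse_alleles` and the list-of-tuples input are my own assumptions, and the percentages are simply the ones from the example in this thread. It truncates each allele name to the first N colon-separated fields and sums the abundances of alleles that become identical.

```python
from collections import defaultdict


def collapse_alleles(calls, n_fields):
    """Sum the abundances of alleles that share their first n_fields fields.

    `calls` is a hypothetical input format: a list of (allele, percent)
    pairs, e.g. typed in by hand from the EM column.
    """
    totals = defaultdict(float)
    for allele, percent in calls:
        # Keep only the first n_fields colon-separated fields of the name.
        truncated = ":".join(allele.split(":")[:n_fields])
        totals[truncated] += percent
    # Highest abundance first; ties are broken by allele name.
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


# Example values taken from the HLA-A example earlier in this thread.
em_calls = [
    ("A01:01:05:01", 25.0),
    ("A01:01:05:04", 25.0),
    ("A02:01:01:01", 25.0),
    ("A02:01:01:04", 25.0),
]

three_field = collapse_alleles(em_calls, 3)
two_field = collapse_alleles(em_calls, 2)

print(three_field)  # [('A01:01:05', 50.0), ('A02:01:01', 50.0)]
print(two_field)    # [('A01:01', 50.0), ('A02:01', 50.0)]
```

Taking `two_field[:2]` then gives the two most abundant alleles at 4-digit resolution. Note that a homozygous sample would collapse to a single entry, so the slice may contain only one allele.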

chbe-helix commented 4 years ago

Hi Fong,

You are correct! I have a version that will collapse to a user-defined number of digits in the next release. At this time, however, it returns an optimized list that is sorted by the percentage and the number of digits being reported. In the current version, you are welcome to use whichever reported value you see fit. I hope to have the new conc-results released soon.

Thanks, Chris