DaehwanKimLab / hisat-genotype

GNU General Public License v3.0

How does HISAT-genotype predict its initial alleles in RNA Seq-data? #70

Open PhilippMueller24 opened 2 years ago

PhilippMueller24 commented 2 years ago

Hello everyone,

First of all, thank you for this amazing package! It is really fast and convenient to use. While going through the output for my RNA-Seq data, I came across some results that I am having difficulty understanding. Here is one example output:

 1 ranked B*44:02:27     (abundance: 21.38%)
 2 ranked B*44:19N       (abundance: 21.38%)
 3 ranked B*44:02:01:03  (abundance: 21.38%)
 4 ranked B*07:02:01     (abundance: 8.97%)
 5 ranked B*07:02:45     (abundance: 8.97%)
 6 ranked B*07:161N      (abundance: 8.97%)
 7 ranked B*07:61        (abundance: 8.97%)
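
To make the ties in this list easier to see, here is a minimal Python sketch that groups the ranked alleles by their reported abundance. It only parses the text format shown above; the function name `group_by_abundance` and the regular expression are my own illustration and are not part of HISAT-genotype.

```python
import re
from collections import defaultdict

# Assumed line format, copied from the output above:
#   "<rank> ranked <allele> (abundance: <percent>%)"
LINE_RE = re.compile(r"^\s*(\d+)\s+ranked\s+(\S+)\s+\(abundance:\s*([\d.]+)%\)")


def group_by_abundance(report):
    """Return (abundance, [alleles]) pairs, highest abundance first."""
    groups = defaultdict(list)
    for line in report.splitlines():
        match = LINE_RE.match(line)
        if match is None:
            continue  # skip blank or unrelated lines
        # Drop a stray trailing period, which is not part of an allele name.
        allele = match.group(2).rstrip(".")
        groups[float(match.group(3))].append(allele)
    return sorted(groups.items(), reverse=True)


report = """
 1 ranked B*44:02:27     (abundance: 21.38%)
 2 ranked B*44:19N       (abundance: 21.38%)
 3 ranked B*44:02:01:03  (abundance: 21.38%)
 4 ranked B*07:02:01     (abundance: 8.97%)
 5 ranked B*07:02:45     (abundance: 8.97%)
 6 ranked B*07:161N      (abundance: 8.97%)
 7 ranked B*07:61        (abundance: 8.97%)
"""

for abundance, alleles in group_by_abundance(report):
    print(f"{abundance:.2f}%: {', '.join(alleles)}")
```

Grouped this way, the seven ranked alleles fall into just two abundance levels, and each level contains one null (N) allele.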

My first question is how a null allele can be present in RNA data. From my understanding, with the exception of a frameshift at the very end of the gene, the transcript of a null allele should not be processed into mature RNA.

Second, some of the differences that distinguish the ranked alleles from one another occur only in regions outside of exons. How does HISAT-genotype "find" these differences on the basis of RNA-Seq data?

Any clarification is highly appreciated! Thank you! Best, Philipp