DaehwanKimLab / hisat2

Graph-based alignment (Hierarchical Graph FM index)
GNU General Public License v3.0

HISAT-genotype model for HLA allele abundance estimates #266

Open rtobes opened 3 years ago

rtobes commented 3 years ago

In your latest paper on HISAT-genotype (https://pubmed.ncbi.nlm.nih.gov/31375807/), you describe how HLA allele abundances are assigned using an Expectation-Maximization (EM) strategy:

"HISAT-genotype applies the following statistical model in each of the two steps to find maximum likelihood estimates of abundance through an EM algorithm (33). We previously implemented an EM solution in our centrifuge system (34), and we used a similar algorithm in HISAT-genotype, with modifications to the variable definitions as follows........"

Looking at the results that HISAT-genotype reports for each HLA gene, it appears that your model allows any number of alleles per gene. This is the same model you applied in Centrifuge, a program designed for taxonomic abundance profiling in metagenomics. In a metagenomic sample any number of different organisms can be present, but for HLA the model should allow at most 2 alleles per gene, because an individual carries at most two distinct alleles of each gene.

Could you please explain how HISAT-genotype models the restriction to a maximum of 2 alleles when estimating HLA allele abundances?
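To make the concern concrete, here is a minimal sketch of a generic EM abundance estimate with no limit on the number of alleles, followed by one possible post-hoc restriction to the top 2. This is not HISAT-genotype's or Centrifuge's code: the function names (`em_abundance`, `restrict_to_top`), the 0/1 read-to-allele compatibility matrix, and the toy reads are all hypothetical, and the sketch assumes every read is compatible with at least one allele.

```python
def em_abundance(compat, n_iter=200, tol=1e-9):
    """Generic EM for mixture proportions (illustrative only).

    compat[r][a] is 1 if read r is compatible with allele a, else 0.
    Assumes every read is compatible with at least one allele.
    """
    n_alleles = len(compat[0])
    theta = [1.0 / n_alleles] * n_alleles  # start from uniform abundances
    for _ in range(n_iter):
        counts = [0.0] * n_alleles
        # E-step: split each read across its compatible alleles,
        # in proportion to the current abundance estimates.
        for row in compat:
            weights = [c * t for c, t in zip(row, theta)]
            total = sum(weights)
            for a, w in enumerate(weights):
                counts[a] += w / total
        # M-step: new abundance = expected fraction of reads per allele.
        new_theta = [c / len(compat) for c in counts]
        delta = sum(abs(n - o) for n, o in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


def restrict_to_top(compat, theta, max_alleles=2):
    """Hypothetical diploid restriction: keep the top alleles and refit."""
    ranked = sorted(range(len(theta)), key=lambda a: theta[a], reverse=True)
    keep = sorted(ranked[:max_alleles])
    sub = [[row[a] for a in keep] for row in compat]
    # Drop reads that match none of the kept alleles.
    sub = [row for row in sub if sum(row) > 0]
    return keep, em_abundance(sub)


# Toy data: 3 candidate alleles; 4 reads unique to allele 0, 4 unique to
# allele 1, 1 unique to allele 2, and 3 reads compatible with all three.
reads = [[1, 0, 0]] * 4 + [[0, 1, 0]] * 4 + [[0, 0, 1]] * 1 + [[1, 1, 1]] * 3

theta = em_abundance(reads)
keep, diploid = restrict_to_top(reads, theta)
print("unrestricted:", theta)
print("kept alleles:", keep, "refit:", diploid)
```

On this toy input the unrestricted EM assigns non-zero abundance to all three alleles (approximately 4/9, 4/9 and 1/9), whereas the refit on the two top-ranked alleles gives 0.5 each. My question is whether HISAT-genotype does something equivalent to the second step, or enforces the limit inside the model itself.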

rtobes commented 3 years ago

I am closing this issue because I think it fits better here: https://github.com/DaehwanKimLab/hisat-genotype/issues/28