DaehwanKimLab / hisat-genotype

GNU General Public License v3.0

Allele counts #59

Closed davetang closed 2 years ago

davetang commented 2 years ago

Hi Chris,

I hope you are well. The final report produced by HISAT-genotype lists the top 10 alleles with their counts, followed by the typing result. For example, here's the typing result for HLA-A using NA12892.extracted.1.fq.gz and NA12892.extracted.2.fq.gz from ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat-genotype/data/hla/ILMN.tar.gz.

                        1496 reads and 769 pairs are aligned
                                1 A*02:01:01:01 (count: 419)
                                2 A*02:251 (count: 407)
                                3 A*02:610:02 (count: 405)
                                4 A*02:524:02 (count: 403)
                                5 A*02:650 (count: 403)
                                6 A*02:01:123 (count: 402)
                                7 A*02:647 (count: 401)
                                8 A*02:01:126 (count: 400)
                                9 A*02:562 (count: 400)
                                10 A*02:645 (count: 400)

                                1 ranked A*02:01:01:01 (abundance: 51.95%)
                                2 ranked A*11:01:01:01 (abundance: 48.05%)

Firstly, does the count indicate one concordant paired-end read mapping to a particular HLA allele? (I'm also assuming that these aren't unique counts, i.e. the same read pair can be counted for more than one allele.) If that's the case, I wanted to know whether it is possible to generate counts for all alleles included in the reference database. If that is not possible, could you let me know how the abundance is calculated, so I can try to work out the raw count for A*11:01:01:01, which isn't in the top 10? (I tried taking the count for A*02:01:01:01 and dividing by the total pairs, but that produces a slightly higher percentage, 54.49%, than the reported 51.95%.)
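For anyone who wants to reproduce that calculation, here is a minimal sketch. The regular expressions and the `parse_report` helper are my own assumptions, inferred only from the report lines pasted above; they are not part of HISAT-genotype and may not match other report versions.

```python
import re

# Report excerpt copied from the HLA-A typing result above.
REPORT = """\
1496 reads and 769 pairs are aligned
1 A*02:01:01:01 (count: 419)
2 A*02:251 (count: 407)
3 A*02:610:02 (count: 405)
4 A*02:524:02 (count: 403)
5 A*02:650 (count: 403)
6 A*02:01:123 (count: 402)
7 A*02:647 (count: 401)
8 A*02:01:126 (count: 400)
9 A*02:562 (count: 400)
10 A*02:645 (count: 400)

1 ranked A*02:01:01:01 (abundance: 51.95%)
2 ranked A*11:01:01:01 (abundance: 48.05%)
"""

# Assumed line patterns, based only on the excerpt above.
PAIRS_RE = re.compile(r"(\d+) reads and (\d+) pairs are aligned")
COUNT_RE = re.compile(r"^\s*\d+\s+(\S+)\s+\(count:\s*(\d+)\)", re.MULTILINE)
ABUND_RE = re.compile(
    r"^\s*\d+\s+ranked\s+(\S+)\s+\(abundance:\s*([\d.]+)%\)", re.MULTILINE
)


def parse_report(text):
    """Return (aligned pairs, {allele: count}, {allele: abundance %})."""
    match = PAIRS_RE.search(text)
    if match is None:
        raise ValueError("no 'reads and pairs are aligned' line found")
    pairs = int(match.group(2))
    counts = {allele: int(n) for allele, n in COUNT_RE.findall(text)}
    abundances = {allele: float(p) for allele, p in ABUND_RE.findall(text)}
    return pairs, counts, abundances


pairs, counts, abundances = parse_report(REPORT)

# Naive estimate: top allele's pair count as a share of all aligned pairs.
naive_share = 100 * counts["A*02:01:01:01"] / pairs
print(f"naive share: {naive_share:.2f}%")
print(f"reported abundance: {abundances['A*02:01:01:01']:.2f}%")
```

The naive share comes out at 54.49% against a reported abundance of 51.95%, so the abundance is not a simple ratio of the listed count to the aligned pairs. A*11:01:01:01 has an abundance but no entry in the parsed counts, which is the gap I'm asking about.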

I noticed that there is the --keep-alignment parameter, which keeps the mapping results to the HLA backbone, but it wasn't obvious how I could generate the allele counts from them.

Many thanks! Dave

chbe-helix commented 2 years ago

Hi Dave,

We've been well here, thanks! Hope you have too.

Yes, the counts are read pairs aligning to the allele. There is no way to back-calculate the counts from the abundance.

I will happily add an option to output all counts to a file if you'd like. If this would help, let me know and I'll get to work on it right away; I can have something to you by the end of the week at the latest.

Thanks! -Chris

davetang commented 2 years ago

Hi Chris,

We're doing well too, and glad to hear you're well!

That option is perfect and would really be helpful; thank you in advance!

Cheers, Dave

chbe-helix commented 2 years ago

Hi Dave,

Sorry for the delay in getting the update pushed. Version 1.3.3 of HISATgenotype is now available with a new option, --output-allele-counts, that fulfills your request to have counts reported for all alleles of each gene. Note that this can produce a large results file. Thanks for your patience!

Thanks, Chris

davetang commented 2 years ago

Hi Chris,

Thank you so much! I'm trying to start from a fresh installation of HISAT-genotype, but it seems that I am unable to connect to the FTP site (ftp://ftp.ccb.jhu.edu/pub/infphilo/hisat-genotype/data/genotype_genome_20180128.tar.gz).

A while back, I was trying to generate my own genotype genome reference, but I think I saw (in a GitHub issue) that it is recommended to use the provided reference for the sake of compatibility.

Are there any plans to host the reference files on another platform? If not, do you know when the FTP site will be accessible again?

Many thanks! Dave

chbe-helix commented 2 years ago

Hi Dave,

It looks like there is a recurring problem with the FTP. I will host an alternative repository for each of the references here: https://github.com/chbe-helix/hisatgenotype-ref

Use git-lfs (Git Large File Storage) to access the large reference files. Hope this helps!

Thanks, Chris

davetang commented 2 years ago

Hi Chris,

I was able to set up version 1.3.3 and can see all the allele counts! Thank you so much!

Cheers, Dave