DaehwanKimLab / hisat-genotype

Trimming the final two fields of the HLA alleles #41

Closed Tijs-dot closed 3 years ago

Tijs-dot commented 3 years ago

Hi Chris,

I was wondering whether the -t option of hisatgenotype_toolkit can also trim the allele names in the abundance part of the report, so that HLA calls for the same allele are no longer listed separately. I want to convert the .report output from HISAT-genotype to CSV while trimming the last two fields (using -t 2), so that I can group all alleles that differ only in the final two fields of the nomenclature. However, this seems to trim only the allele names in the count part of the report, not those in the abundance part. In the output below, for example, the five ranked alleles at the bottom remain untrimmed. Four of them are actually the HLA-A*02:01 allele, and I would like them to be counted together as that allele.

4392 reads and 2212 pairs are aligned
1 A*02:01:01:01 (count: 1550)
2 A*02:01:01:05 (count: 1550)
3 A*02:01:01:06 (count: 1550)
4 A*02:01:01:07 (count: 1550)
5 A*02:01:01:08 (count: 1548)
6 A*02:01:126 (count: 1530)
7 A*02:648 (count: 1525)
8 A*02:30:01 (count: 1523)
9 A*02:562 (count: 1523)
10 A*02:498 (count: 1518)

1 ranked A*01:01:01:01 (abundance: 30.96%)
2 ranked A*02:01:01:01 (abundance: 17.26%)
3 ranked A*02:01:01:05 (abundance: 17.26%)
4 ranked A*02:01:01:06 (abundance: 17.26%)
5 ranked A*02:01:01:07 (abundance: 17.26%)
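To make the desired result concrete, here is a minimal Python sketch of the grouping applied to the abundance lines after the fact. It is not part of hisatgenotype_toolkit: the helper names `trim_allele` and `grouped_abundance` are hypothetical, the line pattern is assumed from the excerpt above rather than from a documented report format, and summing the percentages of alleles that share a trimmed name is an assumption about how they should be combined.

```python
import re
from collections import defaultdict

# Pattern assumed from the report excerpt in this post, e.g.
# "1 ranked A*01:01:01:01 (abundance: 30.96%)"
ABUNDANCE_RE = re.compile(r"\d+\s+ranked\s+(\S+)\s+\(abundance:\s*([\d.]+)%\)")


def trim_allele(allele, fields=2):
    """Keep only the first `fields` colon-separated fields of an allele name."""
    return ":".join(allele.split(":")[:fields])


def grouped_abundance(report_text, fields=2):
    """Sum abundance percentages over alleles that share a trimmed name."""
    totals = defaultdict(float)
    for allele, pct in ABUNDANCE_RE.findall(report_text):
        totals[trim_allele(allele, fields)] += float(pct)
    return dict(totals)


# Abundance lines copied from the example output in this post.
report = """\
1 ranked A*01:01:01:01 (abundance: 30.96%)
2 ranked A*02:01:01:01 (abundance: 17.26%)
3 ranked A*02:01:01:05 (abundance: 17.26%)
4 ranked A*02:01:01:06 (abundance: 17.26%)
5 ranked A*02:01:01:07 (abundance: 17.26%)
"""

# Print one CSV row per two-field allele.
for allele, pct in sorted(grouped_abundance(report).items()):
    print(f"{allele},{pct:.2f}")
```

For the five lines above this prints `A*01:01,30.96` and `A*02:01,69.04`, which is the kind of per-allele total I am hoping to get.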

Does this question make sense to you?

Thanks in advance, Tijs