10XGenomics / scHLAcount

Count HLA alleles in single-cell RNA-seq data
MIT License

two alleles of HLA #18

Open xiweiwu opened 2 years ago

xiweiwu commented 2 years ago

What is the best way to obtain counts for both alleles of each HLA subtype? I tried including both alleles, but the resulting matrix contains some extra rows (using the -f and -g options).

For example, this is my genotyping file:

A*23:01:01:01
A*68:01:01:01
B*15:10:01:01
B*44:03:01:01
C*03:03:01:01
C*03:04:01:01
DPA1*01:03:01:01
DPA1*01:03:01:01
DPB1*18:01:01:01
DPB1*85:01:01:01
DQA1*01:02:01:01
DQA1*01:02:01:01
DQB1*02:01:01:01
DQB1*06:02:01:01
DRB1*03:01:01:01
DRB1*15:03:01:01

Here is the label file in the output folder:

A*23:01:01:01
A*68:01:01:01
A
B*15:10:01:01
B*44:03:01:01
B
C*03:03:01:01
C*03:04:01:01
C
DPA1*01:03:01:01
DPB1*18:01:01:01
DPB1*85:01:01:01
DPB1
DQA1*01:02:01:01
DQB1*02:01:01:01
DQB1*06:02:01:01
DQB1
DRB1*03:01:01:01
DRB1*15:03:01:01
DRB1

As a result, the output matrix contains these extra rows, and they also have counts. I am wondering whether the counts are correct.
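To see exactly which rows are the "extra" ones, the label list can be split into rows that look like full allele names (they contain the `:` field separator) and rows that are bare gene names. This is a minimal sketch, assuming one label per line in matrix-row order and IMGT-style names such as `A*23:01:01:01`; the helper `split_labels` is hypothetical and not part of scHLAcount. It only classifies the rows; it says nothing about what the gene-level rows count, which the thread does not establish.

```python
# Labels as listed in the issue, one per matrix row (assumed IMGT-style names).
LABELS = """\
A*23:01:01:01
A*68:01:01:01
A
B*15:10:01:01
B*44:03:01:01
B
C*03:03:01:01
C*03:04:01:01
C
DPA1*01:03:01:01
DPB1*18:01:01:01
DPB1*85:01:01:01
DPB1
DQA1*01:02:01:01
DQB1*02:01:01:01
DQB1*06:02:01:01
DQB1
DRB1*03:01:01:01
DRB1*15:03:01:01
DRB1
"""


def split_labels(text):
    """Hypothetical helper: split labels into allele rows and gene-name rows.

    A label containing ':' is treated as an allele name; anything else is
    treated as a bare gene name. Returns two lists of (line_index, name),
    where line_index is the 0-based position of the label in the text.
    """
    allele_rows, gene_rows = [], []
    for i, line in enumerate(text.splitlines()):
        name = line.strip()
        if not name:
            continue  # skip blank lines
        if ":" in name:
            allele_rows.append((i, name))
        else:
            gene_rows.append((i, name))
    return allele_rows, gene_rows


allele_rows, gene_rows = split_labels(LABELS)
print("gene-level rows:", [name for _, name in gene_rows])
# → gene-level rows: ['A', 'B', 'C', 'DPB1', 'DQB1', 'DRB1']
```

In this example the bare gene-name rows appear only for genes that have two distinct alleles in the genotype.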

jactearle commented 1 year ago

I am also having this issue! Please let me know if you get to the bottom of it. Cheers

cdarby commented 3 months ago

Hello @jactearle and @xiweiwu, could you explain a bit more about which lines are repeated? Your genotyping file has a homozygous genotype for DPA1 and DQA1, so there is one line for each in the output. There appear to be two lines for all the other genes.
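The point about homozygous genes can be checked mechanically: listing the same allele twice for a gene leaves only one distinct allele, so only one allele row can appear for that gene. This is a minimal sketch, assuming IMGT-style names where `*` separates the gene from the allele fields (e.g. `DPA1*01:03:01:01`); the helper `distinct_alleles_by_gene` is hypothetical and not part of scHLAcount.

```python
# Genotype from the original post, assumed to use IMGT-style "GENE*fields" names.
GENOTYPE = [
    "A*23:01:01:01", "A*68:01:01:01",
    "B*15:10:01:01", "B*44:03:01:01",
    "C*03:03:01:01", "C*03:04:01:01",
    "DPA1*01:03:01:01", "DPA1*01:03:01:01",
    "DPB1*18:01:01:01", "DPB1*85:01:01:01",
    "DQA1*01:02:01:01", "DQA1*01:02:01:01",
    "DQB1*02:01:01:01", "DQB1*06:02:01:01",
    "DRB1*03:01:01:01", "DRB1*15:03:01:01",
]


def distinct_alleles_by_gene(alleles):
    """Hypothetical helper: group allele names by gene (text before '*'),
    dropping duplicates while keeping first-seen order."""
    by_gene = {}
    for allele in alleles:
        gene = allele.split("*", 1)[0]
        seen = by_gene.setdefault(gene, [])
        if allele not in seen:
            seen.append(allele)
    return by_gene


by_gene = distinct_alleles_by_gene(GENOTYPE)
homozygous = [gene for gene, alleles in by_gene.items() if len(alleles) == 1]
print("homozygous genes:", homozygous)
# → homozygous genes: ['DPA1', 'DQA1']
```

With this genotype there are 14 distinct alleles, matching the 14 allele-level names in the label file. The sketch does not account for the additional bare gene-name rows (A, B, C, DPB1, DQB1, DRB1).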