DKMS / Hapl-o-Mat

A software for haplotype inference

How to code ambiguities #3

Closed Rosadosa closed 3 years ago

Rosadosa commented 6 years ago

Dear DKMS,

I have been trying to use your Hapl-o-Mat program on exome sequencing data that has been processed with hlareporter, which gives the predicted genotype per individual for the different HLA genes. Sometimes, however, multiple options are possible for a locus, and I was wondering how to deal with this in the input files. I could not find a clear explanation in the tutorial of how to code these alleles. I currently have them coded in a GLSC file as follows:

only 2 options: DRB1*03:01:01G+DRB1*03:01:01G
more than 2 options: DRB1*03:01:01G/DRB1*13:27/DRB1*03:07/DRB1*13:01:01G

However, all people with more than 2 options keep getting removed from my analysis. How do I include them?

Additionally, I would like to include DRB3/4/5 in my analysis. However, the presence of these genes depends on the DRB1 haplotype, so some people will not carry them. I have tried to code those as 0 in my GLSC file, but the program then removes everyone. How do I keep them in? I could not find anything in the tutorial about encoding missing data in the GLSC file.

Thank you in advance. Best, Roos

sauter commented 3 years ago

This raises two issues.

The first refers to the GL string format. Please note that the "+" operator separates the different copies of a gene. Hence, "DRB1*03:01:01G+DRB1*03:01:01G" denotes a genotype with a "DRB1*03:01:01G" allele on both chromosomes. For HLA-A, -B, -C, -DRB1, -DQB1 and -DPB1, each chromosome carries one copy of the gene, so the genotype has two. This is not an ambiguity. The "/" operator denotes allelic ambiguity; hence "DRB1*03:01:01G/DRB1*13:27/DRB1*03:07" describes a situation where, for one chromosome, there is either "DRB1*03:01:01G" or "DRB1*13:27" or "DRB1*03:07". So if you have, for example, one gene copy typed without ambiguity and one with ambiguity, that could be denoted as "DRB1*03:01:01G+DRB1*03:01:01G/DRB1*13:27/DRB1*03:07". Or, if you have the same ambiguity on both chromosomes, this could be written as "DRB1*03:01:01G/DRB1*13:27/DRB1*03:07+DRB1*03:01:01G/DRB1*13:27/DRB1*03:07".
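To make the difference between the two operators concrete, here is a minimal Python sketch that splits a single-locus GL string into gene copies ("+") and, within each copy, the ambiguous allele candidates ("/"). The function names `parse_locus_genotype` and `expand_genotypes` are made up for this illustration and are not part of Hapl-o-Mat; the sketch also ignores the other GL string operators ("~", "|", "^").

```python
from itertools import product
from typing import List


def parse_locus_genotype(gl: str) -> List[List[str]]:
    """Split a single-locus GL string into gene copies and candidate alleles.

    The outer list has one entry per gene copy (separated by "+");
    each inner list holds the candidate alleles for that copy
    (separated by "/").
    """
    return [copy.split("/") for copy in gl.split("+")]


def expand_genotypes(gl: str) -> List[str]:
    """List every unambiguous genotype compatible with the GL string."""
    copies = parse_locus_genotype(gl)
    return ["+".join(combo) for combo in product(*copies)]


# One copy typed unambiguously, the other with a three-way ambiguity.
mixed = "DRB1*03:01:01G+DRB1*03:01:01G/DRB1*13:27/DRB1*03:07"
for genotype in expand_genotypes(mixed):
    print(genotype)
```

A string that contains only "/" and no "+" parses to a single gene copy here, which is one way to spot input rows that describe just one chromosome instead of a full genotype.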

The second issue refers to DRB3/4/5. As you state, not every haplotype will carry one of these genes. There is no default way to deal with this in Hapl-o-Mat. However, you could maybe introduce a placeholder allele that encodes the absence of a gene. This would be experimental.
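As a rough illustration of the placeholder idea, the following Python sketch rewrites missing DRB3/4/5 entries before the GLSC file is written. The placeholder name `DRBX*NNNN` and the helper `encode_drb345` are assumptions chosen for this example; whether Hapl-o-Mat accepts such an allele name without further changes to its data files is untested.

```python
from typing import Optional, Sequence

# Hypothetical placeholder allele meaning "no DRB3/4/5 gene on this chromosome".
PLACEHOLDER = "DRBX*NNNN"

# Values treated as "gene absent" in the raw typing data (an assumption).
MISSING = {"", "0", None}


def encode_drb345(copies: Sequence[Optional[str]]) -> str:
    """Build a two-copy GL string, substituting the placeholder for absent genes."""
    if len(copies) != 2:
        raise ValueError("expected exactly two entries, one per chromosome")
    filled = [PLACEHOLDER if c in MISSING else c for c in copies]
    return "+".join(filled)


print(encode_drb345(["DRB3*01:01:02G", "0"]))
print(encode_drb345([None, ""]))
```

With this encoding every individual keeps two entries at the locus, so absence is treated like any other allele during haplotype estimation.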

Rosadosa commented 3 years ago

Thanks for your reply, but I finished my PhD on this subject 2 years ago ;) I will close the issue!