0xTCG / aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
http://aldy.csail.mit.edu

Genotype Interpretation and Possible bugs #45

Closed anh151 closed 1 year ago

anh151 commented 1 year ago

Hello, I have several questions about Aldy genotypes and a possible bug report. I am using the latest version of Aldy with the default ILP solver. I have WGS samples that I am using to call CYP2D6.

I have a few genotype results that I need help interpreting.

  1. *2/*41+rs368858603 - what does the "+rsID" mean?
  2. *2/*68:2 - what does the ":2" mean?
  3. *4+*4/*4.021.ALDY - what does the ".ALDY" mean?
  4. *1+*94/*10 and *1+*2/*17 - For these last two, my question is about the "*1+other" haplotype. *1 is generally defined as the lack of a variant/allele. In these two scenarios, is Aldy saying that one of the suballeles within *1 was detected plus *94 on the same haplotype?

I have a few samples where the stdout call from Aldy is *5/*5, but the output .tsv file is blank. Why would this be the case? An example is shown below.

🐿  Aldy v4.3.1 (Python 3.9.15 on Linux 5.15.65+-x86_64-with-glibc2.31)
   (c) 2016-2022 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample 1004207.bam...
Potential CYP2D6 gene structures for 1004207:
   1:  (confidence: 100%)
Potential major CYP2D6 star-alleles for 1004207:
   1:  (confidence: 100%)
Best CYP2D6 star-alleles for 1004207:
   1: *5 / *5 (confidence=100%)
      Minor alleles: 
CYP2D6 results:
  - *5 / *5
    Minor: [*5] / [*5]
    Legacy notation: [*5] / [*5]
Preparing debug archive...


I can provide a debug report for anything shown here if needed.

inumanag commented 1 year ago

Hi @anh151

*2/*41+rs368858603 - what does the + rsID mean?

This means that you have *2 and *41 with an extra SNP, rs368858603, that is not part of either the *2 or the *41 definition.

*2/*68:2 - what does the ":2" mean?

This is *68; :2 is just an internal distinction between two *68 variants present in the database.

*4+*4/*4.021.ALDY - what does the ".ALDY" mean?

This means that you have *4.021 augmented with some extra SNPs (or, in some cases, with some SNPs removed). Such alleles were observed in some of our samples (see `aldy query "CYP2D6*4.021.ALDY"` and `aldy query "CYP2D6*4.021"` for the full list of differences). In general, all alleles ending with `.ALDY` are specific to Aldy and are not part of PharmVar.

*1+*94/*10 and *1+*2/*17 - For these last two, my question is about the "*1+other" haplotype. *1 is generally defined as the lack of a variant/allele. In these two scenarios, is Aldy saying that one of the suballeles within *1 was detected plus *94 on the same haplotype?

This means that you have 3 copies of CYP2D6: *1, *94 and *10. Each copy has a distinct star-allele.

*5/*5

This is a complete gene deletion on both chromosomes.
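For anyone post-processing these calls in a script, the notation rules described above can be sketched roughly as follows. This is only an illustration based on the examples in this thread, not Aldy's own parser: the names `GeneCopy`, `parse_copy`, and `parse_diplotype` are hypothetical, and the sketch assumes that `/` separates the two haplotypes, `+` separates gene copies or extra variants on one haplotype, and any `+` token that does not start with `*` is an extra variant attached to the preceding copy.

```python
from typing import List, NamedTuple, Optional


class GeneCopy(NamedTuple):
    allele: str                # star-allele name, e.g. "*68" or "*4.021"
    db_index: Optional[str]    # text after ":" (e.g. "2"), if present
    aldy_specific: bool        # True when the name ends with ".ALDY"
    extra_variants: List[str]  # "+rsID"-style additions on this copy


def parse_copy(token: str) -> GeneCopy:
    """Split one star-allele token into its name, ":N" index and ".ALDY" flag."""
    name, _, index = token.partition(":")
    aldy = name.endswith(".ALDY")
    if aldy:
        name = name[: -len(".ALDY")]
    return GeneCopy(name, index or None, aldy, [])


def parse_diplotype(call: str) -> List[List[GeneCopy]]:
    """Return one list of gene copies per haplotype (assumed "/"-separated)."""
    haplotypes = []
    for side in call.split("/"):
        copies: List[GeneCopy] = []
        for token in (t.strip() for t in side.split("+")):
            if token.startswith("*"):
                # A new gene copy on this haplotype.
                copies.append(parse_copy(token))
            elif copies:
                # Assumption: a non-star token is an extra variant
                # belonging to the copy written just before it.
                copies[-1].extra_variants.append(token)
            else:
                raise ValueError(f"unexpected token {token!r} in {call!r}")
        haplotypes.append(copies)
    return haplotypes
```

Under these assumptions, `parse_diplotype("*1+*94/*10")` yields two haplotypes holding three gene copies in total, which matches the "3 copies" reading given above.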

anh151 commented 1 year ago

Hi @inumanag Thanks for the responses.

For the *5/*5 scenario, my question was about the .tsv output from Aldy. The stdout contains the *5/*5 call, but the .tsv file is empty. Is this intentional, or is it a bug?

inumanag commented 1 year ago

Hi @anh151

That is expected, at least for now. I can address this in the next release: the file will still be empty, but I can add a comment. Would that work?

anh151 commented 1 year ago

Hi @inumanag
Are there any other scenarios for CYP2D6 or other genes where the .tsv file would be blank, but a call is made within the stdout? If not, then the existing implementation is fine. I am using Aldy to call CYP2D6 for a large dataset and I am not monitoring the stdout.

Thanks, Andrew

inumanag commented 1 year ago

It should only happen if you have two deletion alleles (*5/*5 in the CYP2D6 case; for other genes, the star-allele number might differ).
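When running a large batch without watching stdout, one way to surface these cases is to scan the output files for ones with no data rows. The sketch below is only a suggestion: the function names are hypothetical, and it assumes that any comment line a future release might add would start with `#`, which this thread does not confirm.

```python
from pathlib import Path
from typing import Iterable, List


def has_data_rows(tsv_path: Path) -> bool:
    """Return True if the file has at least one non-blank, non-comment line."""
    with open(tsv_path, encoding="utf-8") as handle:
        for line in handle:
            stripped = line.strip()
            # Assumption: comment lines, if any, begin with "#".
            if stripped and not stripped.startswith("#"):
                return True
    return False


def find_empty_outputs(tsv_paths: Iterable[Path]) -> List[Path]:
    """Collect outputs with no data rows (possible double-deletion calls)."""
    return [path for path in tsv_paths if not has_data_rows(path)]
```

Files returned by `find_empty_outputs` are candidates for a homozygous deletion call and can then be checked against the corresponding stdout or log.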

anh151 commented 1 year ago

Sounds good. Thanks for the help.