0xTCG / aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
http://aldy.csail.mit.edu

ALDY's detection of insertion and deletion variants #61

Closed yscindyliu closed 10 months ago

yscindyliu commented 1 year ago

Hello,

I am currently using ALDY for genotyping my samples, and I have come across an issue regarding the detection of insertion and deletion variants in the samples. It seems that ALDY is unable to identify variants with insertions or deletions, which leads to incorrect star allele assignments.

Examples of the problem are as follows:

(1) In the case of CYP2C9*6, the variant rs9332131 is a deletion of 'A'. However, ALDY fails to detect this deletion and wrongly assigns *6 as *1. (rs9332131 is recorded in cyp2c9.yml as [16126, delA, rs9332131, K273fs].)

(2) For CYP3A5*7.001, one of its variants (rs41303343) contains an insertion of 'T' that ALDY cannot detect, resulting in misidentification of *7 as *1.002.

(3) UGT1A1*28, UGT1A1*36, and UGT1A1*37 all have rs3064744 with insertions or deletions of 'TA', and ALDY is unable to detect these, so it fails to identify the correct alleles.

(4) Additionally, in the case of CYP2C19*39, one of the ten variants is 4193delT, causing ALDY to produce a result of (CYP2C19*39.001 - rs17880036).

Is there any specific configuration or setting that needs to be adjusted in ALDY to enable the correct detection of insertion and deletion variants? Your guidance and support would be highly appreciated.

aldy genotype -p illumina --gene cyp2c9 --genome hg38 my.vcf -o my.cyp2c9.aldy

Thank you for your attention and support.

inumanag commented 10 months ago

Hi @yscindyliu

Aldy should be able to pick those up. As it seems that you are using VCF files, it might be the problem with our VCF parser.

Can you please show me the relevant VCF lines that have those variants so that I can see if it is a problem in our VCF parser?

yscindyliu commented 10 months ago

Thanks for getting back to me.

These are the positions in the VCF:

(1) CYP2C9*6.001:

(2) CYP3A5*7.001:

(3) UGT1A1*28, UGT1A1*36, and UGT1A1*37 all have rs3064744:

(4) CYP2C19*39.001 - rs17880036:

(5) NUDT15*2.001 and NUDT15*6.001:

inumanag commented 10 months ago

Hi @yscindyliu

Those tandem indels are ambiguous: the same variant can be written at several different positions (e.g., the NUDT15 indel is at chr13:48037801 in our database). While Aldy accounts for this when using SAM/BAM files, we do not currently handle it with VCFs.

Another thing that might help is to correct VCFs with IndelRealigner or similar tools that move all such indels to their "canonical" positions.

Also, make sure to set the genome (hg38) manually via --genome when using VCFs.
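
To illustrate why a tandem indel can appear at several positions, and what moving it to a "canonical" position means, here is a minimal Python sketch of left-alignment on a toy sequence. It is not Aldy's code: the function names `left_align` and `apply_variant`, the toy reference, and the 0-based coordinates are all assumptions made for illustration. On real VCFs, a tool such as `bcftools norm` with a reference FASTA performs this kind of normalization.

```python
def apply_variant(ref_seq, pos, ref, alt):
    """Return the sequence obtained by replacing `ref` with `alt` at 0-based `pos`."""
    if ref_seq[pos:pos + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    return ref_seq[:pos] + alt + ref_seq[pos + len(ref):]


def left_align(ref_seq, pos, ref, alt):
    """Shift an indel as far left as possible and trim shared bases.

    Hypothetical helper: `pos` is a 0-based index into `ref_seq`
    (VCF POS is 1-based, so subtract 1 before calling this).
    """
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, pull in one base from the left.
        if (not ref or not alt) and pos > 0:
            pos -= 1
            ref, alt = ref_seq[pos] + ref, ref_seq[pos] + alt
            changed = True
    # Drop shared leading bases, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy reference with a TA repeat (not a real gene sequence).
toy_ref = "CCATATATATAAGG"

# Two different ways of writing the same TA insertion inside the repeat.
same_insertion = [(8, "A", "ATA"), (9, "T", "TAT")]
normalized = {left_align(toy_ref, *v) for v in same_insertion}
print(normalized)  # both records collapse to a single representation
```

Both records produce the same mutated sequence, so a parser that matches variants by exact position and alleles sees only one of them unless the VCF is normalized first.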