0xTCG / aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
http://aldy.csail.mit.edu
Other
57 stars 20 forks source link

Issue about indels’ coverages in targeted sequencing data #71

Closed paraveeown closed 8 months ago

paraveeown commented 10 months ago

Hi,

I’m using Aldy v4.4 (Python 3.9.7 on Linux 3.10.0-1160.95.1.el7.x86_64-x86_64-with-glibc2.17) with my targeted sequencing data (hg19) from a panel called PKseq that genotyped 100 pharmacogenes.

This was the command I used to run each sample:

aldy genotype -p samplename.bam -l samplename.log -o samplename.aldy samplename.bam

I used BAM file as an input, and as the data didn’t fit aldy’s default profiles, I also used the BAM file as a profile.

After the run, I randomly checked the outputs and found some confusing results involving indels. For example, in CYP2C9, one sample got 1/85. image

According to pharmvar, CYP2C9*85 is a deletion at chr10: 96708920 (GRCh37). When i checked the BAM in genome browser, there was a deletion at position 96708919, which I assumed was the same deletion with different notation (there was no deletion at 96708920). However, this deletion was found in only 4 reads among the total 4580 reads at this position. I’m not sure if this should be considered a real deletion? image A zoom in: image

When I checked aldy log file, it seems like the program called this deletion with coverage of 229. I’m confused as to where this coverage came from? image image

I tried running the same sample with different options, and this deletion always got the coverage of 299, although the neutral values changed between runs.

The only time the deletion wasn’t considered and aldy reported 1/1 was when I used a VCF file as an input instead of BAM because this deletion wasn’t called in the VCF.

This problem occured with other samples that got CYP2C985 too. For example, this sample only got the deletion in 2 out of 2180 reads, but aldy called it as 1/*85 and reported the coverage of the deletion to be 189. image image

I suspected this might be an issue with how read depths/coverages were recalculated in panel sequencing data, but I'm not sure. Has anyone ever encountered similar problems or did I misunderstanding something?

Feel free to ask of more information and thank you for your help. Paravee

inumanag commented 8 months ago

Hi @paraveeown , Aldy performs read realignment through indelpost because tandem indels often cannot be observed directly from the raw alignment data. I'd honestly trust Aldy here unless you already did indel realignment yourself and are 100% confident that the indel is not there.