Closed paraveeown closed 8 months ago
Hi @paraveeown , Aldy performs read realignment through indelpost because tandem indels often cannot be observed directly from the raw alignment data. I'd honestly trust Aldy here unless you already did indel realignment yourself and are 100% confident that the indel is not there.
Hi,
I’m using Aldy v4.4 (Python 3.9.7 on Linux 3.10.0-1160.95.1.el7.x86_64-x86_64-with-glibc2.17) with my targeted sequencing data (hg19) from a panel called PKseq that genotyped 100 pharmacogenes.
This was the command I used to run each sample:
aldy genotype -p samplename.bam -l samplename.log -o samplename.aldy samplename.bam
I used BAM file as an input, and as the data didn’t fit aldy’s default profiles, I also used the BAM file as a profile.
After the run, I randomly checked the outputs and found some confusing results involving indels. For example, in CYP2C9, one sample got 1/85.
According to pharmvar, CYP2C9*85 is a deletion at chr10: 96708920 (GRCh37). When i checked the BAM in genome browser, there was a deletion at position 96708919, which I assumed was the same deletion with different notation (there was no deletion at 96708920). However, this deletion was found in only 4 reads among the total 4580 reads at this position. I’m not sure if this should be considered a real deletion? A zoom in:
When I checked aldy log file, it seems like the program called this deletion with coverage of 229. I’m confused as to where this coverage came from?
I tried running the same sample with different options, and this deletion always got the coverage of 299, although the neutral values changed between runs.
The only time the deletion wasn’t considered and aldy reported 1/1 was when I used a VCF file as an input instead of BAM because this deletion wasn’t called in the VCF.
This problem occured with other samples that got CYP2C985 too. For example, this sample only got the deletion in 2 out of 2180 reads, but aldy called it as 1/*85 and reported the coverage of the deletion to be 189.
I suspected this might be an issue with how read depths/coverages were recalculated in panel sequencing data, but I'm not sure. Has anyone ever encountered similar problems or did I misunderstanding something?
Feel free to ask of more information and thank you for your help. Paravee