0xTCG / aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
http://aldy.csail.mit.edu

Question about low coverage sequencing samples #67

Closed anh151 closed 6 months ago

anh151 commented 10 months ago

Hello, we have a number of short-read whole-genome sequencing samples (~200) where Aldy is unable to assign a genotype due to low coverage. I was able to run Cyrius and PyPGx successfully on most of these samples, and most have some kind of structural variant as part of a tandem, e.g. *1/*68+*4 or *68/*68+*4. Coverage across the full dataset is roughly 35x; however, it is possible that these samples have lower coverage than the rest. Is there a way to force Aldy to try to call these samples, or would that just result in too much uncertainty in the call? Attached is the debug file from one of the samples.

🐿  Aldy v4.5 (Python 3.9.15 on Linux 5.15.133+-x86_64-with-glibc2.31)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample 1020075.bam...
Potential CYP2D6 gene structures for 1020075:
   1: 2x*68 (confidence: 100%)
Potential major CYP2D6 star-alleles for 1020075:
   1: 2x*68 & rs1135840 (confidence: 100%)
ERROR: gene= CYP2D6, profile= wgs, file= filtered_bams/1020075.bam
Aldy could not phase any major solution.
Possible solutions:
 - Check the coverage. Extremely low coverage prevents Aldy from calling star-alleles.
 - Run with --debug parameter and notify the authors of Aldy.
Preparing debug archive...

debug.info.tar.gz

Thanks, Andrew

rrdavis77 commented 10 months ago

Hey @anh151, since you are on 4.5 you might want to try this new parameter: https://github.com/0xTCG/aldy/issues/66#issuecomment-1806982434

anh151 commented 10 months ago

@rrdavis77 I tried using that parameter, but I still get the same error. Did it work for you?

aldy genotype --param min_avg_coverage=0 -p wgs -g CYP2D6 -o aldy_results/aldy_1020075.tsv filtered_bams/1020075.bam
🐿  Aldy v4.5 (Python 3.9.15 on Linux 5.15.133+-x86_64-with-glibc2.31)
   (c) 2016-2023 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
Genotyping sample 1020075.bam...
Potential CYP2D6 gene structures for 1020075:
   1: 2x*68 (confidence: 100%)
Potential major CYP2D6 star-alleles for 1020075:
   1: 2x*68 & rs1135840 (confidence: 100%)
ERROR: gene= CYP2D6, profile= wgs, file= filtered_bams/1020075.bam
Aldy could not phase any major solution.
Possible solutions:
 - Check the coverage. Extremely low coverage prevents Aldy from calling star-alleles.
 - Run with --debug parameter and notify the authors of Aldy.

inumanag commented 6 months ago

@anh151 This probably means that Aldy cannot refine the calls due to under-sequenced regions. I'd suggest using the potential major star-alleles (2x*68) as the final output: refinement picks one of the major candidates anyway.
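
For anyone applying this fallback across many samples, here is a minimal sketch of pulling the potential major star-alleles out of the console output. It is not part of Aldy: the function name `major_candidates` and both regular expressions are assumptions based only on the log lines quoted in this thread, so other Aldy versions may print a different format.

```python
import re

# Line shapes below are copied from the Aldy v4.5 output quoted in this thread.
# They are an assumption about the log format, not a documented interface.
HEADER = re.compile(r"^Potential major (\S+) star-alleles for (\S+):\s*$")
CANDIDATE = re.compile(
    r"^\s*(\d+):\s*(.+?)\s*\(confidence:\s*(\d+(?:\.\d+)?)%\)\s*$"
)


def major_candidates(log_text):
    """Return (gene, sample, [(rank, call, confidence_percent), ...]).

    Returns None when no "Potential major ... star-alleles" header is found.
    """
    lines = log_text.splitlines()
    for i, line in enumerate(lines):
        header = HEADER.match(line)
        if not header:
            continue
        found = []
        # Candidate lines follow the header directly; stop at the first
        # line that does not look like "<rank>: <call> (confidence: N%)".
        for following in lines[i + 1:]:
            cand = CANDIDATE.match(following)
            if not cand:
                break
            found.append(
                (int(cand.group(1)), cand.group(2), float(cand.group(3)))
            )
        return header.group(1), header.group(2), found
    return None


# Excerpt of the log posted above, used here as sample input.
SAMPLE_LOG = """Genotyping sample 1020075.bam...
Potential CYP2D6 gene structures for 1020075:
   1: 2x*68 (confidence: 100%)
Potential major CYP2D6 star-alleles for 1020075:
   1: 2x*68 & rs1135840 (confidence: 100%)
ERROR: gene= CYP2D6, profile= wgs, file= filtered_bams/1020075.bam
Aldy could not phase any major solution.
"""

gene, sample, candidates = major_candidates(SAMPLE_LOG)
```

Calls recovered this way have not gone through refinement, so it is probably worth flagging them as such and comparing them against the Cyrius and PyPGx results for the same samples.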