0xTCG / aldy

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes
http://aldy.csail.mit.edu

Running Aldy on ONT data for CYP2D6 genotyping never finishes #73

Closed Tintest closed 1 month ago

Tintest commented 4 months ago

Hello,

I'm writing because I'm having a few problems with Aldy. I first ran it on all the supported genes and had difficulties with CYP2A6 and CYP2D6; I'm going to focus on the latter here.

I ran Aldy v4.5 in a Singularity container on the CYP2D6 gene for 20 samples sequenced with Oxford Nanopore R10.4.1 technology. Some of these samples took up to 10 hours to produce a result for this single gene, and for 4 samples I had to kill the processes after 3 days because they still had no result. I had no difficulties with the remaining thirty or so genes.

singularity exec --bind /srv:/srv aldy.4.5.sif aldy
🐿  Aldy v4.5 (Python 3.11.7 on Linux 6.1.0-0.deb11.13-amd64-x86_64-with-glibc2.31)
   (c) 2016-2024 Aldy Authors. All rights reserved.
   Free for non-commercial/academic use only.
usage: aldy [-h] {genotype,test,license,query,q,profile,help} ...

Allelic decomposition and exact genotyping of highly polymorphic and structurally variant genes

positional arguments:
  {genotype,test,license,query,q,profile,help}
    genotype            Call the most likely genotype and diplotype within a sample.
    test                Run Aldy test suite. Recommended prior to the first use
    license             Show Aldy license
    query (q)           Query database definitions for a given gene.
    profile             Generate a sequencing profile for a SAM/BAM/CRAM file. Please check the documentation for more details.
    help                Show program usage and exit.

Here is the command line I used to run my Aldy processes on all my samples.

for i in CYP2D6 ; do
  for y in $(ls PGx-cyp2d6-batch1-analysis/mod_mappings_phased_hg38/*.bam) ; do
    name=`basename $y _phased_hg38.bam`
    singularity exec --bind /srv:/srv aldy.4.5.sif aldy genotype \
      --verbosity info \
      --genome hg38 \
      --reference GCA_000001405.15_GRCh38_no_alt_analysis_set.fasta \
      --profile pgx_ont.yml \
      --gene $i \
      --log aldy_batch1/${name}_${i}.log \
      --output aldy_batch1/${name}_${i}.log \
      $y
  done
done

I tried various combinations, but always had samples that did not finish running in a reasonable amount of time.
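Since some runs never finish, one option for a batch like this is to cap each sample's wall-clock time instead of killing processes by hand. The following is a minimal sketch and not part of Aldy: `run_with_timeout` is a hypothetical helper, and it assumes that stopping the launched process also stops the work it started (not verified here for Singularity containers).

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run a command (list of arguments) with a wall-clock limit.

    Returns "ok" on exit code 0, "failed" on any other exit code,
    and "timeout" if the limit is reached (the child is then killed).
    """
    try:
        result = subprocess.run(cmd, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return "timeout"
    return "ok" if result.returncode == 0 else "failed"


if __name__ == "__main__":
    # Placeholder command; in practice this would be the full
    # "singularity exec ... aldy genotype ..." argument list for one sample.
    status = run_with_timeout([sys.executable, "-c", "pass"], timeout_s=10)
    print(status)
```

Recording the returned status per sample makes it easy to list which samples hit the limit and rerun only those with different settings.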

Attached is a log file for one of the samples for which I had to kill the process.

sample74_phased_hg38.log

I used a profile file with the sam_long_reads option enabled.

I can provide you with a log and debug file of one of the samples that finished running in several hours.

Regards.

inumanag commented 3 months ago

@Tintest Aldy does not yet support ONT data (support is planned, though). It can sometimes work, but in other cases it spends too much time phasing the reads (I suspect this is due to the error rate). Try running Aldy with --param phase=False and see if that helps.
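As a rough illustration of that suggestion, the sketch below assembles the same `aldy genotype` call used earlier in this thread and appends `--param phase=False` when phasing is switched off. `aldy_genotype_command` is a hypothetical helper, not an Aldy API; the flags are the ones quoted in this issue, and the log and output paths are left to the caller.

```python
import shlex


def aldy_genotype_command(bam, gene, profile, reference, log_path, output_path,
                          phase=True, prefix=()):
    """Build the argument list for one `aldy genotype` run.

    prefix: optional launcher, e.g. a `singularity exec ...` argument list.
    phase:  when False, append `--param phase=False` as suggested above.
    """
    cmd = [
        *prefix, "aldy", "genotype",
        "--verbosity", "info",
        "--genome", "hg38",
        "--reference", reference,
        "--profile", profile,
        "--gene", gene,
        "--log", log_path,
        "--output", output_path,
    ]
    if not phase:
        cmd += ["--param", "phase=False"]
    cmd.append(bam)  # input alignment file goes last, as in the original command
    return cmd


if __name__ == "__main__":
    # File names below are placeholders for illustration only.
    cmd = aldy_genotype_command(
        "sample.bam", "CYP2D6", "pgx_ont.yml", "reference.fasta",
        "sample_CYP2D6.log", "sample_CYP2D6.out",
        phase=False,
        prefix=("singularity", "exec", "--bind", "/srv:/srv", "aldy.4.5.sif"),
    )
    print(shlex.join(cmd))
```

Building the command as a list keeps the phasing switch in one place, so the same batch can be rerun with and without phasing for comparison.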

inumanag commented 1 month ago

Closing for now; let me know if this still keeps causing problems.