split read alignment taking a long time

pkuerten commented 6 years ago

I am running the current version of delly2 on multiple tumor-normal pairs. For one of the pairs, it is taking much longer than for the rest (5 days so far). According to the output, it is still stuck in the split-read alignment step. I have tried tweaking the parameters myself but have not been able to speed it up. I am currently trying:

delly call -n -s 15 -q 20 -x human.hg19.excl.tsv ...

Any advice or guidance would be most helpful. Thank you.

tobiasrausch commented 6 years ago

Long runtimes are often related to some kind of sequencing library problem. Could you please run Alfred on that BAM file and post the QC metrics here? Alternatively you can also send me the stats.tsv.gz file via email.

./alfred qc -r <ref.fa> -o <stats.tsv.gz> <align.bam>

Metrics:

zgrep ^ME stats.tsv.gz

pkuerten commented 6 years ago

Thank you for the prompt reply. I am running it right now and will post an update when it's done.

pkuerten commented 6 years ago

Hi Tobias, I emailed you the grepped output of alfred. Looking forward to your advice.

tobiasrausch commented 6 years ago

Hi,

Please look at #MappedPairs and #MappedSameChr. You have ~173 million inter-chromosomal read pairs (~25% of your data!!!), where one end maps to chrA and the other to chrB. This is highly unlikely even for heavily rearranged tumors, so I suspect a library prep failure. In any case, this leads to millions of translocation calls, and when Delly tries to find the breakpoints for these, it is no surprise that this takes forever.
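The check described above boils down to #MappedPairs minus #MappedSameChr. The sketch below computes that count and its share of mapped pairs from the ME lines. It is a hypothetical helper (`interchr_summary` is not part of Alfred). It assumes that the first ME line is a tab-separated header naming the columns, including `Sample`, `#MappedPairs` and `#MappedSameChr`, and that every later ME line holds one sample's values.

```shell
#!/usr/bin/env bash
# Hypothetical helper: report inter-chromosomal read pairs per sample from
# Alfred's ME (metrics) lines, read on stdin.
# Assumption: the first ME line is a tab-separated header naming the columns
# (including "Sample", "#MappedPairs" and "#MappedSameChr"); each later ME
# line holds one sample's values.
interchr_summary() {
  awk -F'\t' '
    $1 != "ME" { next }                 # skip lines from other QC sections
    !seen_header {                      # first ME line: map column name to index
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!("#MappedPairs" in col) || !("#MappedSameChr" in col)) {
        print "header lacks #MappedPairs or #MappedSameChr" > "/dev/stderr"
        exit 1
      }
      seen_header = 1
      next
    }
    {
      mapped  = $(col["#MappedPairs"])
      samechr = $(col["#MappedSameChr"])
      inter   = mapped - samechr        # pairs whose mates are on different chromosomes
      frac    = (mapped > 0) ? inter / mapped : 0
      name    = ("Sample" in col) ? $(col["Sample"]) : "sample" NR
      # Output: sample, inter-chromosomal pairs, fraction of mapped pairs
      printf "%s\t%.0f\t%.4f\n", name, inter, frac
    }'
}
```

Usage: `zgrep ^ME stats.tsv.gz | interchr_summary`. The sketch divides by #MappedPairs. Dividing by #Pairs instead gives a slightly smaller fraction when some pairs are unmapped.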

pkuerten commented 6 years ago

Thank you so much Tobias! I completely missed that.

KamilSJaron commented 5 years ago

Hello, I suspect I am facing a similar issue, but I believe it is not a low-quality library but a fragmented reference that is causing the trouble. Here are some of the relevant columns from Alfred (which seems like a really handy tool that I was totally unaware of, thanks :-))

   X.Pairs X.MappedPairs MappedPairsFraction X.MappedSameChr MappedSameChrFraction X.MappedProperPair MappedProperFraction X.ReferenceBp
1 30001342      29892105            0.996359        25188623              0.839583           25094015             0.836430    1180736944
2 30091681      29984666            0.996444        25292826              0.840526           25201792             0.837500    1180736944
3 29954854      29847394            0.996413        25160742              0.839955           25065438             0.836774    1180736944

There are plenty of pairs mapped between scaffolds, which is totally expected given that it's a draft genome with millions of scaffolds. Is there a way to tell Delly not to call inter-chromosomal rearrangements?
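This thread does not propose the following generic workaround, and it is not a Delly feature. One option is to drop read pairs whose mates map to different scaffolds before calling, so that Delly never sees them as translocation evidence. The sketch below assumes plain SAM text on stdin. `keep_same_ref_pairs` is a hypothetical helper that filters on the SAM RNEXT column. It leaves split-read (SA tag) evidence untouched and also throws away any genuine inter-scaffold events.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Delly or samtools): filter SAM text on stdin,
# keeping header lines and alignments whose mate lies on the same reference
# sequence. RNEXT (column 7) is "=" or equal to RNAME (column 3) for a
# same-reference mate, and "*" when there is no mate information.
# Pairs spanning two scaffolds are dropped; split-read (SA tag) evidence is not
# touched, and real inter-chromosomal events are lost as well.
keep_same_ref_pairs() {
  awk -F'\t' '
    /^@/ { print; next }                          # @HD, @SQ, @RG, @PG lines
    $7 == "=" || $7 == "*" || $7 == $3 { print }  # mate on same reference, or none
  '
}
```

Usage (file names are placeholders): `samtools view -h in.bam | keep_same_ref_pairs | samtools view -b -o samechr.bam -`, followed by `samtools index samechr.bam` before running `delly call` on the filtered BAM.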

The Delly call I ran was:

delly call -g $GENOME $BAM -o $DELLY
[2019-Apr-24 12:30:20] delly call -g data/2_Tcm/reference/2_Tcm_b3v08.fasta.gz data/2_Tcm/mapping/Tcm_01_to_b3v08.bam -o data/2_Tcm/variant_calls/Tcm_01/Tcm_01_delly.bcf
[2019-Apr-24 12:30:30] Paired-end clustering

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
[2019-Apr-24 22:46:30] Split-read alignment

0%   10   20   30   40   50   60   70   80   90   100%
|----|----|----|----|----|----|----|----|----|----|
*

Thank you.

KamilSJaron commented 5 years ago

Ha, sorry, I had not found #119 before. I am now trying to run it with -r 200. :-)

KamilSJaron commented 5 years ago

Sorry again, -r 200 does not seem to speed it up. It has already been stuck at the first star for three days.

It may also be worth mentioning that I am running version 0.7.8. Is that a problem?

KamilSJaron commented 5 years ago

Alright, when I updated Delly, the problem disappeared.

First, I would like to say that the installation procedure was very smooth and overall, the interface of Delly is really nice. Thank you for making the tool so accessible.

The part I did not enjoy much was that the releases have no descriptions and the relevant commit was not mentioned in any of the threads here, so it took me 4 days to figure this out.

tobiasrausch commented 5 years ago

We are currently supporting many open-source and free software projects in addition to the cancer genomics research we are doing, which sometimes clearly comes at the expense of good code documentation. I am sorry for the trouble, but I am glad you could solve the problem.

KamilSJaron commented 5 years ago

Don't worry, of all the tools I have encountered during the last four years of genomics, yours is one of the easiest to handle. I did not mean it as criticism, but rather as feedback. Thanks again for developing Delly.

dellytools / delly

split read alignment taking a long time #115