Closed pkuerten closed 6 years ago
Long runtimes are often related to some kind of sequencing library problem. Could you please run Alfred on that BAM file and post the QC metrics here? Alternatively you can also send me the stats.tsv.gz file via email.
./alfred qc -r <ref.fa> -o <stats.tsv.gz> <align.bam>
Metrics:
zgrep ^ME stats.tsv.gz
Thank you for the prompt reply. I am running it right now. I will update when it's done.
Hi Tobias, I emailed you the grepped output of alfred. Looking forward to your advice.
Hi,
Please look at #MappedPairs and #MappedSameChr. You have ~173 million inter-chromosomal read pairs (~25% of your data!!!) where one end maps to chrA and the other one to chrB. This is highly unlikely even for highly rearranged tumors so I suspect some library prep failure. In any case, this leads to millions of translocation calls and when Delly tries to find the breakpoint for these it is no surprise that this takes forever.
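The check described above can be scripted. The following is a minimal Python sketch, not part of Alfred. It assumes that the `ME` rows of `stats.tsv.gz` are tab-separated, that the first `ME` row is the header, and that the columns are named `#MappedPairs` and `#MappedSameChr` as referred to above. The function names `read_me_rows` and `inter_chr_pairs` are hypothetical.

```python
import csv
import gzip


def read_me_rows(path):
    """Return the ME data rows of an Alfred stats file as a list of dicts.

    Assumption: rows are tab-separated, start with the tag "ME", and the
    first ME row is the header naming the columns.
    """
    with gzip.open(path, "rt", newline="") as handle:
        rows = [r for r in csv.reader(handle, delimiter="\t") if r and r[0] == "ME"]
    if len(rows) < 2:
        return []
    header = rows[0]
    # Skip any repeated header lines, keep only data rows.
    return [dict(zip(header, r)) for r in rows[1:] if r != header]


def inter_chr_pairs(row):
    """Return (count, fraction of mapped pairs) of inter-chromosomal pairs."""
    mapped = int(row["#MappedPairs"])
    same_chr = int(row["#MappedSameChr"])
    inter = mapped - same_chr
    fraction = inter / mapped if mapped else 0.0
    return inter, fraction
```

Calling `inter_chr_pairs(row)` for each row returned by `read_me_rows("stats.tsv.gz")` gives the number of pairs whose ends map to different chromosomes. A large fraction here is the warning sign discussed in this thread.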
Thank you so much Tobias! I completely missed that.
Hello, I suppose I am facing a similar issue, but I believe the cause is a fragmented reference rather than a low-quality library. Here are some of the relevant columns from Alfred (which seems like a really handy tool that I was totally not aware of, thanks :-)):
   X.Pairs   X.MappedPairs  MappedPairsFraction  X.MappedSameChr
1  30001342  29892105       0.996359             25188623
2  30091681  29984666       0.996444             25292826
3  29954854  29847394       0.996413             25160742

   MappedSameChrFraction  X.MappedProperPair  MappedProperFraction  X.ReferenceBp
1  0.839583               25094015            0.836430              1180736944
2  0.840526               25201792            0.837500              1180736944
3  0.839955               25065438            0.836774              1180736944
There are plenty of pairs mapped between scaffolds, which is totally expected given that it's a draft genome with millions of scaffolds. Is there a way to tell Delly not to call interchromosomal rearrangements?
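To put a number on "plenty", here is a small Python sketch using only the counts from the table. The function name `inter_scaffold_summary` is hypothetical. The fraction is taken relative to all pairs, which appears to be the denominator Alfred uses for the fraction columns shown (for example, 25188623 / 30001342 ≈ 0.839583).

```python
# (#Pairs, #MappedPairs, #MappedSameChr) copied from the three table rows.
samples = [
    (30001342, 29892105, 25188623),
    (30091681, 29984666, 25292826),
    (29954854, 29847394, 25160742),
]


def inter_scaffold_summary(pairs, mapped_pairs, mapped_same_chr):
    """Return (count, fraction of all pairs) of pairs split across scaffolds."""
    inter = mapped_pairs - mapped_same_chr
    return inter, inter / pairs


for index, counts in enumerate(samples, start=1):
    inter, fraction = inter_scaffold_summary(*counts)
    print(f"sample {index}: {inter} inter-scaffold pairs ({fraction:.1%} of all pairs)")
```

For these three samples this comes to roughly 4.7 million inter-scaffold pairs each, or about 16% of all pairs.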
The delly call I had is
delly call -g $GENOME $BAM -o $DELLY
[2019-Apr-24 12:30:20] delly call -g data/2_Tcm/reference/2_Tcm_b3v08.fasta.gz data/2_Tcm/mapping/Tcm_01_to_b3v08.bam -o data/2_Tcm/variant_calls/Tcm_01/Tcm_01_delly.bcf
[2019-Apr-24 12:30:30] Paired-end clustering
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
***************************************************
[2019-Apr-24 22:46:30] Split-read alignment
0% 10 20 30 40 50 60 70 80 90 100%
|----|----|----|----|----|----|----|----|----|----|
*
Thank you.
Ha, sorry, I did not find #119 before. I am trying to run it with -r 200 now. :-)
Sorry again, -r 200 does not seem to speed it up. It has been at the first star for three days already.
Also, maybe worth mentioning that I am running version 0.7.8. Is that a problem?
Alright, when I updated Delly, the problem disappeared.
First, I would like to say that the installation procedure was very smooth and overall, the interface of Delly is really nice. Thank you for making the tool so accessible.
The part that I did not enjoy as much was that the releases do not have descriptions and the relevant commit was not mentioned in any of the threads here, so it took me 4 days to figure this out.
We are currently supporting many open source and free software projects in addition to the cancer genomics research we are doing, which sometimes clearly comes at the expense of good code documentation. I am sorry for the trouble but I am glad you could solve the problem.
Don't worry, of all the tools I have encountered during the last four years of genomics, yours is one of the easiest to handle. I did not mean it as criticism, but rather as feedback. Thanks again for developing Delly.
I am running the current version of delly2 on multiple tumor-normal pairs. For one of the pairs, it is taking a lot longer than for the rest (5 days now). Looking at the output, it is still stuck in the split-read alignment. I have tried tweaking the parameters myself but have not been able to speed it up. I am currently trying
delly call -n -s 15 -q 20 -x human.hg19.excl.tsv ...
Any advice or guidance would be most helpful. Thank you.