bcgsc / abyss

:microscope: Assemble large genomes using short reads
http://www.bcgsc.ca/platform/bioinfo/software/abyss

abyss-pe fails with empty pelib-3.sam.gz file #208

Closed by lcoombe 6 years ago

lcoombe commented 6 years ago

Hello Ben/Shaun!

For my targeted assembly pipeline, I do a bunch of small ABySS assemblies (contig stage).

I saw this error for one of my targeted assemblies:

[lcoombe@gphost07 score0.25]$ abyss-pe k=80 l=40 s=1000 v=-v q=15 B=3G j=1 kc=3 H=4 S=1000-10000 N=9 name=NM_001027489.2_score0.25 pe-sam contigs
abyss-map -v  -j1 -l40    NM_001027489.2_score0.25_filtered-filteredReads.fq.gz NM_001027489.2_score0.25-3.fa \
        |abyss-fixmate -v  -l40  -h pelib-3.hist \
        |sort -snk3 -k4 \
        |gzip >pelib-3.sam.gz
Reading from standard input...
Reading `NM_001027489.2_score0.25-3.fa'...
Using 324 kB of memory and 1.62e+05 B/sequence.
Reading `NM_001027489.2_score0.25-3.fa'...
Building the suffix array...
Building the Burrows-Wheeler transform...
Building the character occurrence table...
Read 109 kB in 2 contigs.
Using 1.53 MB of memory and 14 B/bp.
Mapped 823 of 962 reads (85.6%)
Mapped 822 of 962 reads uniquely (85.4%)
Read 962 alignments
Mateless     0
Unaligned    0
Singleton  139  28.9%
FR         334  69.4%
RF           8  1.66%
FF           0
Different    0
Total      481
FR Stats mean: 345.1 median: 347 sd: 31.57 n: 323 min: 277 max: 419 ignored: 19
___▁ ▁▁__▁▁▁▁▁▁▁▁▂▄▂▁▁▃▁▅▂▂▃▅▆▄▄▁▄▂▅▆▄▄▄▂▆▁▄▆█▁▁▁_▁▃▃▁▂▁ ▁_ __▃_▁
RF Stats mean: 174 median: 174 sd: 0 n: 1 min: 174 max: 174 ignored: 341
 ▄
gunzip -c pelib-3.sam.gz \
    |DistanceEst -v  -j1 -k80 -l40 -s1000 -n10   -o pelib-3.dist pelib-3.hist
Mate orientation FR: 334 (97.7%) RF: 8 (2.34%)
The library pelib-3.hist is oriented forward-reverse (FR).
Stats mean: 345.1 median: 347 sd: 31.57 n: 323 min: 277 max: 419
___▁ ▁▁__▁▁▁▁▁▁▁▁▂▄▂▁▁▃▁▅▂▂▃▅▆▄▄▁▄▂▅▆▄▄▄▂▆▁▄▆█▁▁▁_▁▃▃▁▂▁ ▁_ __▃_▁
Minimum and maximum distance are set to -79 and 419 bp.
DistanceEst: DistanceEst.cpp:619: int main(int, char**): Assertion `in' failed.
/bin/bash: line 1: 89564 Done                    gunzip -c pelib-3.sam.gz
     89565 Aborted                 (core dumped) | DistanceEst -v -j1 -k80 -l40 -s1000 -n10 -o pelib-3.dist pelib-3.hist
make: *** [pelib-3.dist] Error 134
make: *** Deleting file `pelib-3.dist'

Looking at the log, I suspect that when there are 0 alignments in the 'Different' category, no alignments are written to pelib-3.sam.gz, and ABySS then fails because DistanceEst has nothing to read. Is that right, and is it the expected behaviour? For my application, I wouldn't want ABySS to fail there: this was a case where my target was already assembled completely before the contig stage. I'm using ABySS 2.0.2, by the way (from Linuxbrew).

Thank you!

benvvalk commented 6 years ago

@lcoombe:

Yes, I recognize this situation/error and other people have complained about it as well.

Ideally DistanceEst should handle empty input gracefully.

A hacky workaround would be to run abyss-pe twice: first with the pe-sam target, then check whether the SAM file is empty, and only if it is not, run it a second time with the contigs target. But that's fairly annoying, I know.
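A minimal sketch of that two-pass workaround, for illustration only. The function names `sam_has_alignments` and `two_pass_contigs` are hypothetical and not part of ABySS. Whether an "empty" SAM file still carries `@` header lines is an assumption, so the check ignores header lines rather than testing the file size.

```shell
# Sketch of the two-pass workaround; the function names are hypothetical.

# Succeed (exit 0) only if the gzipped SAM file holds at least one alignment
# record, i.e. a non-empty line that is not an '@' header line.
sam_has_alignments() {
    gunzip -c "$1" 2>/dev/null | grep -v '^@' | grep -q .
}

# Usage: two_pass_contigs SAM_FILE ABYSS_PE_ARGS...
# Pass 1 builds only the pe-sam target. Pass 2 (contigs) runs only when the
# SAM file has alignments, so DistanceEst is never started on empty input.
two_pass_contigs() {
    local sam=$1
    shift
    abyss-pe "$@" pe-sam || return
    if sam_has_alignments "$sam"; then
        abyss-pe "$@" contigs
    else
        echo "$sam has no alignments; skipping the contigs target" >&2
    fi
}
```

With the file name from the log above, a call might look like `two_pass_contigs pelib-3.sam.gz k=80 l=40 s=1000 name=NM_001027489.2_score0.25` followed by the remaining abyss-pe parameters. What to do when the second pass is skipped (for example, treating the existing `-3.fa` file as the final result) is left to the calling pipeline.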

lcoombe commented 6 years ago

Sounds good - I'll do that! Thanks Ben!

mmokrejs commented 6 years ago

@lcoombe I think it would be great if you kept this issue open until it is fixed in the code.

sjackman commented 6 years ago

We close issues that we're not actively working on.