biod / sambamba

Tools for working with SAM/BAM data
http://thebird.nl/blog/D_Dragon.html
GNU General Public License v2.0

sambamba slice piped into sambamba view sometimes runs for hours when run in parallel #513

Open · yangyxt opened this issue 11 months ago

yangyxt commented 11 months ago

Only bug reports!

The D version of Sambamba is in maintenance mode. Use the GitHub issue tracker to report bugs only. For comments, questions and feature requests, please use the Google Group mailing list, as stated in the README!

Describe the bug

The command looks like this:

    sambamba slice -q -L xxx.bed xxx.bam | sambamba view -q -F "xxx" -f sam /dev/stdin | sort - | uniq -

When I run this command in an interactive shell, it finishes in about 20 seconds. But when I run it from a Python subprocess, in parallel (slicing the same BAM file with a different BED region file per job), it sometimes gets stuck: the command runs for hours (20 to 40 hours in what I have observed). In htop the process stays in the R (running) state instead of going to sleep (S or another state), and its CPU usage is around 100%. I'm quite confused, and I can't reproduce the issue without running the entire pipeline.
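The issue does not show the Python code that launches these jobs, so the following is only a minimal sketch, assuming each job runs the pipeline through `bash` via Python's `subprocess` module. It does not diagnose why a job spins at 100% CPU. It only makes sure a stuck job is killed after a timeout and reported, instead of running for 40 hours. The helper names `build_cmd`, `run_pipeline` and `run_all` are hypothetical.

```python
import os
import shlex
import signal
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import List, Optional, Tuple


def build_cmd(bed: str, bam: str, filter_expr: str) -> str:
    """Hypothetical helper: the pipeline from the issue, one BED file per job."""
    return (
        f"sambamba slice -q -L {shlex.quote(bed)} {shlex.quote(bam)}"
        f" | sambamba view -q -F {shlex.quote(filter_expr)} -f sam /dev/stdin"
        " | sort - | uniq -"
    )


def run_pipeline(cmd: str, timeout_s: float) -> Tuple[Optional[int], str]:
    """Run one shell pipeline; return (exit code, stdout), or (None, "") on timeout."""
    proc = subprocess.Popen(
        ["bash", "-o", "pipefail", "-c", cmd],  # pipefail: a failing stage fails the job
        stdout=subprocess.PIPE,
        text=True,
        start_new_session=True,  # new process group holding bash and every pipeline stage
    )
    try:
        out, _ = proc.communicate(timeout=timeout_s)
        return proc.returncode, out
    except subprocess.TimeoutExpired:
        try:
            # The session leader's PID is also the process-group ID, so this
            # kills every stage of the pipeline, not just the bash wrapper.
            os.killpg(proc.pid, signal.SIGKILL)
        except ProcessLookupError:
            pass  # the group exited between the timeout and the kill
        proc.communicate()  # reap the child and drain the pipe
        return None, ""


def run_all(cmds: List[str], timeout_s: float,
            workers: int = 4) -> List[Tuple[Optional[int], str]]:
    """Run the pipelines concurrently; results come back in input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda c: run_pipeline(c, timeout_s), cmds))
```

For example, `run_all([build_cmd(b, "xxx.bam", "xxx") for b in bed_files], timeout_s=600)` would run one job per BED file and return `(None, "")` for any job still running after ten minutes. `bed_files` and the 600-second limit are placeholders. The sketch kills the whole process group rather than only the `bash` process. Killing just the wrapper would leave the `sambamba` stages orphaned and still using CPU.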

To Reproduce

It seems I cannot reproduce the error for you right away with any small sample. I just wonder whether my description reminds you of any similar issue you've dealt with? I couldn't find a similar issue in the GitHub issue tracker, though. Thanks.

The sambamba version I'm using is 1.0.0. Thanks.