MarWoes / wg-blimp

wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data
GNU Affero General Public License v3.0

Picard running out of disk space #8

Closed by MarWoes 4 years ago

MarWoes commented 4 years ago

When /tmp has size constraints, Picard may fail by running out of disk space. Picard's TMP_DIR should be configurable through the config files.
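A minimal sketch of what a configurable temp directory amounts to on the command line, assuming Picard is on PATH as the `picard` wrapper (as in Bioconda installs). `PICARD_TMP_DIR`, `DRY_RUN` and `mark_duplicates` are hypothetical names for illustration only; wg-blimp's actual config key and rule may differ. The relevant part is Picard's standard `TMP_DIR` option, which moves MarkDuplicates' spill files off /tmp.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical setting standing in for a config-file value;
# wg-blimp's real config key may be named differently.
PICARD_TMP_DIR="${PICARD_TMP_DIR:-/tmp}"

# Run Picard MarkDuplicates (legacy I=/O=/M= syntax) with its temporary
# files written to PICARD_TMP_DIR. With DRY_RUN=1 the command is printed
# instead of executed.
mark_duplicates() {
  local in_bam=$1 out_bam=$2 metrics=$3
  mkdir -p "$PICARD_TMP_DIR"
  local cmd=(picard MarkDuplicates
    "I=$in_bam" "O=$out_bam" "M=$metrics"
    "TMP_DIR=$PICARD_TMP_DIR")
  if [[ "${DRY_RUN:-0}" == 1 ]]; then
    echo "${cmd[*]}"
  else
    "${cmd[@]}"
  fi
}
```

For example, `PICARD_TMP_DIR=/scratch/picard_tmp mark_duplicates sample.bam sample.markdup.bam sample.metrics.txt` would keep Picard's intermediate files on a larger scratch volume.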

MarWoes commented 4 years ago

Fixed by fc9795c6062b60f7558a0bdeb2c2678b0e26d561

MarWoes commented 4 years ago

Included in release v0.9.6

harish0201 commented 4 years ago

Hi,

From grepping the repo, I found that Picard is used to mark duplicates.

Would you be open to using samblaster instead of Picard? It would involve something like this:

bwameth.py --reference ref.fa t_R1.fastq.gz t_R2.fastq.gz -t 12 | samblaster | samtools view -b -o output.bam -

This might be fairly simple, since samblaster marks duplicates on the fly using the same approach as Picard (from what I've read about it).
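The pipeline above, sketched as a reusable function with `pipefail` so that a failure in any stage (alignment, duplicate marking, or BAM compression) fails the whole step. `align_and_markdup` is a hypothetical name, and the sketch assumes `bwameth.py`, `samblaster` and `samtools` are on PATH. samblaster expects read-id grouped input, which is what the aligner emits directly, so the resulting BAM is unsorted.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Align with bwameth, mark duplicates on the fly with samblaster, and
# compress to BAM with samtools, without an intermediate SAM on disk.
# pipefail makes the function fail if any stage of the pipe fails.
align_and_markdup() {
  local ref=$1 r1=$2 r2=$3 out_bam=$4 threads=${5:-12}
  bwameth.py --reference "$ref" --threads "$threads" "$r1" "$r2" \
    | samblaster \
    | samtools view -b -o "$out_bam" -
}
```

For example: `align_and_markdup ref.fa t_R1.fastq.gz t_R2.fastq.gz output.bam 12`.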

Similarly, could you pipe samblaster's output into sambamba to get a sorted BAM?
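One way to get a coordinate-sorted BAM straight from the pipe, sketched under the assumption that the same three tools are on PATH. Note that this swaps in `samtools sort` rather than sambamba, since `samtools sort` reads from stdin when given `-`. Its `-T` option also lets the sort's temporary files be directed away from a small /tmp, the same concern as in the original issue. `align_markdup_sort` is a hypothetical name.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Align, mark duplicates with samblaster, then coordinate-sort the
# stream with samtools sort (used here instead of sambamba).
# -T sets the prefix for sort's temporary files, -@ its extra threads.
align_markdup_sort() {
  local ref=$1 r1=$2 r2=$3 out_bam=$4 threads=${5:-12} tmp_dir=${6:-/tmp}
  mkdir -p "$tmp_dir"
  bwameth.py --reference "$ref" --threads "$threads" "$r1" "$r2" \
    | samblaster \
    | samtools sort -@ "$threads" -T "$tmp_dir/sort" -o "$out_bam" -
}
```

Sorting in the same pipe avoids writing an unsorted BAM first, at the cost of holding sort buffers in memory and spilling them to `tmp_dir`.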

MarWoes commented 4 years ago

Thanks for your comment! We have been using Picard and samtools for stability reasons (back in the day, sambamba had some issues with our data). It might be worth revisiting both tools to speed things up; I'll have a look!