BIMSBbioinfo / pigx_bsseq

bisulfite sequencing pipeline from fastq to methylation reports
https://bioinformatics.mdc-berlin.de/pigx/
GNU General Public License v3.0

"make check" fails (out of memory?) #164

Open rekado opened 4 years ago

rekado commented 4 years ago

I tried upgrading the Guix package to 0.1.2 but ran into this error during the check phase:

[Sun Aug 16 12:35:11 2020]
Error in rule deduplication_se:
    jobid: 51
    output: /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_se_bt2.sorted.deduped.bam
    log: /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_deduplication.log (check log file(s) for error message)
    shell:
        echo ----------  $(date +"[%Y-%m-%d %T]") Starting Now  ---------- > /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_deduplication.log; nice -19 /gnu/store/24n38fdh074xbrygcwqwcxw4cmlzwnr4-samtools-1.9/bin/samtools  view -h /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/04_mapping/SEsample_v2copy_trimmed_bismark_bt2.bam  |  /gnu/store/4vxq776kskp13xf21vjy9ib7ry8czk08-samblaster-0.1.24/bin/samblaster -r  2> /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_deduplication.log | /gnu/store/24n38fdh074xbrygcwqwcxw4cmlzwnr4-samtools-1.9/bin/samtools sort -T=/tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy/ -o /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_se_bt2.sorted.deduped.bam -@ 12 -m 12G -l 9 2> /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_deduplication.log ; /gnu/store/24n38fdh074xbrygcwqwcxw4cmlzwnr4-samtools-1.9/bin/samtools index /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_se_bt2.sorted.deduped.bam >> /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_deduplication.log 2>&1 ; echo ----------  $(date +"[%Y-%m-%d %T]") Done  ---------- >> /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2copy_deduplication.log;
        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

...

Exiting because a job execution failed. Look above for error message
Complete log: /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/.snakemake/log/2020-08-16T123444.205988.snakemake.log
ERROR: could not find bismark report for PEsample_wgbs_1_val_1_bt2

Looking at the log file I see that I don't have enough memory:

cat /tmp/guix-build-pigx-bsseq-0.1.2.drv-0/pigx_bsseq-0.1.2/tests/out/05_sorting_deduplication/SEsample_v2_deduplication.log
samtools sort: couldn't allocate memory for bam_mem
m stdin
samblaster: Outputting to stdout
samblaster: Loaded 1 header sequence entries.

Is this expected? Oh, wait: it is! The command passes -@ 12 -m 12G to samtools sort. Why that much memory? Can we get by with less?

Alternatively, can we disable this step for the test suite?
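
For a sense of scale: the samtools documentation describes -m as the approximate maximum memory per thread, so the -@ 12 -m 12G flags in the failing command ask for far more than 12G in total. A minimal sketch of that arithmetic follows; the helper names are made up for illustration and are not part of the pipeline.

```python
import re

# Multipliers for the K/M/G suffixes that samtools accepts for -m.
_UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}


def parse_size(text):
    """Parse a size such as '768M' or '12G' into a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMG]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"unrecognised size: {text!r}")
    number, suffix = match.groups()
    return int(number) * _UNITS[suffix]


def sort_memory_bytes(threads, mem_per_thread):
    """Rough upper bound on sort buffer memory: threads x per-thread limit."""
    return threads * parse_size(mem_per_thread)


# Values taken from the failing command: -@ 12 -m 12G
total = sort_memory_bytes(12, "12G")
print(f"{total / 1024**3:.0f} GiB")  # 144 GiB
```

By the same estimate, something like -@ 2 -m 1G would need about 2 GiB, which is probably plenty for the small test inputs.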

alexg9010 commented 4 years ago

Hi @rekado,

I guess we can lower these requirements; they are defined in the settings file (https://github.com/BIMSBbioinfo/pigx_bsseq/blob/master/etc/settings.yaml.in#L101).

I only just realized that these memory and thread settings actually refer to samtools. However, settings should probably be as rule-specific as possible, so I need to add another section for the deduplication_se/_pe rules.
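
One way rule-specific settings with a tool-wide fallback could look is sketched below. The dictionary layout and the rule_resources helper are hypothetical; the actual keys in settings.yaml.in may well differ.

```python
# Hypothetical settings layout, standing in for the parsed YAML settings file.
SETTINGS = {
    "tools": {"samtools": {"threads": 12, "memory": "12G"}},
    "rules": {
        # Only rules that need different resources get an entry here.
        "deduplication_se": {"threads": 2, "memory": "1G"},
    },
}


def rule_resources(settings, rule, tool):
    """Return threads/memory for a rule, falling back to the tool defaults."""
    resources = dict(settings["tools"][tool])  # copy the tool-wide defaults
    resources.update(settings.get("rules", {}).get(rule, {}))  # rule overrides
    return resources


print(rule_resources(SETTINGS, "deduplication_se", "samtools"))
print(rule_resources(SETTINGS, "deduplication_pe", "samtools"))
```

Here deduplication_se picks up its own values, while deduplication_pe has no entry and falls back to the samtools defaults.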

We should probably also do some profiling to determine the minimum memory needed for an acceptable pipeline runtime.
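
As a starting point for such profiling, here is a minimal sketch that runs one command and records its wall-clock time and peak resident memory. It assumes a Unix system and Python 3.9 or later; profile_command is an illustrative name, and the reported peak should be treated as approximate.

```python
import os
import subprocess
import sys
import time


def profile_command(argv):
    """Run argv and return (exit_code, wall_seconds, peak_rss) for that child.

    peak_rss is ru_maxrss as reported by the OS: kilobytes on Linux,
    bytes on macOS.
    """
    start = time.perf_counter()
    proc = subprocess.Popen(argv)
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    exit_code = os.waitstatus_to_exitcode(status)
    proc.returncode = exit_code  # tell Popen the child was already reaped
    return exit_code, elapsed, usage.ru_maxrss


# Stand-in workload: a child Python process that fills about 50 MB.
code, seconds, peak = profile_command(
    [sys.executable, "-c", "x = bytearray(b'x') * (50 * 1024 * 1024)"]
)
print(f"exit={code} wall={seconds:.2f}s peak_rss={peak}")
```

Repeating such a measurement for the deduplication command with different -@ and -m values would show where runtime stops improving as memory grows.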