GATB / gatb-minia-pipeline

GATB Minia assembly pipeline

ZeroDivisionError while running BESST #30

Open cimendes opened 3 years ago

cimendes commented 3 years ago

Greetings,

I've been using gatb-minia-pipeline for a while now in a benchmark of de novo assembly software, available on GitHub. So far it has run without issue on real data, in this case the ZymoBIOMICS Microbial Community Standard with even and log distributions.

Recently I generated a mock community from the ZymoBIOMICS complete genomes, and gatb-minia-pipeline consistently fails on the samples without an error model (perfect reads matching the reference). The mock read data is available here.

I use the following command to run this assembler:

    gatb -1 ${fastq_pair[0]} -2 ${fastq_pair[1]} --kmer-sizes ${kmer_list} -o ${sample_id}_GATBMiniaPipeline --no-error-correction

(available here)

The parameters used are:

    gatbkmer = '21,61,101,141,181'
    gatb_besst_iter = 10000
    GATB_error_correction = false

This is the end of the stdout that I get while running gatb-minia-pipeline:

pe1_path: subENN_1.fq.gz
pe2_path: subENN_2.fq.gz
genome_path: subENN_GATBMiniaPipeline_k181.contigs.fa
output_path: subENN_GATBMiniaPipeline.lib_0
tmp_path: /mnt/beegfs/scratch/ONEIDA/cimendes/LMAS/work/18/993d693d0c79e5aef093045d4571d9/BESST_tmp
bwa path: bwa
number of threads: 8
Remove temp SAM and BAM files: No
Use bwa aln and sampe instead of bwa mem: No
Start processing.
Aligning with bwa mem.
Temp directory: /mnt/beegfs/scratch/ONEIDA/cimendes/LMAS/work/18/993d693d0c79e5aef093045d4571d9/BESST_tmp
Output path:    subENN_GATBMiniaPipeline.lib_0
Stderr file:    subENN_GATBMiniaPipeline.lib_0.bwa.1
Make bwa index... Done.
Align with bwa mem... Done.
Time elapsed for bwa index and mem:  0:02:10.200315
Convert SAM to BAM... Done.
Time elapsed for SAM to BAM conversion: 0:01:16.767939
Sort BAM... Done.
Time elapsed for BAM sorting: 0:01:08.141072
Index BAM... Done.
Time elapsed for BAM indexing: 0:00:03.765598
Remove temp files... Done.
Time elapsed for temp files removing: 0:00:00.008573
Processing is finished.
(2021-05-10 20:20:32) Execution of 'python BESST/runBESST'. Command line:
     /NGStools/gatb-minia-pipeline/tools/memused python /NGStools/gatb-minia-pipeline/BESST/runBESST -c subENN_GATBMiniaPipeline_k181.contigs.fa -f subENN_GATBMiniaPipeline.lib_0.bam -o subENN_GATBMiniaPipeline_besst --orientation fr --iter 10000
Number of initial contigs: 2028
Traceback (most recent call last):
  File "/NGStools/gatb-minia-pipeline/BESST/runBESST", line 401, in <module>
    main(args)
  File "/NGStools/gatb-minia-pipeline/BESST/runBESST", line 160, in main
    libmetrics.get_metrics(bam_file, param, Information)
  File "/NGStools/gatb-minia-pipeline/BESST/BESST/libmetrics.py", line 317, in get_metrics
    mean_isize = sum(filtered_list) / n
ZeroDivisionError: float division by zero
maximal memory used: 182 MB
(2021-05-10 20:20:40) Execution of 'python BESST/runBESST' failed. Command line:
     /NGStools/gatb-minia-pipeline/tools/memused python /NGStools/gatb-minia-pipeline/BESST/runBESST -c subENN_GATBMiniaPipeline_k181.contigs.fa -f subENN_GATBMiniaPipeline.lib_0.bam -o subENN_GATBMiniaPipeline_besst --orientation fr --iter 10000
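
The traceback points at a mean computed over an empty list. Here is a minimal sketch of that failure mode and one possible guard, assuming `filtered_list` holds the insert sizes that survive filtering and `n` is its length as a float. Both function names are hypothetical, and this is not BESST's actual code, only an illustration of the division at libmetrics.py line 317:

```python
def unguarded_mean(filtered_list):
    # Mirrors the shape of the failing statement in the traceback:
    # with an empty list, n is 0.0 and the division raises ZeroDivisionError.
    n = float(len(filtered_list))
    return sum(filtered_list) / n


def mean_insert_size(filtered_list):
    # Hypothetical guard: return None instead of dividing when nothing
    # survived filtering, so the caller can report a clearer error.
    n = float(len(filtered_list))
    if n == 0:
        return None
    return sum(filtered_list) / n


try:
    unguarded_mean([])
except ZeroDivisionError as exc:
    print("unguarded:", exc)

print("guarded:", mean_insert_size([]))
```

If this reading is right, the real question is why no insert sizes were left after filtering for these samples.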

The pipeline is run in a Docker container built from the latest version of the master branch: cimendes/gatb-minia-pipeline:31.07.2020-1

Thank you for your assistance in understanding this error! Is there anything you suggest I do?

Best,

Inês

rchikhi commented 3 years ago

Hi Inês, apologies for the slow response. I can suggest the following quick fix: add the --no-scaffolding flag to the gatb executable. This will skip BESST, and for metagenome paired-end data this may not be such a bad thing.
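
As a rough sketch of how the suggested flag could be added to the command from the original report, assuming the pipeline is launched from Python: `build_gatb_command` is a hypothetical helper, and the file names are taken from the log above. Only flags that already appear in this thread are used.

```python
import shlex


def build_gatb_command(fq1, fq2, kmer_sizes, out_prefix, scaffolding=True):
    # Hypothetical helper: assembles the gatb argument list used in the report.
    cmd = [
        "gatb",
        "-1", fq1,
        "-2", fq2,
        "--kmer-sizes", kmer_sizes,
        "-o", out_prefix,
        "--no-error-correction",
    ]
    if not scaffolding:
        # Skips the BESST scaffolding step, as suggested above.
        cmd.append("--no-scaffolding")
    return cmd


cmd = build_gatb_command(
    "subENN_1.fq.gz",
    "subENN_2.fq.gz",
    "21,61,101,141,181",
    "subENN_GATBMiniaPipeline",
    scaffolding=False,
)
# Print a shell-safe version of the command for inspection.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once gatb is on the PATH.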