biomedicalinformaticsgroup / Sargasso

Sargasso disambiguates mixed-species high-throughput sequencing data.
http://biomedicalinformaticsgroup.github.io/Sargasso/

bismark multi-threads error #97

Open hxin opened 5 years ago

hxin commented 5 years ago

We get the following error when running bismark with -p 2 (or any value greater than 2):

Segmentation fault (core dumped)
(ERR): bowtie2-align exited with value 139

This error is raised by bowtie2 but does not stop bismark from running. It is not recorded in bismark's final output; it appears only in the console log.
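Because the failure is visible only in the console log, one way to catch it is to capture that log and scan it for the two messages quoted above. The following is a minimal sketch, not part of Sargasso or bismark: the function names are hypothetical, and it assumes the console output has already been saved as text. It also decodes the exit value on the common convention that a value above 128 means "killed by signal (value - 128)", which for 139 gives signal 11.

```python
import re
import signal

# Patterns copied from the console output quoted in this issue.
FAILURE_PATTERNS = [
    re.compile(r"Segmentation fault"),
    re.compile(r"\(ERR\): bowtie2-align exited with value (\d+)"),
]


def find_bowtie2_failures(log_text):
    """Return the console log lines that match a known failure pattern."""
    return [
        line
        for line in log_text.splitlines()
        if any(pattern.search(line) for pattern in FAILURE_PATTERNS)
    ]


def describe_exit_value(value):
    """Name the signal behind an exit value, assuming the 128 + N convention."""
    if value > 128:
        try:
            return signal.Signals(value - 128).name
        except ValueError:
            return "unknown signal"
    return "exit status {}".format(value)
```

A wrapper could run bismark, pass the captured console text to `find_bowtie2_failures`, and abort if the returned list is not empty, instead of relying on bismark's own report.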

The error occurs at an unknown stage of the bowtie2 run and results in a corrupted BAM file. Bismark tries to process this corrupted BAM file regardless, which may or may not cause further problems, depending on where the file is corrupted.
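One cheap sanity check on a BAM file is whether it ends with the 28-byte BGZF end-of-file marker defined in the SAM/BAM specification. The sketch below is an assumption about how such a check could look, with a hypothetical function name; it detects only truncation, which is one possible form of corruption, and it may well pass a file that is damaged in some other way.

```python
import os

# The empty BGZF block that terminates a complete BAM file (28 bytes).
BGZF_EOF = bytes.fromhex(
    "1f8b0804" "00000000" "00ff" "0600" "4243" "0200"
    "1b00" "0300" "00000000" "00000000"
)


def has_bgzf_eof(path):
    """Return True if the file at `path` ends with the BGZF EOF marker."""
    if os.path.getsize(path) < len(BGZF_EOF):
        return False
    with open(path, "rb") as handle:
        # Read exactly the last 28 bytes of the file.
        handle.seek(-len(BGZF_EOF), os.SEEK_END)
        return handle.read() == BGZF_EOF
```

A file that fails this check is certainly incomplete; a file that passes it still needs a fuller validation before it can be trusted.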

The problem can be reproduced with the Sargasso test sample file:

bismark --non_bs_mm --ambig_bam --bowtie2 --output_dir mapped_reads --basename bisulfite_human_pe_sample.human mapper_indexes/human -1 raw_reads/bisulfite_human_pe_sample/reads_1/bisulfite_human_pe_R1.fastq.gz -2 raw_reads/bisulfite_human_pe_sample/reads_2/bisulfite_human_pe_R2.fastq.gz
bismark --non_bs_mm --ambig_bam --bowtie2 -p 2 --output_dir mapped_reads --basename bisulfite_human_pe_sample.human mapper_indexes/human -1 raw_reads/bisulfite_human_pe_sample/reads_1/bisulfite_human_pe_R1.fastq.gz -2 raw_reads/bisulfite_human_pe_sample/reads_2/bisulfite_human_pe_R2.fastq.gz

perl /usr/local/bin/bowtie2 -q --score-min L,0,-0.2 -p 2 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --norc -x /srv/data/genome/human/ensembl-95/Bisulfite_Genome/CT_conversion/BS_CT -1 bisulfite_human_pe_R1.fastq.gz_C_to_T.fastq -2 bisulfite_human_pe_R2.fastq.gz_G_to_A.fastq
perl /usr/local/bin/bowtie2 -q --score-min L,0,-0.2 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --norc -x /srv/data/genome/human/ensembl-95/Bisulfite_Genome/CT_conversion/BS_CT -1 bisulfite_human_pe_R1.fastq.gz_C_to_T.fastq -2 bisulfite_human_pe_R2.fastq.gz_G_to_A.fastq
perl /usr/local/bin/bowtie2 -q --score-min L,0,-0.2 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --nofw -x /srv/data/genome/human/ensembl-95/Bisulfite_Genome/GA_conversion/BS_GA -1 bisulfite_human_pe_R1.fastq.gz_C_to_T.fastq -2 bisulfite_human_pe_R2.fastq.gz_G_to_A.fastq
perl /usr/local/bin/bowtie2 -q --score-min L,0,-0.2 -p 2 --reorder --ignore-quals --no-mixed --no-discordant --dovetail --maxins 500 --nofw -x /srv/data/genome/human/ensembl-95/Bisulfite_Genome/GA_conversion/BS_GA -1 bisulfite_human_pe_R1.fastq.gz_C_to_T.fastq -2 bisulfite_human_pe_R2.fastq.gz_G_to_A.fastq

The temporary workaround is to NOT specify bismark's -p parameter, which restricts the bowtie2 run to a single thread. This is currently hard-coded in the _map_readsbisulfite script.
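As an illustration of the workaround, the sketch below builds the bismark command from the reproduction case and adds -p only when more than one thread is requested. It is not the actual Sargasso code: the function name and its parameters are hypothetical, and only flags that appear in the commands above are used.

```python
def build_bismark_command(index_dir, reads_1, reads_2,
                          output_dir, basename, threads=1):
    """Build a bismark argument list, omitting -p for single-threaded runs."""
    command = ["bismark", "--non_bs_mm", "--ambig_bam", "--bowtie2"]
    if threads > 1:
        # Only pass -p when multi-threading is explicitly requested.
        command += ["-p", str(threads)]
    command += [
        "--output_dir", output_dir,
        "--basename", basename,
        index_dir,
        "-1", reads_1,
        "-2", reads_2,
    ]
    return command
```

Leaving `threads` at its default of 1 produces the single-threaded command that does not trigger the segmentation fault in the reproduction above.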

bismark help page regarding -p:

-p NTHREADS              Launch NTHREADS parallel search threads (default: 1). Threads will run on separate processors/cores
                         and synchronize when parsing reads and outputting alignments. Searching for alignments is highly
                         parallel, and speedup is close to linear. Increasing -p increases Bowtie 2's memory footprint.
                         E.g. when aligning to a human genome index, increasing -p from 1 to 8 increases the memory footprint
                         by a few hundred megabytes. This option is only available if bowtie is linked with the pthreads
                         library (i.e. if BOWTIE_PTHREADS=0 is not specified at build time). In addition, this option will
                         automatically use the option '--reorder', which guarantees that output SAM records are printed in
                         an order corresponding to the order of the reads in the original input file, even when -p is set
                         greater than 1 (Bismark requires the Bowtie 2 output to be this way). Specifying --reorder and
                         setting -p greater than 1 causes Bowtie 2 to run somewhat slower and use somewhat more memory then
                         if --reorder were not specified. Has no effect if -p is set to 1, since output order will naturally
                         correspond to input order in that case.