BSSeeker / BSseeker2

A versatile aligning pipeline for bisulfite sequencing data
http://pellegrini.mcdb.ucla.edu/BS_Seeker2/
MIT License
60 stars 25 forks source link

FAILED, ExitCode 1 in step #2 Map the reads #30

Open christinalrichards opened 5 years ago

christinalrichards commented 5 years ago

Hi Dr. Guo,

I have made quite some progress through step 1 and step 2, but I get a FAILED, ExitCode 1 after 12h53m of the mapping of PE reads. I used the attached shell file and it looks like it runs through this much :

Output the unmapped reads in PE mode:

python bs_seeker2-align.py -1 10_P_1.fq -2 10_P_2.fq -g AEKE03.fasta --aligner=bowtie2 -o 10_P.bam -u unmapped

Map the unmapped reads in mate 1:

python bs_seeker2-align.py -i unmapped_1.fa -g AEKE03.fasta -o unmapped_1.bam

Convert the unmapped reads in mate 2 to their reverse complementaries:

python Antisense.py -i unmapped_2.fa -o unmapped_2_antisense.fa

Map the unmapped reads in mate 2:

python bs_seeker2-align.py -i unmapped_2_antisense.fa -g mm9_phage.fa -o unmapped_2.bam

But did it finish? It says END in the log file (also attached).

BSSeeker2_Tomato.sh.docx

10_P.bam.bs_seeker2.log

guoweilong commented 5 years ago

Hi,

I suggest you follow this suggestion: https://github.com/BSSeeker/BSseeker2#qa11 by splitting the large input file into pieces, and then map them independently. The performance will be largely improved.

Best, Weilong

christinalrichards commented 5 years ago

Thank you for your response! Do you think it did not finish or are you concerned about the low mapping efficiency? I should have clarified that this is mapping the wild relative of tomato Lycopersicon pimpinellifolium to the domesticated tomato genome Solanum lycopersicum divergence: Estimated Time: 0.6751 MYA (according to http://www.timetree.org/). Perhaps I should try running one fo the samples of S. lycopersicum.

christinalrichards commented 5 years ago

Also how can I use this code for PE reads "split -l 4000000 input.fq"

guoweilong commented 5 years ago

For low mapping efficiency, please check to allow more mismatch, or to allow gap alignment, such as using bowtie2-end-to-end alignment mode or bowtie2-local alignment mode.

Split the large file will speed up the total time cost, and let me know whether the command works as early as possible.

Best, Weilong

guoweilong commented 5 years ago

zcat R1.fastq.gz | split -l 4000000 - R1 zcat R2.fastq.gz | split -l 4000000 - R2

christinalrichards commented 5 years ago

Hi! When I split the reads for R1 I get 24 files: -rw-r----- 1 clr pi_clr 238489543 Sep 6 16:27 xaa -rw-r----- 1 clr pi_clr 238494383 Sep 6 16:27 xab -rw-r----- 1 clr pi_clr 238505358 Sep 6 16:27 xac -rw-r----- 1 clr pi_clr 238516820 Sep 6 16:27 xad -rw-r----- 1 clr pi_clr 238489803 Sep 6 16:27 xae -rw-r----- 1 clr pi_clr 238478038 Sep 6 16:27 xaf -rw-r----- 1 clr pi_clr 238492368 Sep 6 16:27 xag -rw-r----- 1 clr pi_clr 238489493 Sep 6 16:27 xah -rw-r----- 1 clr pi_clr 238484043 Sep 6 16:27 xai -rw-r----- 1 clr pi_clr 238487565 Sep 6 16:27 xaj -rw-r----- 1 clr pi_clr 238483930 Sep 6 16:27 xak -rw-r----- 1 clr pi_clr 238487671 Sep 6 16:27 xal -rw-r----- 1 clr pi_clr 238483939 Sep 6 16:27 xam -rw-r----- 1 clr pi_clr 238487869 Sep 6 16:27 xan -rw-r----- 1 clr pi_clr 238513293 Sep 6 16:27 xao -rw-r----- 1 clr pi_clr 238500051 Sep 6 16:27 xap -rw-r----- 1 clr pi_clr 238513998 Sep 6 16:27 xaq -rw-r----- 1 clr pi_clr 238504783 Sep 6 16:27 xar -rw-r----- 1 clr pi_clr 238494160 Sep 6 16:27 xas -rw-r----- 1 clr pi_clr 238486544 Sep 6 16:27 xat -rw-r----- 1 clr pi_clr 238454100 Sep 6 16:27 xau -rw-r----- 1 clr pi_clr 238507996 Sep 6 16:27 xav -rw-r----- 1 clr pi_clr 238476629 Sep 6 16:27 xaw -rw-r----- 1 clr pi_clr 36872879 Sep 6 16:27 xax

and then if I stay in the same directory and split R2 it overwrites those 24 files with theses 24 files: -rw-r----- 1 clr pi_clr 238489543 Sep 6 16:32 xaa -rw-r----- 1 clr pi_clr 238494383 Sep 6 16:32 xab -rw-r----- 1 clr pi_clr 238505358 Sep 6 16:32 xac -rw-r----- 1 clr pi_clr 238516820 Sep 6 16:32 xad -rw-r----- 1 clr pi_clr 238489803 Sep 6 16:32 xae -rw-r----- 1 clr pi_clr 238478038 Sep 6 16:32 xaf -rw-r----- 1 clr pi_clr 238492368 Sep 6 16:32 xag -rw-r----- 1 clr pi_clr 238489493 Sep 6 16:32 xah -rw-r----- 1 clr pi_clr 238484043 Sep 6 16:32 xai -rw-r----- 1 clr pi_clr 238487565 Sep 6 16:32 xaj -rw-r----- 1 clr pi_clr 238483930 Sep 6 16:32 xak -rw-r----- 1 clr pi_clr 238487671 Sep 6 16:32 xal -rw-r----- 1 clr pi_clr 238483939 Sep 6 16:33 xam -rw-r----- 1 clr pi_clr 238487869 Sep 6 16:33 xan -rw-r----- 1 clr pi_clr 238513293 Sep 6 16:33 xao -rw-r----- 1 clr pi_clr 238500051 Sep 6 16:33 xap -rw-r----- 1 clr pi_clr 238513998 Sep 6 16:33 xaq -rw-r----- 1 clr pi_clr 238504783 Sep 6 16:33 xar -rw-r----- 1 clr pi_clr 238494160 Sep 6 16:33 xas -rw-r----- 1 clr pi_clr 238486544 Sep 6 16:33 xat -rw-r----- 1 clr pi_clr 238454100 Sep 6 16:33 xau -rw-r----- 1 clr pi_clr 238507996 Sep 6 16:33 xav -rw-r----- 1 clr pi_clr 238476629 Sep 6 16:33 xaw -rw-r----- 1 clr pi_clr 36872879 Sep 6 16:33 xax

Should I somehow rename each of the pairs of files (from R1 and R2) eg: -rw-r----- 1 clr pi_clr 238489543 Sep 6 16:27 xaa -rw-r----- 1 clr pi_clr 238489543 Sep 6 16:32 xaa

to then use this series of commands:

Paired-end, QSEQ, bowtie, report concordant reads, and remap the unmapped reads in single-end mode

Output the unmapped reads in PE mode

bs_seeker2-align.py -1 FN1 -2 FN2 -g genome.fa -o PE.bam -u unmapped

Map the unmapped reads in mate 1

bs_seeker2-align.py -i unmapped_1.fa -g genome.fa -o unmapped_1.bam

Convert the unmapped reads in mate 2 to their reverse complementaries

Antisense.py -i unmapped_2.fa -o unmapped_2_antisense.fa

Map the unmapped reads in mate 2

bs_seeker2-align.py -i unmapped_2_antisense.fa -g mm9_phage.fa -o unmapped_2.bam

Merge all the mapped results

samtools merge merge.bam PE.bam unmapped_[12].bam

guoweilong commented 5 years ago

To the following command to avoid overwriting the files.

    zcat  Mate1.fq.gz | split -l 4000000 - R1_ &
    zcat Mate2.fq.gz | split -l 4000000 - R2_ &

Best, Weilong