FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0

Bismark Methylation Extraction failing, "The strand information was neither + nor -" #154

Closed: gocougpullman closed this issue 6 years ago

gocougpullman commented 6 years ago

Hi,

I'm having trouble with bismark_methylation_extractor: it fails every time I run it. The failure seems to center on this error:

The strand information was neither + nor -: 

The commands I am running and their output are posted below. Thank you for your help.

#!/bin/bash
#SBATCH --partition=backfill
#SBATCH --job-name=Bismarktest
#SBATCH --output=Bismarktest.out
#SBATCH --error=Bismarktest.er
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=24

module load samtools
module load bowtie2

/path/simon/programs/Bismark_v0.19.0/bismark --bowtie2 --path_to_bowtie /path/simon/programs/bowtie2-2.3.4-linux-x86_64 --multicore 4 -q --gzip --output_dir /path/simon/bisulfite.sequencing/25.out /path/simon/ncbi.ref/Homo_sapiens/NCBI/build37.1/Sequence/Bowtie2Index --single_end /path/simon/bisulfite.sequencing/SRR5914725.fastq.gz

This next command wasn't run on the cluster, so the paths are different. I saw the same failure when running it on the cluster; I have just been troubleshooting on my Ubuntu machine because it's less time-consuming to write scripts.

/media/simon/myFiles/programs/Bismark_v0.19.0/bismark_methylation_extractor --gzip -s /home/simon/external.drive/Simon/SRR5914723_bismark_bt2.bam.gz /home/simon/external.drive/Simon/SRR5914725_bismark_bt2.bam.gz

 *** Bismark methylation extractor version v0.19.0 ***

Setting core usage to single-threaded (default). Consider using --multicore <int> to speed up the extraction process.

Summarising Bismark methylation extractor parameters:
===============================================================
Bismark single-end SAM format specified (default)
Number of cores to be used: 1
Output will be written to the current directory ('/home/simon')

Writing result file containing methylation information for C in CpG context from the original top strand to CpG_OT_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CpG context from the complementary to original top strand to CpG_CTOT_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CpG context from the complementary to original bottom strand to CpG_CTOB_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CpG context from the original bottom strand to CpG_OB_SRR5914723_bismark_bt2.bam.gz.txt.gz

Writing result file containing methylation information for C in CHG context from the original top strand to CHG_OT_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CHG context from the complementary to original top strand to CHG_CTOT_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CHG context from the complementary to original bottom strand to CHG_CTOB_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CHG context from the original bottom strand to CHG_OB_SRR5914723_bismark_bt2.bam.gz.txt.gz

Writing result file containing methylation information for C in CHH context from the original top strand to CHH_OT_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CHH context from the complementary to original top strand to CHH_CTOT_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CHH context from the complementary to original bottom strand to CHH_CTOB_SRR5914723_bismark_bt2.bam.gz.txt.gz
Writing result file containing methylation information for C in CHH context from the original bottom strand to CHH_OB_SRR5914723_bismark_bt2.bam.gz.txt.gz

Now reading in Bismark result file /home/simon/external.drive/Simon/SRR5914723_bismark_bt2.bam.gz
Use of uninitialized value $meth_call in split at /media/simon/myFiles/programs/Bismark_v0.19.0/bismark_methylation_extractor line 4425, <IN> line 1.
Use of uninitialized value $strand in string eq at /media/simon/myFiles/programs/Bismark_v0.19.0/bismark_methylation_extractor line 4497, <IN> line 1.
Use of uninitialized value $strand in string eq at /media/simon/myFiles/programs/Bismark_v0.19.0/bismark_methylation_extractor line 4945, <IN> line 1.
Use of uninitialized value $strand in string eq at /media/simon/myFiles/programs/Bismark_v0.19.0/bismark_methylation_extractor line 4993, <IN> line 1.
Use of uninitialized value $strand in concatenation (.) or string at /media/simon/myFiles/programs/Bismark_v0.19.0/bismark_methylation_extractor line 5043, <IN> line 1.
The strand information was neither + nor -: 

Here is a sample of the *_bismark_bt2.bam.gz file:

SRR5914723.1_1_length=101   16  15  62350908    40  101M    *   0   0   ATTATTAAACAAACCACAATAAACCAATCAAATAATCAACCCCCTATTAAAAACAAACTCTTACCAAACACTATAACTCACACCTATAATCCTAACACTTT   0<<B<0<BBB7<7BBBBBBBB<<BB<B7BBBB<BB0B<7F<'FF<F<<BBBBFFF<0B0B000FFFFBB<BBF0B0B<FBB<FFBFBFFFFFBB<FFFB<B   NM:i:17 MD:Z:3G2G0G0G9G1G12G4G2A13G0G5G3G0G7G9G8G6  XM:Z:...h..hhh.........x.h............h....x................xh.....h...xh.......h.........x........h......  XR:Z:CT XG:Z:GA
SRR5914723.2_2_length=101   0   12  99078994    42  101M    *   0   0   GTATTTTTTATATTTTTTAAATTTATTTAAAGAGAGGAGATATAATATAAATTTTTGTGGTATTTATTATTTAGTTTTAATAATTATTAGTATTTTGTTAA   BBBFFFFFFFFFFIIIIIIIFIIIFFIIFFIFBBBFFFFFFFIIIIIIIIIFIIIIIFFIBFFIIIIIFFFFFFFBBB<BBBFFFFFFFFFFFFFFFBFB<   NM:i:20 MD:Z:1C10C3C0C3C0C2C0C0C23C0C0C0C8C0C2C9C2C6C10C2   XM:Z:.h..........h...hh...hh..hhh.......................hhhx........hh..h.........h..h......x..........h..  XR:Z:CT XG:Z:CT
SRR5914723.4_4_length=101   16  10  89682060    42  101M    *   0   0   TTACCCAATCTCAAATAATTCTTTATAACAATATAAAAATAAACTAATACAAAAAAAAATATATATAATTACCAAATAACGAATTAACCATAAAAAAAAAT   BBBB<BBBB<0<FBB<BB<<<B<0<<BBBBB<B<FBFFBBB<B<7FB<7<FFFFIIFFF<B<F<FBFFBB7FIFFIFFBFBFFFFFFFFFFFFFFFFFBB<   NM:i:23 MD:Z:7G5G0G2G9G2G1G1G1G3G0G10G0G0G0G10G0G10G7G4G0G0G5G1 XM:Z:.......x.....xh..h.........h..x.h.h.h...hh..........hhhh..........hh..........h.Z.....h....hhh.....h.  XR:Z:CT XG:Z:GA
SRR5914723.6_6_length=101   16  9   15071755    42  101M    *   0   0   AAAACTCTAAATCTAAAAAATAACTCAATCTAAAAATTAAAAAAACAATCACAATTCTCTACATATTAACTCAATTCAAACAAAAACTTAAAATCAATCAA   FBB<BBBBFFFBB<FFFFFFFBBFFFFFFFFFIIIFFFIIIIIIFIIIFFBFIFFFFFFBBFFFFFFFFFBIIFFFIIIFIIIIFFFFFFFFFFFFFBBBB   NM:i:21 MD:Z:1G0G5G0G0G4G0G2G1G5G10G0G2G4G19G10G0G5G3G2G7G0

Here is a sample of my FASTQ file:

@SRR5914723.1 1 length=101
AAAGTGTTAGGATTATAGGTGTGAGTTATAGTGTTTGGTAAGAGTTTGTTTTTAATAGGGGGTTGATTATTTGATTGGTTTATTGTGGTTTGTTTAATAAT
+SRR5914723.1 1 length=101
B<BFFF<BBFFFFFBFBFF<BBF<B0B0FBB<BBFFFF000B0B0<FFFBBBB<<F<FF'<F7<B0BB<BBBB7B<BB<<BBBBBBBB7<7BBB<0<B<<0
@SRR5914723.2 2 length=101
GTATTTTTTATATTTTTTAAATTTATTTAAAGAGAGGAGATATAATATAAATTTTTGTGGTATTTATTATTTAGTTTTAATAATTATTAGTATTTTGTTAA
+SRR5914723.2 2 length=101
BBBFFFFFFFFFFIIIIIIIFIIIFFIIFFIFBBBFFFFFFFIIIIIIIIIFIIIIIFFIBFFIIIIIFFFFFFFBBB<BBBFFFFFFFFFFFFFFFBFB<
gocougpullman commented 6 years ago

Hello everyone,

I got to the bottom of what was going on. My BAM files were gzipped, and apparently bismark_methylation_extractor can't read gzipped BAM files.
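
A minimal sketch of how one might check for this before running the extractor. It peels off one gzip layer and looks at the first bytes underneath: a normal BAM file shows the `BAM\1` magic after a single decompression, while a BAM that was gzipped again shows another gzip header (`1f 8b`). The helper names `hex_head` and `classify_bam` are hypothetical and not part of Bismark or samtools.

```shell
#!/bin/bash
# Print the first N bytes of stdin as lowercase hex with no separators.
hex_head() {
    head -c "$1" | od -An -tx1 | tr -d ' \n'
}

# Peel off one gzip layer and classify what is underneath.
# Prints "bam", "gzipped" (an extra gzip layer wraps the data), or "other".
classify_bam() {
    local inner
    # Reading from stdin avoids any dependence on the file's suffix.
    inner=$(gzip -dc < "$1" 2>/dev/null | hex_head 4)
    case "$inner" in
        42414d01) echo "bam" ;;      # "BAM\1": a single compression layer, as expected
        1f8b*)    echo "gzipped" ;;  # another gzip stream inside: the file was gzipped again
        *)        echo "other" ;;    # not gzip at all, or not BAM underneath
    esac
}

# Classify every file given on the command line (does nothing without arguments).
for f in "$@"; do
    printf '%s\t%s\n' "$f" "$(classify_bam "$f")"
done
```

If a file is reported as `gzipped`, removing the extra layer with `gunzip SRR5914723_bismark_bt2.bam.gz` (which leaves `SRR5914723_bismark_bt2.bam`) and passing the resulting `.bam` file to bismark_methylation_extractor should avoid the error described in this issue.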

Thanks

FelixKrueger commented 6 years ago

I am glad this sorted itself out. Just out of interest, did Bismark produce the bam.gz file or did you gzip it yourself? If it came from Bismark then we might want to stop this from happening in the future. All the best, Felix

gocougpullman commented 6 years ago

Thanks for the reply Felix,

Yes, I gzipped the file myself. But I did so because the Bismark User Guide says:

A space-separated list of result files in Bismark format from which methylation information is extracted for every cytosine in the read. The files may be gzip compressed (ending in .gz).

On page 28.

FelixKrueger commented 6 years ago

Ah right, this must be the very old vanilla Bismark format prior to version 0.6. BAM files are already compressed and won't gain much (or anything?) from gzipping again. Sorry for the confusion.
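
The point about gzipping already compressed data can be illustrated with a small experiment. This is only a sketch under stated assumptions: it runs plain `gzip` on synthetic pseudo-random bases as a stand-in for sequence data, not on a real BAM file (BAM uses its own gzip-compatible block compression, BGZF). The helper name `random_bases` is hypothetical.

```shell
#!/bin/bash
# Emit N pseudo-random bases (A/C/G/T) as a stand-in for sequence data.
random_bases() {
    awk -v n="$1" 'BEGIN {
        srand(1)  # fixed seed so both runs below see the same input
        for (i = 0; i < n; i++) printf "%s", substr("ACGT", int(rand() * 4) + 1, 1)
    }'
}

# Byte counts after one and after two rounds of gzip on the same input.
once=$(random_bases 50000 | gzip -c | wc -c | tr -d ' ')
twice=$(random_bases 50000 | gzip -c | gzip -c | wc -c | tr -d ' ')

echo "raw: 50000 bytes, gzip once: $once bytes, gzip twice: $twice bytes"
```

The first round should shrink the input considerably, while the second round should leave the size roughly unchanged, since the output of the first round has very little redundancy left to exploit.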

gocougpullman commented 6 years ago

That makes sense. Thanks for your help!

Simon