FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
386 stars 101 forks source link

Use of uninitialized value $read_conversion in string eq #468

Closed liuyang2006 closed 2 years ago

liuyang2006 commented 2 years ago

Dear Felix, When I extract methylation call from BAM files, it always failed with perl error:

Command:

bismark_methylation_extractor -s -o bismark_methylation_extractor_mm9 --gzip --bedGraph --genome_folder /fastscratch/chenx/share_yang 542VTF_AzAc_12.sorted.bam

Error:

Now reading in Bismark result file 542VTF_AzAc_12.sorted.bam
skipping SAM header line:   @HD VN:1.4  SO:coordinate
skipping SAM header line:   @SQ SN:chr1 LN:197195432
skipping SAM header line:   @SQ SN:chr2 LN:181748087
skipping SAM header line:   @SQ SN:chrX LN:166650296
skipping SAM header line:   @SQ SN:chr3 LN:159599783
skipping SAM header line:   @SQ SN:chr4 LN:155630120
skipping SAM header line:   @SQ SN:chr5 LN:152537259
skipping SAM header line:   @SQ SN:chr7 LN:152524553
skipping SAM header line:   @SQ SN:chr6 LN:149517037
skipping SAM header line:   @SQ SN:chr8 LN:131738871
skipping SAM header line:   @SQ SN:chr10    LN:129993255
skipping SAM header line:   @SQ SN:chr14    LN:125194864
skipping SAM header line:   @SQ SN:chr9 LN:124076172
skipping SAM header line:   @SQ SN:chr11    LN:121843856
skipping SAM header line:   @SQ SN:chr12    LN:121257530
skipping SAM header line:   @SQ SN:chr13    LN:120284312
skipping SAM header line:   @SQ SN:chr15    LN:103494974
skipping SAM header line:   @SQ SN:chr16    LN:98319150
skipping SAM header line:   @SQ SN:chr17    LN:95272651
skipping SAM header line:   @SQ SN:chr18    LN:90772031
skipping SAM header line:   @SQ SN:chr19    LN:61342430
skipping SAM header line:   @SQ SN:chrY LN:15902555
skipping SAM header line:   @SQ SN:chrM LN:16299
skipping SAM header line:   @PG ID:samtools PN:samtools VN:1.14 CL:/projects/li-lab/yang/anaconda3/envs/py39/bin/samtools view -h 542VTF_AzAc_12.sorted.bam
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1595, <IN> line 25.
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1598, <IN> line 25.
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1601, <IN> line 25.
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1604, <IN> line 25.
Use of uninitialized value $read_conversion in concatenation (.) or string at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1608, <IN> line 25.
Use of uninitialized value $genome_conversion in concatenation (.) or string at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1608, <IN> line 25.
Unexpected combination of read and genome conversion: '' / ''

Would you please help me to figure out the problem? Thank you!

FelixKrueger commented 2 years ago

So your BAM file is a single-end file, and you sorted it by position prior to methylation extraction? This should work for single-end files, but is not recommended for paired-end data. Having said that, is there a chance that the BAM file has been perturbed somehow? Which version of Bismark are you using?

If you wanted to find out whether the sorting process somehow messed up the file, you could try to use the (deduplicated) BAM file produced by bismark (or deduplicate_bismark) itself.

Just FYI, if you just want to get a coverage file (and bedGraph file), --genome_folder /fastscratch/chenx/share_yang is not needed.

liuyang2006 commented 2 years ago

Dear Felix, My BAM file is single-end file. The BAM file is not pertubed anymore. Version of bismark is:

bismark --version

          Bismark - Bisulfite Mapper and Methylation Caller.

                       Bismark Version: v0.23.1dev
        Copyright 2010-21 Felix Krueger, Babraham Bioinformatics
              www.bioinformatics.babraham.ac.uk/projects/
                https://github.com/FelixKrueger/Bismark

Even I remove options --genome_folder /fastscratch/chenx/share_yang, use command bismark_methylation_extractor -s -o bismark_methylation_extractor_mm9 --gzip --bedGraph 542VTF_AzAc_12.sorted.bam, I still got same error message:

Now reading in Bismark result file 542VTF_AzAc_12.sorted.bam
skipping SAM header line:   @HD VN:1.4  SO:coordinate
skipping SAM header line:   @SQ SN:chr1 LN:197195432
skipping SAM header line:   @SQ SN:chr2 LN:181748087
skipping SAM header line:   @SQ SN:chrX LN:166650296
skipping SAM header line:   @SQ SN:chr3 LN:159599783
skipping SAM header line:   @SQ SN:chr4 LN:155630120
skipping SAM header line:   @SQ SN:chr5 LN:152537259
skipping SAM header line:   @SQ SN:chr7 LN:152524553
skipping SAM header line:   @SQ SN:chr6 LN:149517037
skipping SAM header line:   @SQ SN:chr8 LN:131738871
skipping SAM header line:   @SQ SN:chr10    LN:129993255
skipping SAM header line:   @SQ SN:chr14    LN:125194864
skipping SAM header line:   @SQ SN:chr9 LN:124076172
skipping SAM header line:   @SQ SN:chr11    LN:121843856
skipping SAM header line:   @SQ SN:chr12    LN:121257530
skipping SAM header line:   @SQ SN:chr13    LN:120284312
skipping SAM header line:   @SQ SN:chr15    LN:103494974
skipping SAM header line:   @SQ SN:chr16    LN:98319150
skipping SAM header line:   @SQ SN:chr17    LN:95272651
skipping SAM header line:   @SQ SN:chr18    LN:90772031
skipping SAM header line:   @SQ SN:chr19    LN:61342430
skipping SAM header line:   @SQ SN:chrY LN:15902555
skipping SAM header line:   @SQ SN:chrM LN:16299
skipping SAM header line:   @PG ID:samtools PN:samtools VN:1.14 CL:/projects/li-lab/yang/anaconda3/envs/py39/bin/samtools view -h 542VTF_AzAc_12.sorted.bam
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1595, <IN> line 25.
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1598, <IN> line 25.
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1601, <IN> line 25.
Use of uninitialized value $read_conversion in string eq at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1604, <IN> line 25.
Use of uninitialized value $read_conversion in concatenation (.) or string at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1608, <IN> line 25.
Use of uninitialized value $genome_conversion in concatenation (.) or string at /projects/li-lab/yang/tools/Bismark/bismark_methylation_extractor line 1608, <IN> line 25.
Unexpected combination of read and genome conversion: '' / ''
FelixKrueger commented 2 years ago

Hmm, the file header doesn't seem to include the Bismark @PG header lines, so I wouldn't rule out that something else has happened as well.

Could you show the first lines of that file, e.g. with: samtools view -h 542VTF_AzAc_12.sorted.bam | head -30. Alternatively, would you be able to send me a snippet of that file, e.g. via email? You could use the following command to generate a small subset:

samtools view -h 542VTF_AzAc_12.sorted.bam | head -1000 | samtools view -bS - > small_test.bam
liuyang2006 commented 2 years ago

Please check this file: https://storage.googleapis.com/jax-nanopore-01-project-data/related_project/small_test.bam

FelixKrueger commented 2 years ago

Thanks for the file. Here are the first lines:

@PG ID:samtools PN:samtools VN:1.14 CL:samtools view -h 542VTF_AzAc_12.sorted.bam
@PG ID:samtools.1   PN:samtools PP:samtools VN:1.14 CL:samtools view -bS -
@PG ID:samtools.2   PN:samtools PP:samtools.1   VN:1.11 CL:samtools view -h small_test.bam
HWI-ST986:391:C723TACXX:1:1103:15119:49651 1:N:0:TGACCA 16  chr1    3010874 255 51M *   0   0   TCAAAACCAAAATAACTCCTACGAAACCTAAAACAAAAACCTCTTAAACCG CCCFFFFFHHHHFIJJJJHIJJJJJIJJJJIJJJJJHIIJJJJJJHIJJJJ NM:i:17 MD:Z:T1AA6AA1AA5A2A1A3A1AA2A9AAA3   XM:Z:Z..hhh.........x..hh.x...h.xZ.x.....hh.hh......xz..

As I had guessed, this BAM file is missing the read conversion (XR:Z:) and genome conversion (XG:Z:) optional fields, which are required to figure out where the read came from. So yes, whatever process interfered with the Bismark BAM file rendered it useless for the methylation extraction.

FelixKrueger commented 2 years ago

An easy solution would of course be to just use the (deduplicated) Bismark BAM file for the methylation extraction/coverage file generation, and use the sorted BAM file for whatever else you want to use it for.

liuyang2006 commented 2 years ago

Redo analysis on raw fastq files solved the problem!

FelixKrueger commented 2 years ago

Excellent!