FelixKrueger / Bismark

A tool to map bisulfite converted sequence reads and determine cytosine methylation states
http://felixkrueger.github.io/Bismark/
GNU General Public License v3.0
386 stars 101 forks source link

Bismark alignment error #481

Closed hidvegin closed 2 years ago

hidvegin commented 2 years ago

I would like to align 150 bp paired-end reads with Bismark v0.23.1. I use a HPC cluster with SLURM. I got this error in align process:

Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.4.5])
Output format is BAM (default)
Alignments will be written out in BAM format. Samtools found here: '/home/fk8jybr/samtools-1.14/samtools'
Reference genome folder provided is /home/fk8jybr/input/reference/triticum_aestivum/52/ (absolute path is '/big/home/fk8jybr/input/reference/triticum_aestivum/52/)'
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.1.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.2.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.3.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.4.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.rev.1.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the C->T converted genome seems to be faulty or non-existant ('BS_CT.rev.2.bt2'). Please run the bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.1.bt2'). Please run bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.2.bt2'). Please run bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.3.bt2'). Please run bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.4.bt2'). Please run bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.rev.1.bt2'). Please run bismark_genome_preparation before running Bismark
The Bowtie 2 index of the G->A converted genome seems to be faulty or non-existant ('BS_GA.rev.2.bt2'). Please run bismark_genome_preparation before running Bismark

Couldn't find a traditional small Bowtie 2 index for the genome specified (ending in .bt2). Now searching for a large index instead (64-bit index ending in .bt2l)...
64-bit large genome Bowtie 2 index found...
FastQ format assumed (by default)

Input files to be analysed (in current folder '/big/home/fk8jybr'):
/home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz
/home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz
Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)
Output will be written into the directory: /big/home/fk8jybr/input/Novogene_Buza_WGBS_2021/align/BGI/

Using temp directory: /home/fk8jybr/bismark_temp
Temporary files will be written into the directory: /big/home/fk8jybr/bismark_temp/
Using the following prefix for output files: K1

Summary of all aligner options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000
Running Bismark Parallel version. Number of parallel instances to be spawned: 4

Current working directory is: /big/home/fk8jybr

Now reading in and storing sequence information of the genome specified in: /big/home/fk8jybr/input/reference/triticum_aestivum/52/

chr 1A (594102056 bp)
chr 1B (689851870 bp)
chr 1D (495453186 bp)
chr 2A (780798557 bp)
chr 2B (801256715 bp)
chr 2D (651852609 bp)
chr 3A (750843639 bp)
chr 3B (830829764 bp)
chr 3D (615552423 bp)
chr 4A (744588157 bp)
chr 4B (673617499 bp)
chr 4D (509857067 bp)
chr 5A (709773743 bp)
chr 5B (713149757 bp)
chr 5D (566080677 bp)
chr 6A (618079260 bp)
chr 6B (720988478 bp)
chr 6D (473592718 bp)
chr 7A (736706236 bp)
chr 7B (750620385 bp)
chr 7D (638686055 bp)
chr Un (480980714 bp)

Paired-end alignments will be performed
=======================================

The provided filenames for paired-end alignments are /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz and /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz

Paired-end alignments will be performed
=======================================

The provided filenames for paired-end alignments are /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz and /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz

Paired-end alignments will be performed
=======================================

The provided filenames for paired-end alignments are /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz and /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz

Paired-end alignments will be performed
=======================================

The provided filenames for paired-end alignments are /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz and /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz for PID: 243138 and offset 4 (sequences written out: 82215442)

Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz for PID: 0 and offset 1 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.1.gz< as new in-file 1 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz<)
Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.4.gz< as new in-file 1 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz<)
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz for PID: 0 and offset 2 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.2.gz< as new in-file 1 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz<)
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz for PID: 0 and offset 3 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.3.gz< as new in-file 1 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R1.fq.gz<)
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz for PID: 0 and offset 2 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.2.gz< as new in-file 2 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz<)
Input files are in FastQ format
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz for PID: 243138 and offset 4 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.4.gz< as new in-file 2 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz<)
Input files are in FastQ format
Writing a C -> T converted version of the input file K1_R1.fq.gz.temp.2.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.2.gz_C_to_T.fastq.gz
Writing a C -> T converted version of the input file K1_R1.fq.gz.temp.4.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.4.gz_C_to_T.fastq.gz
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz for PID: 0 and offset 1 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.1.gz< as new in-file 2 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz<)
Input files are in FastQ format
Writing a C -> T converted version of the input file K1_R1.fq.gz.temp.1.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.1.gz_C_to_T.fastq.gz
Finished subdividing /home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz for PID: 0 and offset 3 (sequences written out: 82215442)

Using the subset file >/big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.3.gz< as new in-file 2 (instead of >/home/fk8jybr/input/WGBS_2021/K1/K1_R2.fq.gz<)
Input files are in FastQ format
Writing a C -> T converted version of the input file K1_R1.fq.gz.temp.3.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.3.gz_C_to_T.fastq.gz

Created C -> T converted version of the FastQ file K1_R1.fq.gz.temp.2.gz (82215442 sequences in total)

Writing a G -> A converted version of the input file K1_R2.fq.gz.temp.2.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.2.gz_G_to_A.fastq.gz

Created C -> T converted version of the FastQ file K1_R1.fq.gz.temp.4.gz (82215442 sequences in total)

Writing a G -> A converted version of the input file K1_R2.fq.gz.temp.4.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.4.gz_G_to_A.fastq.gz

Created C -> T converted version of the FastQ file K1_R1.fq.gz.temp.1.gz (82215442 sequences in total)

Writing a G -> A converted version of the input file K1_R2.fq.gz.temp.1.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.1.gz_G_to_A.fastq.gz

Created C -> T converted version of the FastQ file K1_R1.fq.gz.temp.3.gz (82215442 sequences in total)

Writing a G -> A converted version of the input file K1_R2.fq.gz.temp.3.gz to /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.3.gz_G_to_A.fastq.gz

Created G -> A converted version of the FastQ file K1_R2.fq.gz.temp.2.gz (82215442 sequences in total)

Input files are K1.K1_R1.fq.gz.temp.2.gz_C_to_T.fastq.gz and K1.K1_R2.fq.gz.temp.2.gz_G_to_A.fastq.gz (FastQ)
Now running 2 instances of Bowtie 2 against the bisulfite genome of /big/home/fk8jybr/input/reference/triticum_aestivum/52/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000

Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.2.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.2.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --norc))
Found first alignment:
V300057111L1C001R0040000034/1/1 77  *   0   0   *   *   0   0   AGGGTAGAATTATATATGGTGTAGTGGGTGTAGTATTGTTGGTAGTGATGTTGTTGTTTATGTTGTTGGGGATGAGGGTGTTGAGGTTGGGGAAGAAGTTGTTGTTTGTGAAGTTGTTGTTGTTAGTAGTTATGTGATTGTGGTTTTTAA  FBDCAFFF<<FFGFFGFFCFG=FFCEFFEFF?FFAGEDFE+A?G(FFFD8FFG*G<E9GBGEFDF6FB+@FFG1D?G(CEG?69@F=GGFFFE;FF>F/G>-F;=DGCG*89FD@+2F5G6C5F-9FC<1G@FED@FG9A543F'=GFFC  YT:Z:UP
V300057111L1C001R0040000034/2/2 141 *   0   0   *   *   0   0   CACAACACACCACAAATTACAAAATTAAACCAAACCAAAAACAAAACTATACAAATCTTCTCCATCCCTCCTTCCAACATACACCATAAAAAAAAACAACAAACCTCCAAAACCCCAACTCTCATAATCCTATATAAAAAAAAAAACAAT  FDGG8EGGEFDGGEG5DFGFFG9FGFFEFGGF7FFFEF??FFFFFFFFFAGEDF=GFFGFFFFFFGGFGFFGGFGGFCEFGFFFAFE8GFFFAG>FDGFE:DFGGGGDG->DGCFEF.FGF2?A=?=CFFDG2FCEFE=FEFFFBFF>6(  YT:Z:UP
Now starting a Bowtie 2 paired-end alignment for CTread1GAread2GAgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.2.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.2.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --nofw))
(ERR): bowtie2-align died with signal 9 (KILL) 
Found first alignment:
V300057111L1C001R0040000034/1/1 83  3A_GA_converted 151009279   3   150M    =   151009123   -306    TTAAAAACCACAATCACATAACTACTAACAACAACAACTTCACAAACAACAACTTCTTCCCCAACCTCAACACCCTCATCCCCAACAACATAAACAACAACATCACTACCAACAATACTACACCCACTACACCATATATAATTCTACCCT  CFFG='F345A9GF@DEF@G1<CF9-F5C6G5F2+@DF98*GCGD=;F->G/F>FF;EFFFGG=F@96?GEC(G?D1GFF@+BF6FDFEGBG9E<G*GFF8DFFF(G?A+EFDEGAFF?FFEFFECFF=GFCFFGFFGFF<<FFFACDBF  AS:i:-24    XN:i:0  XM:i:4  XO:i:0  XG:i:0  NM:i:4  MD:Z:7A64A8T23A44   YS:i:-14    YT:Z:CP
V300057111L1C001R0040000034/2/2 163 3A_GA_converted 151009123   3   135M1I14M   =   151009279   306 CACAACACACCACAAATTACAAAATTAAACCAAACCAAAAACAAAACTATACAAATCTTCTCCATCCCTCCTTCCAACATACACCATAAAAAAAAACAACAAACCTCCAAAACCCCAACTCTCATAATCCTATATAAAAAAAAAAACAAT  FDGG8EGGEFDGGEG5DFGFFG9FGFFEFGGF7FFFEF??FFFFFFFFFAGEDF=GFFGFFFFFFGGFGFFGGFGGFCEFGFFFAFE8GFFFAG>FDGFE:DFGGGGDG->DGCFEF.FGF2?A=?=CFFDG2FCEFE=FEFFFBFF>6(  AS:i:-14    XN:i:0  XM:i:1  XO:i:1  XG:i:1  NM:i:2  MD:Z:117T31 YS:i:-24    YT:Z:CP

>>> Writing bisulfite mapping results to K1.K1_R1.fq.gz.temp.2.gz_bismark_bt2_pe.bam <<<

Unmapped sequences will be written to K1.K1_R1.fq.gz.temp.2.gz_unmapped_reads_1.fq and K1.K1_R2.fq.gz.temp.2.gz_unmapped_reads_2.fq

Reading in the sequence files /big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.2.gz and /big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.2.gz
Use of uninitialized value $alignment_score_2 in concatenation (.) or string at /home/fk8jybr/Bismark-0.23.1/bismark line 3281, <IN2> line 356.
Use of uninitialized value $MD_tag_2 in concatenation (.) or string at /home/fk8jybr/Bismark-0.23.1/bismark line 3281, <IN2> line 356.
Failed to extract alignment score 2 () and MD tag ()!
last alignment 1: V300057111L1C001R0040010348/1/1   99  6A_CT_converted 503136061   40  150M    =   503136100   189 TATAGTTGGTTTTGTAGTAGGTTTGTTTTAGTTGTTTTAATTGTTTTTTTGAATTTTATTATGTGTAGGAGTAGGTGGAGGAGGTGGGATATTGTTTTATTTTAAAGTAGGGAATGGAGGTAGATTAAAAGGTGGGATTTTTGGAAAAGG  FFDFFFEBE@FFFC3FEFCBCCFF7E>FFFEC9F?,=FFFFFFEEBDE7FEEFFDBFFF6FFFF7FFFFDFFFFBFEE7FFFDEFE;%FFFFG5D7F8FCF=EFCFF=FEE?FFFF8EEEGF7EDDFFFF@E@EE-FCEEFEF1FFFF<F  AS:i:-12    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:87T47T14   YS:i:-6 YT:Z:CP
last alignment 2: V300057111L1C001R0040010348/2/2   147 6A_CT_converted 503136100   40  150M    =   503136061   -189    ATTGTTTTTTTGTATTTTATTATGTGTAGGAGTAGGTGGAGGAGGTGGTATATTGTTTTATTTTAAAGTAGGGAATGGAGGTAGATTAAAAGGTGGTATTTTTGGAAAAGGTATTTTAGGTAGAGGATGAGGATTGTTATTGATGGGTAT  F;CFFF/EFCEF+FFDFFFDFFFFF4GGFFFFGFFAFDGFFFF;FFFFF<EFFFFFDFAFFFFFFFFFFEFGFFFEFFGFDFFFFEFFFFFFFFFFGFE:@FFDFFF;FFFFGFEFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

Created G -> A converted version of the FastQ file K1_R2.fq.gz.temp.1.gz (82215442 sequences in total)

Input files are K1.K1_R1.fq.gz.temp.1.gz_C_to_T.fastq.gz and K1.K1_R2.fq.gz.temp.1.gz_G_to_A.fastq.gz (FastQ)
Now running 2 instances of Bowtie 2 against the bisulfite genome of /big/home/fk8jybr/input/reference/triticum_aestivum/52/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000

Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.1.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.1.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --norc))

Created G -> A converted version of the FastQ file K1_R2.fq.gz.temp.3.gz (82215442 sequences in total)

Input files are K1.K1_R1.fq.gz.temp.3.gz_C_to_T.fastq.gz and K1.K1_R2.fq.gz.temp.3.gz_G_to_A.fastq.gz (FastQ)
Now running 2 instances of Bowtie 2 against the bisulfite genome of /big/home/fk8jybr/input/reference/triticum_aestivum/52/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000

Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.3.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.3.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --norc))
Found first alignment:
V300057111L1C001R0040000020/1/1 77  *   0   0   *   *   0   0   TTTTGAATAGAGGAGTTTATGGTAATTTTTGTGGGGGAAGTATAAAGTAGTTTTTTAAAGTAATTTTTAAGTTTGGTTGTTAAGTGTTTGGATATTGTTAAGTGTGTTTGGTGTAATTTATAATAAGATTTTTTGGTATGTTTTGAGTTA  FA;FEBFEF@EFEDD<F@FF>EDFFB>F9F?FC8C?C:@F=FEFEB84FF1<F9C4EFEFEFFFDE.EFCF'@FCEA1F&FF:FFFCFFFEFFFFFDE9F4FFD>0EFE.CFFEFCF2EFE>FEFDAEFC(9FFF34BF1<EBFF8FBDF  YT:Z:UP
V300057111L1C001R0040000020/2/2 141 *   0   0   *   *   0   0   AAAATCACAAAAATCAAAAACCCCTAATATAATTAAACTACATTCTAAACATACCATAATTATACCCCTTCCCTTATACCCATAATATTTCCAAAACATAATTATATTCACATAATACTAATTTCACAATCTCACAAAAACTAAAACTAA  F@D5F?FEEFC8=F:FD72FEEEEFEEFFFFE=EFDD:EFCFDEBEBEECEFFFEFEFFF=FEE<@AEFEFABEAFEBEEFEFF>D;FEFFF?F':DF7FF>4-EFD'FFBFAB<B8FG<EFEFFF8CD91E>F)F7E?FEFFDEDFEBE  YT:Z:UP
Now starting a Bowtie 2 paired-end alignment for CTread1GAread2GAgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.1.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.1.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --nofw))

Created G -> A converted version of the FastQ file K1_R2.fq.gz.temp.4.gz (82215442 sequences in total)

Input files are K1.K1_R1.fq.gz.temp.4.gz_C_to_T.fastq.gz and K1.K1_R2.fq.gz.temp.4.gz_G_to_A.fastq.gz (FastQ)
Now running 2 instances of Bowtie 2 against the bisulfite genome of /big/home/fk8jybr/input/reference/triticum_aestivum/52/ with the specified options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000

Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.4.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.4.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --norc))
Found first alignment:
V300057111L1C001R0040000039/1/1 99  2B_CT_converted 383835911   23  150M    =   383835963   202 GAAAGTATTTTTGGATATATTTATAATGATATAATGTTATAGTTATTTAGATAGGGTGTTTGAGTTATTTTTTATTTTATGTAAGTAGGGGGGATGTATGTAATGATATTTTGGTTATATATGTATGTTGGTATGGTTTATTATAAGTTG  A<FCFDFF;B=FFF3BEFFDCA'FFF?FBFFFFFFF=9F=FE4:=>6?C;FCFFC>;F2BAB0FD@F?FF3ECFF=EBFF8EFEFEDEEC,EA?CF=EF><=A(6@*E&?F3FF79DEFBB;=DFFFDC.1<ECF*BBBFECFDF@E3D'  AS:i:-24    XN:i:0  XM:i:4  XO:i:0  XG:i:0  NM:i:4  MD:Z:22T67T44A11A2  YS:i:0  YT:Z:CP
V300057111L1C001R0040000039/2/2 147 2B_CT_converted 383835963   23  150M    =   383835911   -202    AGGGTGTTTGAGTTATTTTTTATTTTATGTAAGTAGGGTGGATGTATGTAATGATATTTTGGTTATATATGTATGTTGGTATGATTTATTATAAGATGGATTAGTGAATTTATTTTTTGGTTAAATGATAATTGTTGTTGTTGATAAAAA  FF8E>FEDFFFFFFFF=DE9EEFE@>?E>D=;F>5FF?>DF<.FB5FF=EC;FFEDFEF6:E<EF6FD3FFFFF>:FFBD:EFFFFD<FEF>EFF=FFE9>FFFFFEF;-D=FFDEEFABF7F;FFFFFEFFDFEFFFFFE>FF5FFECF  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:150    YS:i:-24    YT:Z:CP
Found first alignment:
V300057111L1C001R0040000154/1/1 99  1A_CT_converted 389812086   0   150M    =   389812094   158 TATTATGTTTTATATTTTATTTTTTTAGAATGATGGTAGTTTAAATAAGATGATATTTTATTAAAATTTTAAGAAAAGTGTTTTTTTTAATTGTGTATTTTTGTGAAGGTTTGTTGTTTTGAAGTATTATGTGATGATTGGGTGTGATAG  EEGEEEGGFFFEEEFBGFECFFGFGFGEDFBGDEDF=8GFFE>FFGEEGGEFGEEECGGBGEEAFFBFCFEBFFGGEGEGFFFFFGGFE@EEBEEEFFGGEEFDDGGFFFFEDFB1CGED3FD?GB@GEE;ABFEEFCE/>>GD2EG@C4  AS:i:-12    XS:i:-12    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:22G76G50   YS:i:-12    YT:Z:CP
V300057111L1C001R0040000154/2/2 147 1A_CT_converted 389812094   0   150M    =   389812086   -158    TTTATATTTTATTTTTTTAGAATGATGGTAGTTTAAATAAGATGATATTTTATTAAAATTTTAAGAAAAGTGTTTTTTTTAATTGTGTATTTTTGTGAAGGTTTGTTGTTTTGAAGTATTATGTGATGATTGGGTGTGATAGATTTTAAT  FDCF9FEFE?EFFGG?EGE@?;F@:EEGAEF3F:EEF0FECDG<GEEFCG?GFCEEG<FDGDFFEGFFEGEGEF@GBFFCBBEDGCFEGFFGGDGAGFEGEGDDF@DDFG4DEEEFEG8FCEFFGEGDE/EBBGFECGFGEDEFFFFGGF  AS:i:-12    XS:i:-12    XN:i:0  XM:i:2  XO:i:0  XG:i:0  NM:i:2  MD:Z:14G76G58   YS:i:-12    YT:Z:CP
Now starting a Bowtie 2 paired-end alignment for CTread1GAread2GAgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.3.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.3.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --nofw))
Now starting a Bowtie 2 paired-end alignment for CTread1GAread2GAgenome (reading in sequences from /big/home/fk8jybr/bismark_temp/K1.K1_R1.fq.gz.temp.4.gz_C_to_T.fastq.gz and /big/home/fk8jybr/bismark_temp/K1.K1_R2.fq.gz.temp.4.gz_G_to_A.fastq.gz, with the options: -q --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --maxins 1000 --nofw))
(ERR): bowtie2-align died with signal 9 (KILL) 
Found first alignment:
V300057111L1C001R0040000020/1/1 83  7B_GA_converted 720965141   40  150M    =   720964975   -316    TAACTCAAAACATACCAAAAAATCTTATTATAAATTACACCAAACACACTTAACAATATCCAAACACTTAACAACCAAACTTAAAAATTACTTTAAAAAACTACTTTATACTTCCCCCACAAAAATTACCATAAACTCCTCTATTCAAAA  FDBF8FFBE<1FB43FFF9(CFEADFEF>EFE2FCFEFFC.EFE0>DFF4F9EDFFFFFEFFFCFFF:FF&F1AECF@'FCFE.EDFFFEFEFE4C9F<1FF48BEFEF=F@:C?C8CF?F9F>BFFDE>FF@F<DDEFE@FEFBEF;AF  AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:19C130 YS:i:-6 YT:Z:CP
V300057111L1C001R0040000020/2/2 163 7B_GA_converted 720964975   40  150M    =   720965141   316 AAAATCACAAAAATCAAAAACCCCTAATATAATTAAACTACATTCTAAACATACCATAATTATACCCCTTCCCTTATACCCATAATATTTCCAAAACATAATTATATTCACATAATACTAATTTCACAATCTCACAAAAACTAAAACTAA  F@D5F?FEEFC8=F:FD72FEEEEFEEFFFFE=EFDD:EFCFDEBEBEECEFFFEFEFFF=FEE<@AEFEFABEAFEBEEFEFF>D;FEFFF?F':DF7FF>4-EFD'FFBFAB<B8FG<EFEFFF8CD91E>F)F7E?FEFFDEDFEBE  AS:i:-6 XN:i:0  XM:i:1  XO:i:0  XG:i:0  NM:i:1  MD:Z:107A42 YS:i:-6 YT:Z:CP

>>> Writing bisulfite mapping results to K1.K1_R1.fq.gz.temp.1.gz_bismark_bt2_pe.bam <<<

Unmapped sequences will be written to K1.K1_R1.fq.gz.temp.1.gz_unmapped_reads_1.fq and K1.K1_R2.fq.gz.temp.1.gz_unmapped_reads_2.fq

Reading in the sequence files /big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.1.gz and /big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.1.gz
(ERR): bowtie2-align died with signal 9 (KILL) 
Use of uninitialized value $alignment_score_2 in concatenation (.) or string at /home/fk8jybr/Bismark-0.23.1/bismark line 3281, <IN2> line 11512.
Use of uninitialized value $MD_tag_2 in concatenation (.) or string at /home/fk8jybr/Bismark-0.23.1/bismark line 3281, <IN2> line 11512.
Failed to extract alignment score 2 () and MD tag ()!
last alignment 1: V300057111L1C001R0040291826/1/1   83  7D_GA_converted 286109599   32  150M    =   286109462   -287    TAACTTAAATCTTCCAACTCTCCACTTCATATAATATAACATTATAAATTCCAATAACACTTAATATTCCTATATTCAAATACTTAATAACTTCCTTCAAATATTAACCAAACAACATATCTTCATCTAATTACATAAACTTACTCCTTA  EEDE@FFEFGGFEF;FED6>E>GGG:FFEEFGC<FGE?E/FFFGFGGFF=FADFGFFFE<8FABE6GFEED?FCF;EGGFCF4@G<=G=2G;CFGE=GG@FGGFGAFG@GF=GAFEFGF?D>FEEFGAFFGCFEGEDG)7EGF<F@GF:F  AS:i:0  XS:i:-24    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:150    YS:i:0  YT:Z:CP
last alignment 2: V300057111L1C001R0040291826/2/2   163 7D_GA_converted 286109462   32  150M    =   286109599   287 AAACTCCTAATAATAAACTTCTCCAATACATCCCATAACAATATATACAACCATACTCTTCCCTAAACTCCAAATTAATACTTC
(ERR): bowtie2-align died with signal 9 (KILL) 
(ERR): bowtie2-align died with signal 9 (KILL) 
Found first alignment:
V300057111L1C001R0040000154/1/1 83  1A_GA_converted 498606860   37  150M    =   498606852   -158    CTATCACACCCAATCATCACATAATACTTCAAAACAACAAACCTTCACAAAAATACACAATTAAAAAAAACACTTTTCTTAAAATTTTAATAAAATATCATCTTATTTAAACTACCATCATTCTAAAAAAATAAAATATAAAACATAATA  4C@GE2DG>>/ECFEEFBA;EEG@BG?DF3DEGC1BFDEFFFFGGDDFEEGGFFEEEBEE@EFGGFFFFFGEGEGGFFBEFCFBFFAEEGBGGCEEEGFEGGEEGFF>EFFG8=FDEDGBFDEGFGFGFFCEFGBFEEEFFFGGEEEGEE  AS:i:0  XS:i:-24    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:150    YS:i:0  YT:Z:CP
V300057111L1C001R0040000154/2/2 163 1A_GA_converted 498606852   37  150M    =   498606860   158 ATTAAAATCTATCACACCCAATCATCACATAATACTTCAAAACAACAAACCTTCACAAAAATACACAATTAAAAAAAACACTTTTCTTAAAATTTTAATAAAATATCATCTTATTTAAACTACCATCATTCTAAAAAAATAAAATATAAA  FGGFFFFEDEGFGCEFGBBE/EDGEGFFECF8GEFEEED4GFDD@FDDGEGEFGAGDGGFFGEFCGDEBBCFFBG@FEGEGEFFGEFFDGDF<GEECFG?GCFEEG<GDCEF0FEE:F3FEAGEE:@F;?@EGE?GGFFE?EFEF9FCDF  AS:i:0  XS:i:-24    XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:150    YS:i:0  YT:Z:CP

>>> Writing bisulfite mapping results to K1.K1_R1.fq.gz.temp.4.gz_bismark_bt2_pe.bam <<<

Found first alignment:
V300057111L1C001R0040000039/1/1 77  *   0   0   *   *   0   0   GAAAGTATTTTTGGATATATTTATAATGATATAATGTTATAGTTATTTAGATAGGGTGTTTGAGTTATTTTTTATTTTATGTAAGTAGGGGGGATGTATGTAATGATATTTTGGTTATATATGTATGTTGGTATGGTTTATTATAAGTTG  A<FCFDFF;B=FFF3BEFFDCA'FFF?FBFFFFFFF=9F=FE4:=>6?C;FCFFC>;F2BAB0FD@F?FF3ECFF=EBFF8EFEFEDEEC,EA?CF=EF><=A(6@*E&?F3FF79DEFBB;=DFFFDC.1<ECF*BBBFECFDF@E3D'  YT:Z:UP
V300057111L1C001R0040000039/2/2 141 *   0   0   *   *   0   0   TTTTTATCAACAACAACAATTATCATTTAACCAAAAAATAAATTCACTAATCCATCTTATAATAAATCATACCAACATACATATATAACCAAAATATCATTACATACATCCACCCTACTTACATAAAATAAAAAATAACTCAAACACCCT  FCEFF5FF>EFFFFFEFDFFEFFFFF;F7FBAFEEDFF=D-;FEFFFFF>9EFF=FFE>FEF<DFFFFE:DBFF:>FFFFF3DF6FE<E:6FEFDEFF;CE=FF5BF.<FD>?FF5>F;=D>E?>@EFEE9ED=FFFFFFFFDEF>E8FF  YT:Z:UP

>>> Writing bisulfite mapping results to K1.K1_R1.fq.gz.temp.3.gz_bismark_bt2_pe.bam <<<

Unmapped sequences will be written to K1.K1_R1.fq.gz.temp.4.gz_unmapped_reads_1.fq and K1.K1_R2.fq.gz.temp.4.gz_unmapped_reads_2.fq
Unmapped sequences will be written to K1.K1_R1.fq.gz.temp.3.gz_unmapped_reads_1.fq and K1.K1_R2.fq.gz.temp.3.gz_unmapped_reads_2.fq

Reading in the sequence files /big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.4.gz and /big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.4.gz
Use of uninitialized value $alignment_score_2 in concatenation (.) or string at /home/fk8jybr/Bismark-0.23.1/bismark line 3281, <IN2> line 352.
Use of uninitialized value $MD_tag_2 in concatenation (.) or string at /home/fk8jybr/Bismark-0.23.1/bismark line 3281, <IN2> line 352.
Failed to extract alignment score 2 () and MD tag ()!
last alignment 1: V300057111L1C001R0040010315/1/1   99  7D_CT_converted 244392554   42  150M    =   244392661   257 AAGTTATTTTTTGGAGATTATTTTTTTTGTTATGTGTAATTTTGGTTGATGTTTGGTTTGTTGTGTTTAATATGTTTTTTGATGTATTTATATAGGATAATTTTGGGAATAATTGTTGTTTTTTATTGAGGTTAGTTTTATTTTATGGTT  FFFCFGFBFFFFG;>FFFFGFDFEC@FG>EFGFEFGEFFFFFEFFDFFFGDFFFFFFGFGFFFFFFFFFFFFFGFFFFFG<FFFEGEF?EFFFEFDFGFGFEFFFFFFFFFFGG?1FF;DFF@FGFE@FFACGFGFEGBFFFFFGEEFDF  AS:i:0  XN:i:0  XM:i:0  XO:i:0  XG:i:0  NM:i:0  MD:Z:150    YS:i:0  YT:Z:CP
last alignment 2: V300057111L1C001R0040010315/2/2   147 7D_CT_converted 244392661   42  150M    =   244392554   -257    AATAATTGTTGTTTTTTATTGAGGTTAGTTTTATTTTATGGT

Reading in the sequence files /big/home/fk8jybr/bismark_temp/K1_R1.fq.gz.temp.3.gz and /big/home/fk8jybr/bismark_temp/K1_R2.fq.gz.temp.3.gz
Processed 1000000 sequence pairs so far

I think this is a memory issue, example the bowtie run out of memory but I am not sure. Is it a memory issue?

FelixKrueger commented 2 years ago

Yes, I would agree:

(ERR): bowtie2-align died with signal 9 (KILL) 

Can you try to drop the --parallel 4 and tray again?

hidvegin commented 2 years ago

Thank you for your answer. I dropped the --multicore 4 and tried again. Now, I think the job is working but a little slow without the multicore.

FelixKrueger commented 2 years ago

Great to hear that it is working in general. Now you could monitor the core/RAM usage and adjust the --parallel according to the hardware resources you have available if desired.

hidvegin commented 2 years ago

Dear @FelixKrueger!

I have two and three different bam files from the same library. Should I use --multiple for deduplicate_bismark script or before deduplication I merge bam files with samtools?

Other question. I would like to analyse CpG, CHG and CHH context separately in SeqMonk. So, I would like to generate CpG, CHG and CHH coverage file separately with bismark_methylation_extractor. I used this parameters:

bismark_methylation_extractor --gzip --bedGraph --comprehensive --buffer_size 10G K1_aligned.deduplicated.bam

I got three txt.gz file (CpG, CHG and CHH), but only one coverage file from CpGs. What should I change in the parameters for I get three different coverage files about CpG, CHG and CHH contexts separately?

FelixKrueger commented 2 years ago

For the merging, either should work. In fact, deduplicate_bismark does use samtools cat internally when you use the --multiple option.

Regarding the second question: you can simply run bismark2bedGraph three times, only feeding it the context of interest. Since you have already generated the CpG coverage file, just move to the file containing the CHG and CHH files from the methylation extractor and run:

bismark2bedGraph --CX -o CHH_context.cov.gz CHH*
bismark2bedGraph --CX -o CHG_context.cov.gz CHG*

As a word of warning, there might be a LOT of cytosines in CHH context, so you might struggle somewhat with the downstream analysis. But it's certainly worth a shot!

hidvegin commented 2 years ago

Thank you for your answer and help.

hidvegin commented 2 years ago

I have got a new error in Bismarck deduplication step. I used this script for deduplication:

deduplicate_bismark --bam home/fk8jybr/output/bismark/align/BGI/11/11.11_R1_paired_bismark_bt2_pe.bam -o 11 -output_dir /home/fk8jybr/output/bismark/deduplication/BGI/11

I got this error:

Output will be written into the directory: /big/home/fk8jybr/output/bismark/deduplication/BGI/11/
Output filename was given as: 11

Neither -s (single-end) nor -p (paired-end) selected for deduplication. Trying to extract this information for each file separately from the @PG line of the SAM/BAM file
Processing single-end Bismark output file(s) (SAM format):
home/fk8jybr/output/bismark/align/BGI/11/11.11_R1_paired_bismark_bt2_pe.bam

If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads.

Checking file >>home/fk8jybr/output/bismark/align/BGI/11/11.11_R1_paired_bismark_bt2_pe.bam<< for signs of file truncation...
Captured error message: '[E::hts_open_format] Failed to open file "home/fk8jybr/output/bismark/align/BGI/11/11.11_R1_paired_bismark_bt2_pe.bam" : No such file or directory'

[ERROR] The file appears to be truncated, please ensure that there were no errors while copying the file!!! Exiting...

I think the file is not corrupted. Bismark finished the alignment step in correctly. I do not know why the file be truncated. What should I do with this error?

FelixKrueger commented 2 years ago

[E::hts_open_format] Failed to open file "home/fk8jybr/output/bismark/align/BGI/11/11.11_R1_paired_bismark_bt2_pe.bam" : No such file or directory'

This would indicate that you spelled the name of the file, or the path to it, incorrectly. Is that a possibility, e.g. forgot / before home/...?

hidvegin commented 2 years ago

Thank @FelixKrueger for your answer and help. Yes, this was the problem. I forgot the / before home path.

hidvegin commented 2 years ago

I got an other error in the bedGraph phase. I used this script:

/home/fk8jybr/Bismark-0.23.1/bismark2bedGraph -o CpG_context.cov.gz --dir /home/fk8jybr/output/bismark/bedGraph/BGI/11 --buffer_size 70% /home/fk8jybr/output/bismark/methylation_extratction/BGI/11/CpG_context_11.deduplicated.txt.gz

The error was this:

Using these input files: /home/fk8jybr/output/bismark/methylation_extratction/BGI/11/CpG_context_11.deduplicated.txt.gz

Summary of parameters for bismark2bedGraph conversion:
======================================================
bedGraph output:        CpG_context.cov.gz
output directory:       >/big/home/fk8jybr/output/bismark/bedGraph/BGI/11/<
remove whitespaces:     no
CX context:         no (CpG context only, default)
No-header selected:     no
Sorting method:         Unix sort-based (smaller memory footprint, but slower)
Sort buffer size:       70%
Coverage threshold:     1
=============================================================================
Methylation information will now be written into a bedGraph and coverage file
=============================================================================

Using the following files as Input:
/big/home/fk8jybr/output/bismark/methylation_extratction/BGI/11/CpG_context_11.deduplicated.txt.gz

Writing bedGraph to file: CpG_context.cov.gz
Also writing out a coverage file including counts methylated and unmethylated residues to file: CpG_context.cov.gz.bismark.cov.gz

Changed directory to /big/home/fk8jybr/output/bismark/bedGraph/BGI/11/
Now writing methylation information for file >>CpG_context_11.deduplicated.txt.gz<< to individual files for each chromosome
Finished writing out individual chromosome files for CpG_context_11.deduplicated.txt.gz

Collecting temporary chromosome file information... Processing the following input file(s):
CpG_context_11.deduplicated.txt.gz.chr7B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr1B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr5A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr6D.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr3B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr6B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr2B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr2A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr4A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr4D.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr7D.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr1A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr4B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr3A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr5B.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr6A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr3D.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr2D.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chrUn.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr5D.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr7A.methXtractor.temp
CpG_context_11.deduplicated.txt.gz.chr1D.methXtractor.temp

Sorting input file CpG_context_11.deduplicated.txt.gz.chr1A.methXtractor.temp by positions (using -S of 70%)
Successfully deleted the temporary input file CpG_context_11.deduplicated.txt.gz.chr1A.methXtractor.temp

Sorting input file CpG_context_11.deduplicated.txt.gz.chr1B.methXtractor.temp by positions (using -S of 70%)
Missing alphabetical methylation call at /home/fk8jybr/Bismark-0.23.1/bismark2bedGraph line 562, <$ifh> line 60377208.
    main::validate_methylation_call('+', undef) called at /home/fk8jybr/Bismark-0.23.1/bismark2bedGraph line 466

What I missed it? How can I resolve this error?

FelixKrueger commented 2 years ago

Hmm, the error message seems to indicate that the methylation call information for at least one line for the file CpG_context_11.deduplicated.txt.gz.chr1B.methXtractor.temp is truncated... Did you get some kind of errors before, during the methylation call procedure? You could try to parse the input file to see if there is indeed something wrong.

Alternatively, instead of calling bismark2bedGraph on its own you could invoke it straight away during the methylation extraction using the flag --bed for the methylation extractor.

hidvegin commented 2 years ago

Thank @FelixKrueger for your answer. Seems to some kind of error in the CpG_context_11.deduplicated.txt.gz.chr1B.methXtractor.temp file. The error is not in the original file which generated from the methylation call procedure. I use a HPC cluster, which seems to generate the I/O error. The administrator restarted the HPC cluster and I re-run the bedGraph process. Now, I think the error is not exists anymore.

I would like to generate CpG, CHG and CHH context also, so I should use bismark2bedGraph, instead of --bed in methylation extractor.

FelixKrueger commented 2 years ago

yes, if you wanted that you could run bismark2bedGraph on the CpG, CHH and CHG output from the methylation extractor (you should already have the CpG output):

bismark2bedGraph --gzip --CX -o CHG_out.cov.gz CHG*
bismark2bedGraph --gzip --CX -o CHH_out.cov.gz CHH*