Closed desmodus1984 closed 8 months ago
I am afraid I don't think the BAM files are compatible :(, and you would indeed have to re-start from the alignment step...
Good morning,
I installed Bismark as mentioned. When I tried doing the genome indexing it failed
home/juaguila/appz/Bismark-0.24.2/bismark_genome_preparation ../Bismark/ Writing bisulfite genomes out into a single MFA (multi FastA) file
Bisulfite Genome Indexer version v0.24.2 (last modified: 19 May 2022)
Step I - Prepare genome folders - completed
Total number of conversions performed: C->T: 52215405 G->A: 52297904
Step II - Genome bisulfite conversions - completed
Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
Please be aware that this process can - depending on genome size - take several hours!
/home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version GLIBCXX_3.4.32' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s) /home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version
GLIBCXX_3.4.20' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s)
/home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version CXXABI_1.3.8' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s) /home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version
GLIBCXX_3.4.26' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s)
/home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version GLIBCXX_3.4.30' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s) /home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version
GLIBCXX_3.4.22' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s)
/home/applications/bowtie2/2.5.3/bowtie2-build-s: /lib64/libstdc++.so.6: version `GLIBCXX_3.4.21' not found (required by /home/applications/bowtie2/2.5.3/bowtie2-build-s)
Traceback (most recent call last):
File "/home/applications/bowtie2/2.5.3/bowtie2-build", line 134, in
One option that I am doubtful is that bowtie2 is installed in the HPC as a module and I loaded. Do I have to figure out the path of bowtie2 or it is obtained automatically?
Thanks,
Juan Pablo Aguilar
From: Felix Krueger @.> Sent: Tuesday, February 20, 2024 5:12:21 AM To: FelixKrueger/Bismark @.> Cc: Aguilar Cabezas, Juan Pablo @.>; Author @.> Subject: [External] Re: [FelixKrueger/Bismark] Generating cytosine report (Issue #655)
Use caution with links and attachments.
I am afraid I don't think the BAM files are compatible :(, and you would indeed have to re-start from the alignment step...
— Reply to this email directly, view it on GitHubhttps://github.com/FelixKrueger/Bismark/issues/655#issuecomment-1953881646, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AJWD2VJMA5CRM7P4JVOH74DYURZILAVCNFSM6AAAAABDP7LTP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJTHA4DCNRUGY. You are receiving this because you authored the thread.Message ID: @.***>
Hmm, this does indeed look like a Bowtie2 installation issue. If you don't specify the path yourself it will look in the PATH
environment variable. It seems to use it from /home/applications/bowtie2/2.5.3/bowtie2-build
.
You can find out the path with the command which bowtie2
.
When you run /home/applications/bowtie2/2.5.3/bowtie2-build --help
you should be able to see the help text. Otherwise, I would recommend you reintall Bowtie2 (or install Bismark using conda install bismark
, which will also satisfy all other requirements)
I did actually think about that but the version in conda is 0.22 while the last one is 0.24.
Since I was told to best use the latest version due to bugs and updates.
From: Felix Krueger @.> Sent: Wednesday, February 21, 2024 10:20:32 AM To: FelixKrueger/Bismark @.> Cc: Aguilar Cabezas, Juan Pablo @.>; Author @.> Subject: [External] Re: [FelixKrueger/Bismark] Generating cytosine report (Issue #655)
Use caution with links and attachments.
Hmm, this does indeed look like a Bowtie2 installation issue. If you don't specify the path yourself it will look in the PATH environment variable. It seems to use it from /home/applications/bowtie2/2.5.3/bowtie2-build.
You can find out the path with the command which bowtie2.
When you run /home/applications/bowtie2/2.5.3/bowtie2-build --help you should be able to see the help text. Otherwise, I would recommend you reintall Bowtie2 (or install Bismark using conda install bismark, which will also satisfy all other requirements)
— Reply to this email directly, view it on GitHubhttps://github.com/FelixKrueger/Bismark/issues/655#issuecomment-1956921278, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AJWD2VIZAZWLUMOZO6BR2HTYUYGEBAVCNFSM6AAAAABDP7LTP2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNJWHEZDCMRXHA. You are receiving this because you authored the thread.Message ID: @.***>
The latest version on bioconda is 0.24.2, the instructions are here: https://bioconda.github.io/recipes/bismark/README.html
Hi,
I Installed Bismark in conda, and I am trying to run the workflow and I would appreciate if you could check my code, and tell me what I am writing wrong or missing.
I loaded the environment, and I am running the code in the folder, as an .sh file, and it is not creating any bam output. Here is my code which worked with bs_seeker2.
I start with a loop to grab the names of the files, but then alignment doesn't happen.
`for i in *_R1_val_1.fq.gz do base=$(basename $i "_R1_val_1.fq.gz")
bismark -q --parallel 8 -B ${base} \
--genome /DataDrive/juaguila/Bvos-trimmed/BM-lot/ \
-1 ${base}_R1_val_1.fq.gz -2 ${base}_R2_val_2.fq.gz
deduplicate_bismark --bam ${base}.bam
bismark_methylation_extractor -p --gzip --genome_folder /DataDrive/juaguila/Bvos-trimmed/BM-lot/ \
--cytosine_report --multicore 8 ${base}.deduplicated.bam
done
`
Which I run this: nohup sh BS2.sh > test.bismark.log
The genome was created in the genome directory.
The log is as follows:
Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.4.1]) Output format is BAM (default) Alignments will be written out in BAM format. Samtools found here: '/home/juaguila/miniconda3/envs/bismark/bin/samtools' Reference genome folder provided is /DataDrive/juaguila/Bvos-trimmed/BM-lot/ (absolute path is '/DataDrive/juaguila/Bvos-trimmed/BM-lot/)' FastQ format specified
Input files to be analysed (in current folder '/DataDrive/juaguila/Bvos-trimmed/BM-lot'): V00001_R1_val_1.fq.gz V00001_R2_val_2.fq.gz Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!) Specifying --basename in conjuction with --multicore is currently not supported (but we are aiming to fix this soon). Please lose either --basename or --multicore to proceed
Neither -s (single-end) nor -p (paired-end) selected for deduplication. Trying to extract this information for each file separately from the @PG line of the SAM/BAM file Processing single-end Bismark output file(s) (SAM format): V00001.bam
If there are several alignments to a single position in the genome the first alignment will be chosen. Since the input files are not in any way sorted this is a near-enough random selection of reads.
Checking file >>V00001.bam<< for signs of file truncation... Captured error message: '[E::hts_open_format] Failed to open file V00001.bam'
[ERROR] The file appears to be truncated, please ensure that there were no errors while copying the file!!! Exiting...
I would appreciate if you could tell me what parameter I am missing
The error here is this:
Specifying
--basename
in conjuction with--multicore
is currently not supported (but we are aiming to fix this soon). Please lose either--basename
or--multicore
to proceed
So the Bismark could simply be:
bismark --parallel 8 --genome /DataDrive/juaguila/Bvos-trimmed/BM-lot/ -1 ${base}_R1_val_1.fq.gz -2 ${base}_R2_val_2.fq.gz
This is fine:
deduplicate_bismark ${base}.bam
The following command will need --bedGraph
I believe:
bismark_methylation_extractor --gzip --bedGraph --genome_folder /DataDrive/juaguila/Bvos-trimmed/BM-lot/ --cytosine_report --multicore 8 ${base}.deduplicated.bam
Thanks, I ran the code and I made one change, in addition to eliminating the base (-B) parameter, I changed multicore to parallel. Could you explain what is the difference? I thought they are the same, but maybe I am wrong.
Here is the code
for i in *_R1_val_1.fq.gz
do
base=$(basename $i "_R1_val_1.fq.gz")
bismark -q --parallel 8 \
--genome /home/juaguila/Bismark/ \
-1 ${base}_R1_val_1.fq.gz -2 ${base}_R2_val_2.fq.gz
deduplicate_bismark --bam ${base}.bam
bismark_methylation_extractor -p --gzip --genome_folder /DataDrive/juaguila/Bvos-trimmed/BM-lot/ \
--cytosine_report --multicore 8 ${base}.deduplicated.bam
done
I have a question too. I thought that the basename (${base}) would work well, and I would have a file called whatever is before *_R1_val_1.fq.gz like V00001.bam, as it did when I ran bs-seeker and cgmaptools. Contrarily, iInstead it created files with the entire XXXX_R1_val_1.fq.gz suffix
[juaguila@v001 BSMK-1]$ cd ../BSMK-0/ [juaguila@v001 BSMK-0]$ ls -ltr total 103335076 -rw-r--r-- 1 juaguila hpc_jfierst 3999335475 Jun 8 2023 V00001_R2_val_2.fq.gz -rw-r--r-- 1 juaguila hpc_jfierst 3675649845 Jun 8 2023 V00001_R1_val_1.fq.gz -rw-r--r-- 1 juaguila hpc_jfierst 3436679091 Jun 8 2023 V00021_R2_val_2.fq.gz -rw-r--r-- 1 juaguila hpc_jfierst 3104862452 Jun 8 2023 V00021_R1_val_1.fq.gz -rw-r--r-- 1 juaguila hpc_jfierst 3079178155 Jun 8 2023 V00294_R2_val_2.fq.gz -rw-r--r-- 1 juaguila hpc_jfierst 2823807574 Jun 8 2023 V00294_R1_val_1.fq.gz -rw-r--r-- 1 juaguila hpc_jfierst 457 Mar 3 21:49 BSMK-test.sh -rw-r--r-- 1 juaguila hpc_jfierst 7654589585 Mar 4 06:55 V00001_R1_val_1_bismark_bt2_pe.bam -rw-r--r-- 1 juaguila hpc_jfierst 1867 Mar 4 06:55 V00001_R1_val_1_bismark_bt2_PE_report.txt -rw-r--r-- 1 juaguila hpc_jfierst 6195867738 Mar 4 11:14 V00021_R1_val_1_bismark_bt2_pe.bam -rw-r--r-- 1 juaguila hpc_jfierst 1868 Mar 4 11:14 V00021_R1_val_1_bismark_bt2_PE_report.txt
Did I write something wrong? I thought that the code was specifying just the XXX part; that is why I used the -B base parameter to specify the file name, and then create the loop/sequence for the other codes. I thought that the alignment would write a bam file called XXX.bam and not a XXX_R1_val_1.fq.gz file.
The options --parallel/--multicore
are identical.
I believe for the output files, only the ending is stripped off (e.g. .fq.gz
). You can always rename files afterwards so they are to your liking, e.g.
rename s/_R1_val_1_bismark_/_/ *
The file XXX_R1_val_1.fq.gz
is the input file (and is not created by Bismark...).
Good evening,
I followed your code for the methylation extractor, and I didn't get the bedgraph ouput nor the genome-wide methylation report files, which I require to do the detection of differentially methylated regions.
The code was:
for i in *_R1_val_1.fq.gz
do
base=$(basename $i "_R1_val_1.fq.gz")
bismark -q --parallel 10 \
--genome /home/juaguila/Bismark/ \
-1 ${base}_R1_val_1.fq.gz -2 ${base}_R2_val_2.fq.gz
deduplicate_bismark -p --bam ${base}_R1_val_1_bismark_bt2_pe.bam -o ${base}
bismark_methylation_extractor -p --gzip --bedGraph --buffer_size 20G --genome_folder /home/juaguila/Bismark/ \
--cytosine_report --parallel 10 ${base}.deduplicated.bam
done
The log tells this error:
Using the following files as Input: /home/juaguila/BombusMethylSeq/Rec-5/BSMK-2/CpG_OT_V00747.deduplicated.txt.gz /home/juaguila/BombusMethylSeq/Rec-5/BSMK-2/CpG_OB_V00747.deduplicated.txt.gz
Writing bedGraph to file: V00747.deduplicated.bedGraph.gz Also writing out a coverage file including counts methylated and unmethylated residues to file: V00747.deduplicated.bismark.cov.gz
Now writing methylation information for file >>CpG_OT_V00747.deduplicated.txt.gz<< to individual files for each chromosome
Failed to open filehandle: Too many open files at /home/juaguila/.conda/envs/bismark/bin/bismark2bedGraph line 279,
And for the cytosine report.
Adding context-specific methylation summaries
Writing genome-wide cytosine report to: V00314.deduplicated.CpG_report.txt.gz <<<
Writing all cytosine context summary file to: V00314.deduplicated.cytosine_context_summary.txt <<<
No last chromosome was defined, something must have gone wrong while reading the data in (e.g. specified wrong file path for a gzipped coverage file?). Please check your command!
I also wanted to inquire about this error. While I looked at the log file, I saw this very often: Processed 3000000 sequence pairs so far Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2313:3495:29340_1:N:0:GTCGTTAC+GCTTAGCT NW_022883904.1 2 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2265:16071:7435_1:N:0:GTCGTTAC+GCTTAGCT NW_022883908.1 29954 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2214:25635:36370_1:N:0:GTCGTTAC+GCTTAGCT NW_022883462.1 19111 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2233:28782:9596_1:N:0:GTCGTTAC+GCTTAGCT NW_022884241.1 19018 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2238:29441:32095_1:N:0:GTCGTTAC+GCTTAGCT NW_022883649.1 13623 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2305:16034:34053_1:N:0:GTCGTTAC+GCTTAGCT NW_022883202.1 20994 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2234:16080:20760_1:N:0:GTCGTTAC+GCTTAGCT NW_022883657.1 29689 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2234:16098:20823_1:N:0:GTCGTTAC+GCTTAGCT NW_022883657.1 29689 Chromosomal sequence could not be extracted for A00744:203:HVYMWDSXY:3:2307:14525:10019_1:N:0:GTCGTTAC+GCTTAGCT NW_022883927.1 29210
Is this something that I have to worry about? I indexed the reference genome the standard way, my methylation reads are PE-150.
Could you please tell what is wrong with my code that it did not generate the output files?
Lastly, while reading the log I found this line about directional library in the cytosine report:
Finished generating genome-wide cytosine report
Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.5.3]) Output format is BAM (default) Alignments will be written out in BAM format. Samtools found here: '/home/juaguila/.conda/envs/bismark/bin/samtools' Reference genome folder provided is /home/juaguila/Bismark/ (absolute path is '/home/juaguila/Bismark/)' FastQ format specified
The library is PE-150. At first I was confused because it was done RNA-seq, which the library itself was directional, but then I realized that in this case, it is not. Is there an argument to state the library as non-directional, or am I misunderstanding the log?
Thank you.
You need to use the option --scaffolds
form the methylation extractor as you seem to have a genom with too many contigsscaffolds.
Can I rerun the methylation extractor code or should I eliminate any of the previous files (X.methXtractor.temp), or better move the deduplicated files to a new directory?
Thanks
I would probably remove the .temp files, but it might work even leaving them there...
I ran the methylator and it worked well now. By curiosity, I checked the MBias files, which I expected to be "good" or "fine" as the ones from methyldackel and I got surprised that they weren't as flat as those from methyldackel. The libraries were made with the enzymatic-methyl-kit from NEB. The R1 look okay, but there is a huge drop at the end, while the R2 have the drop in methylation level, until, similarly, reaching the end of the read, and the huge drop out.
Do you think that I should add any extra parameters to the methylation extractor? Do you recommend trimming bases from the 3'-end? R1 seems to need more trimming than the R2.
Thanks and sorry for the many questions.
A couple of comments:
bismark2report
once the run is complete, as this give you an interactive HTML version of all reports, including the M-bias plots
Hi,
I am studying differential methylation in bumblebees, and after using methylKit to find differentially methylated sites it seems better to find differentially methylated regions. I wanted to find DMR using the software swDMR (unless you suggest a different/better one) and it needs as input the cytosine report with strand and counts.
I am contacting you because for methylKit I have the bam files for methylation calling but I did pre-process them in a different way, not with Bismark. I trimmed the reads based on the suggested EM-seq parameters, and I checked for methylation-bias with Methyldackel with regular trimming parameters. After confirming best trimming parameters, I aligned the trimmed reads with bwa-meth, sam-to-bam conversion, and deduplication/elimination with Picard.
I wanted to ask if I can use the bam files that I have already generated to make the cytosine report or if I should re-processed my files all over again with Bismark.
Thank you again.