MarWoes / wg-blimp

wg-blimp: an end-to-end analysis pipeline for whole genome bisulfite sequencing data
GNU Affero General Public License v3.0

Error in rule prep_gemBS_files: #25

Open lexie-lee opened 1 year ago

lexie-lee commented 1 year ago

Hello! Thanks for building this tool! I ran into some problems while processing my WGBS data.

I tried to run wg-blimp from a config file, but it hit an error and could not finish.

This is the log file:

```
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /bin/bash
Provided cores: 32
Rules claiming more threads will be scaled down.
Job stats:
job                              count    min threads    max threads
-----------------------------  -------  -------------  -------------
all                                  1              1              1
bedgraph_to_methylation_ratio       12              1              1
benchmark_plot                       1              1              1
bsseq                                1              8              8
clean_gemBS_csv                      1              1              1
dmr_annotation                       1              1              1
dmr_combination                      1              1              1
dmr_coverage                        12              8              8
fastqc                              12              1              1
gemBS                                1              1              1
gemBS_csv                           12              1              1
index_bam                           12              1              1
mark_duplicates                     12              1              1
mbias                               12              1              1
methyl_dackel                       12              1              1
methylation_metrics                  1              1              1
methylseekr                          1              8              8
metilene                             1              1              1
metilene_input                       1              1              1
multiqc                              1              1              1
picard_metrics                      12              1              1
prep_fai                             1              1              1
prep_gemBS_files                     1              1              1
qualimap                            12              8              8
total                              134              1              8

Select jobs to execute...

[Fri Apr  7 10:15:41 2023]
rule prep_gemBS_files:
    output: /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv, /Volumes/PBLAB2/WGBS/results/alignment/gemBS.conf
    jobid: 119
    reason: Missing output files: /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv, /Volumes/PBLAB2/WGBS/results/alignment/gemBS.conf
    priority: 9
    resources: tmpdir=/var/folders/2v/v69c5xt93y377wv2pll3j24h0000gq/T

            touch /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv
            sed -i '1i"Barcode","Dataset","File1", "File2"' /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv
            cat << EOF > /Volumes/PBLAB2/WGBS/results/alignment/gemBS.conf
    reference = /Users/xiaoyu/igv/genomes/seq/mm10.fa
    index_dir = /Volumes/PBLAB2/WGBS/Clean_Data
    base = $HOME
    sequence_dir = /Volumes/PBLAB2/WGBS/Clean_Data
    bam_dir = /Volumes/PBLAB2/WGBS/results/alignment
    bcf_dir = /Volumes/PBLAB2/WGBS/results/alignment
    extract_dir = /Volumes/PBLAB2/WGBS/results/alignment
    report_dir = /Volumes/PBLAB2/WGBS/results/logs
    threads = 8
    jobs = 4
    include IHEC_standard.conf
    EOF

[Fri Apr  7 10:15:41 2023]
Error in rule prep_gemBS_files:
    jobid: 119
    output: /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv, /Volumes/PBLAB2/WGBS/results/alignment/gemBS.conf
    shell:

            touch /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv
            sed -i '1i"Barcode","Dataset","File1", "File2"' /Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv
            cat << EOF > /Volumes/PBLAB2/WGBS/results/alignment/gemBS.conf
    reference = /Users/xiaoyu/igv/genomes/seq/mm10.fa
    index_dir = /Volumes/PBLAB2/WGBS/Clean_Data
    base = $HOME
    sequence_dir = /Volumes/PBLAB2/WGBS/Clean_Data
    bam_dir = /Volumes/PBLAB2/WGBS/results/alignment
    bcf_dir = /Volumes/PBLAB2/WGBS/results/alignment
    extract_dir = /Volumes/PBLAB2/WGBS/results/alignment
    report_dir = /Volumes/PBLAB2/WGBS/results/logs
    threads = 8
    jobs = 4
    include IHEC_standard.conf
    EOF

        (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job prep_gemBS_files since they might be corrupted:
/Volumes/PBLAB2/WGBS/results/alignment/gemBS.csv
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2023-04-07T101535.945747.snakemake.log
```

This is my YAML file:

high ctrl.txt

Could you please help me with this? Thank you very much for your time.

JakeLehle commented 1 year ago

Hello @lexie-lee,

I apologize for taking a while to respond to your issue. I was in the midst of defending my thesis and wasn't checking the issue page regularly. I'm free now and can help you troubleshoot this.

Okay, the failing step you showed is part of the new additions to the pipeline that incorporate gemBS as an alternative aligner to bwa-meth. That step only sets up the gemBS config before the alignment starts, so the failure suggests the pipeline is having trouble finding your fastq files. Could you send me a screenshot of your /Volumes/PBLAB2/WGBS/Clean_Data directory? I want to make sure your file names match the regular-expression file globbing in the config file. Also, remove the path to the sample CSV file from the config. You already give all the sample information earlier in the config, so that entry is redundant and might be confusing the pipeline.
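As a rough way to check file names against a pattern before sending a screenshot, here is a minimal Python sketch. The two suffix patterns and the `find_mates` and `report` helpers are hypothetical and not taken from wg-blimp; substitute the regular expressions from your own config, since the real ones may differ.

```python
import re
from pathlib import Path

# Hypothetical suffix patterns for paired-end files; replace them with the
# regular expressions that your own config actually uses.
FIRST_MATE = r".*_1\.f(ast)?q\.gz"
SECOND_MATE = r".*_2\.f(ast)?q\.gz"


def find_mates(sample, filenames, first=FIRST_MATE, second=SECOND_MATE):
    """Return (first_mates, second_mates) file names that belong to one sample."""
    first_re = re.compile(re.escape(sample) + first)
    second_re = re.compile(re.escape(sample) + second)
    firsts = sorted(f for f in filenames if first_re.fullmatch(f))
    seconds = sorted(f for f in filenames if second_re.fullmatch(f))
    return firsts, seconds


def report(fastq_dir, samples):
    """Print, per sample, which files in fastq_dir matched each pattern."""
    names = [p.name for p in Path(fastq_dir).iterdir() if p.is_file()]
    for sample in samples:
        firsts, seconds = find_mates(sample, names)
        # Flag samples with no files or with unequal numbers of mates.
        ok = bool(firsts) and len(firsts) == len(seconds)
        print(sample, "ok" if ok else "CHECK", firsts, seconds)


# Example call (sample names here are placeholders):
# report("/Volumes/PBLAB2/WGBS/Clean_Data", ["sample_a", "sample_b"])
```

A sample flagged `CHECK` has either no matching files or a different number of first and second mates, which would be worth looking at first.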

The config line I'm talking about is `sample_fastq_csv: /Volumes/PBLAB2/WGBS/WGBSsample_group_high_ctrl.csv`. Remove the path `/Volumes/PBLAB2/WGBS/WGBSsample_group_high_ctrl.csv` from it.
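If you would rather script that edit than do it by hand, a minimal sketch could look like the following. The `clear_config_value` helper is hypothetical; it works on plain text rather than parsing YAML, and it assumes the key sits at the start of a line with its value on the same line. Whether the pipeline accepts an empty value for this key is also an assumption, so keep a backup of the original config.

```python
import re


def clear_config_value(config_text, key):
    """Blank the value of a `key: value` line, keeping the key itself."""
    # Match the key at the start of a line, then a non-empty value up to
    # the end of that line; lines whose value is already empty are left alone.
    pattern = re.compile(rf"^({re.escape(key)}\s*:)[ \t]*\S.*$", re.MULTILINE)
    return pattern.sub(r"\1", config_text)


# Example usage (the file name is a placeholder):
# text = open("config.yaml").read()
# open("config.yaml", "w").write(clear_config_value(text, "sample_fastq_csv"))
```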

Best, Jake