Hello, thank you for the great package; I've enjoyed using it.

Here is my issue. I successfully ran the sample-level pipeline to generate peak coverage files for each sample, then re-ran it with the reference peak BED file (frip_ref_peaks in the YAML below) set to the consensus peak output. This produced a _ref_peaks_coverage.bed file in each sample's peak_calling folder under results_pipeline. However, when I now run the project-level pipeline to produce the final counts table for the consensus peaks, it ignores the _ref_peaks_coverage.bed files and builds the counts table from all sample peaks combined rather than from the consensus peaks. It gives the following warning: "Peak coverage files are not derived from a singular reference peak set." I understand this warning comes from the PEPATACr.R script (PEPATACr::peakCounts) when the _ref_peaks_coverage.bed files are not found, but I don't understand how this is happening, because those files are definitely there.
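To double-check that every sample really has a reference coverage file, here is a minimal Python sketch that lists, for each sample folder, any file ending in _ref_peaks_coverage.bed. It assumes the layout described above (results_pipeline/<sample>/peak_calling/); the helper names find_ref_coverage and report are hypothetical and not part of PEPATAC.

```python
from pathlib import Path

# Suffix taken from the file names described in this issue.
REF_SUFFIX = "_ref_peaks_coverage.bed"


def find_ref_coverage(results_dir):
    """Map each sub-folder of results_dir to the reference-peak coverage files
    found in its peak_calling/ folder (assumed layout: <sample>/peak_calling/)."""
    found = {}
    for sample_dir in sorted(Path(results_dir).iterdir()):
        if not sample_dir.is_dir():
            continue  # skip stray files at the top level
        peak_dir = sample_dir / "peak_calling"
        if peak_dir.is_dir():
            found[sample_dir.name] = sorted(peak_dir.glob("*" + REF_SUFFIX))
        else:
            found[sample_dir.name] = []
    return found


def report(results_dir):
    """Print one line per sample folder and return the folders with no match."""
    missing = []
    for sample, hits in find_ref_coverage(results_dir).items():
        if hits:
            print(f"{sample}: " + ", ".join(p.name for p in hits))
        else:
            print(f"{sample}: MISSING")
            missing.append(sample)
    return missing
```

Calling report("/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline") prints one line per folder and returns the folders without a match. Any non-sample folders under results_pipeline will also be listed as MISSING, so read the output with that in mind.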

Any insight you could give on how to address this would be much appreciated. Thank you!

Below are both my log file and my YAML config file:

Log file:

Pipeline run code and environment:

Command: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tools/pepatac/pipelines/pepatac_collator.py --config tcf7_refgenie_new.yaml -O /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/ -P 1 -M 16000 -n PEPATAC_tcf7 -r /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline
Compute host: cosmo-02.wexac.weizmann.ac.il
Working dir: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac
Outfolder: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/
Pipeline started at: (03-05 19:38:17) elapsed: 0.0 TIME

Version log:

Python version: 3.9.7
Pypiper dir: /home/labs/amit/kathleen/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper
Pypiper version: 0.12.3
Pipeline dir: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tools/pepatac/pipelines
Pipeline version: 0.0.4
Pipeline hash: 0d95e4564d243f70d3be8afca32082be88833a01
Pipeline branch: * master
Pipeline date: 2022-02-04 14:49:33 -0500
Pipeline diff: 359 files changed, 2 insertions(+), 2 deletions(-)

Arguments passed to pipeline:

config_file: tcf7_refgenie_new.yaml
cores: 1
cutoff: 2
dirty: False
force_follow: False
logdev: False
mem: 16000
min_olap: 1
min_score: 5
name: PEPATAC_tcf7
new_start: False
normalized: False
output_parent: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/
poverlap: False
recover: False
results: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline
silent: False
skip_consensus: False
skip_table: False
testmode: False
verbosity: None

Target to produce: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_libComplexity.pdf,/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_*_consensusPeaks.narrowPeak,/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_peaks_coverage.tsv

Rscript /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tools/pepatac/tools/PEPATAC_summarizer.R tcf7_refgenie_new.yaml /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/ /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline 2 5 1 (45017)
Loading config file: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tcf7_refgenie_new.yaml
Creating stats summary...
Summary (n=11): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//PEPATAC_tcf7_stats_summary.tsv
Creating assets summary...
Summary (n=11): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//PEPATAC_tcf7_assets_summary.tsv
Creating summary plots...
Project level library complexity plot already exists.
Project library complexity plot: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_libComplexity.pdf

Successfully produced project summary plots.

Consensus peak set (mm10): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_consensusPeaks.narrowPeak

Calculating mm10 peak counts for 10 samples...
Counts table: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_peaks_coverage.tsv

Counts table: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_peaks_coverage.tsv

Warning message:
In PEPATACr::peakCounts(sample_table, summary_dir, argv$results, :
  Peak coverage files are not derived from a singular reference peak set.
Command completed. Elapsed time: 0:00:28. Running peak memory: 1.089GB.
PID: 45017; Command: Rscript; Return code: 0; Memory used: 1.089GB

Pipeline completed. Epilogue

Elapsed time (this run): 0:00:28
Total elapsed time (all runs): 0:00:28
Peak memory (this run): 1.0891 GB
Pipeline completed time: 2022-03-05 19:38:45
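The wording of the warning suggests the check may be about all coverage files sharing one peak set, not only about whether they exist. How PEPATACr::peakCounts actually decides this is not shown in the log; assuming it means every _ref_peaks_coverage.bed file should list the same peaks, the sketch below tests that condition directly. It treats the first three tab-separated columns as chrom/start/end, which is an assumption about the coverage file format, and peak_coords and share_one_peak_set are hypothetical names.

```python
from pathlib import Path


def peak_coords(bed_path):
    """Return the set of (chrom, start, end) tuples in a BED-like file,
    assuming the first three tab-separated columns hold the peak coordinates."""
    coords = set()
    with open(bed_path) as fh:
        for line in fh:
            # Skip blank lines and common BED header lines.
            if not line.strip() or line.startswith(("#", "track", "browser")):
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 3:
                continue
            coords.add((fields[0], int(fields[1]), int(fields[2])))
    return coords


def share_one_peak_set(bed_paths):
    """Return True if every file lists exactly the same peaks as the first.

    Differences are printed as counts of peaks missing from / extra in each
    file relative to the first one.
    """
    paths = [Path(p) for p in bed_paths]
    if not paths:
        return False
    reference = peak_coords(paths[0])
    same = True
    for path in paths[1:]:
        other = peak_coords(path)
        if other != reference:
            same = False
            print(f"{path.name}: {len(reference - other)} missing, "
                  f"{len(other - reference)} extra vs {paths[0].name}")
    return same
```

For example, pass sorted(Path(results_dir).glob("*/peak_calling/*_ref_peaks_coverage.bed")) to share_one_peak_set. Comparing sets ignores row order and duplicate rows, so the check is about which peaks are covered rather than how each file happens to be sorted.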

YAML file:

name: PEPATAC_tcf7
pep_version: 2.0.0
sample_table: tcf7.csv

looper:
  output_dir: "${ANALYSIS}/processed/"
  pipeline_interfaces: ["${ANALYSIS}/tools/pepatac/project_pipeline_interface.yaml"]

sample_modifiers:
  append:
    pipeline_interfaces: ["${ANALYSIS}/tools/pepatac/sample_pipeline_interface.yaml"]
  derive:
    attributes: [read1, read2]
    sources:
      # Obtain tutorial data from http://big.databio.org/pepatac/ then set
      # path to your local saved files
      R1: "${ANALYSIS}/fastq/{sample_name}_R1_001.fastq.gz"
      R2: "${ANALYSIS}/fastq/{sample_name}_R2_001.fastq.gz"
  imply:
    - if:
        organism: ["mouse"]
      then:
        genome: mm10
        prealignment_names: ["rCRSd"]
        deduplicator: samblaster # Default. [options: picard]
        trimmer: skewer # Default. [options: pyadapt, trimmomatic]
        peak_type: fixed # Default. [options: variable]
        extend: "250" # Default. For fixed-width peaks, extend this distance up- and down-stream.
        frip_ref_peaks: "${ANALYSIS}/processed/summary/PEPATAC_tcf7_mm10_consensusPeaks.bed" # KA changed from None
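On the config side, it is worth verifying that frip_ref_peaks resolves to an existing file once ${ANALYSIS} is substituted. Note also that the log above reports the consensus set as PEPATAC_tcf7_mm10_consensusPeaks.narrowPeak, whereas frip_ref_peaks points at PEPATAC_tcf7_mm10_consensusPeaks.bed. The sketch assumes ${ANALYSIS} is a plain environment variable and uses Python's string.Template to substitute it; check_ref_peaks is a hypothetical helper, and looper's own variable expansion may differ in details.

```python
import re
from pathlib import Path
from string import Template

# Matches any ${VAR} or $VAR reference still present after substitution.
_UNRESOLVED = re.compile(r"\$\{?([A-Za-z_][A-Za-z0-9_]*)\}?")


def check_ref_peaks(value, env):
    """Expand ${VAR} references in a config path using the mapping env.

    Returns (expanded_path, unresolved_names, exists). Unknown variables are
    left in place (safe_substitute does not raise) so they can be reported.
    """
    expanded = Template(value).safe_substitute(env)
    unresolved = _UNRESOLVED.findall(expanded)
    exists = not unresolved and Path(expanded).is_file()
    return expanded, unresolved, exists
```

With os imported, check_ref_peaks("${ANALYSIS}/processed/summary/PEPATAC_tcf7_mm10_consensusPeaks.bed", os.environ) returns the expanded path, any variables that were not set, and whether the file exists.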

databio / pepatac

Unable to generate counts table for consensus peaks (pipeline ignores _ref_peaks_coverage.bed files) #218
