databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Unable to generate counts table for consensus peaks (pipeline ignores _ref_peaks_coverage.bed files) #218

Open kathleenabadie opened 2 years ago

kathleenabadie commented 2 years ago

Hello,

Thank you for the great package; I've enjoyed using it. Here is my issue. I successfully ran the sample-level pipeline to generate peak coverage files for each sample, then re-ran it with the reference peak BED file set to the consensus peak output. This generated _ref_peaks_coverage.bed files within each sample's peak_calling folder in the pipeline results.

However, when I now run the project-level pipeline to produce the final counts table for the consensus peaks, it ignores the _ref_peaks_coverage.bed files and builds the counts table from all sample peaks combined rather than from the consensus peaks. It also gives the warning shown at the end of the log below ("Peak coverage files are not derived from a singular reference peak set"). I understand that the PEPATACr.R script raises this warning when the _ref_peaks_coverage.bed files are not found, but I do not see how that can be the case, because those files are definitely there.

Any insight you could give on how to address this would be much appreciated. Thank you!
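One way to double-check that the files really are under each sample's results folder is a short script like the one below. It is only a sketch: the helper names `find_ref_coverage` and `report` are made up for illustration, and it searches each sample folder recursively because the exact name of the peak_calling folder is an assumption here, not something taken from the PEPATAC source.

```python
from pathlib import Path


def find_ref_coverage(results_dir, suffix="_ref_peaks_coverage.bed"):
    """Map each sample folder under results_dir to the matching files found in it."""
    found = {}
    for sample_dir in sorted(Path(results_dir).iterdir()):
        if not sample_dir.is_dir():
            continue
        # Search recursively so the exact peak_calling folder name does not matter
        # (the folder layout is an assumption, not confirmed from the pipeline).
        matches = sorted(p for p in sample_dir.rglob("*" + suffix) if p.is_file())
        found[sample_dir.name] = matches
    return found


def report(results_dir):
    """Print one line per sample and return the names of samples with no file."""
    missing = []
    for sample, files in find_ref_coverage(results_dir).items():
        if not files:
            missing.append(sample)
            print(f"{sample}: MISSING")
        elif any(f.stat().st_size == 0 for f in files):
            print(f"{sample}: found, but at least one file is empty")
        else:
            print(f"{sample}: {len(files)} file(s)")
    return missing
```

Calling `report(...)` on the results_pipeline directory lists any sample without a _ref_peaks_coverage.bed file. An empty return value would suggest the files exist and the problem lies in how the project-level step looks them up.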

Below I am copying both the log file and my yaml file:

log file

Pipeline run code and environment:

Version log:

Arguments passed to pipeline:


Target to produce: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_libComplexity.pdf,/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_*_consensusPeaks.narrowPeak,/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_peaks_coverage.tsv

Rscript /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tools/pepatac/tools/PEPATAC_summarizer.R tcf7_refgenie_new.yaml /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/ /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline 2 5 1 (45017)

Loading config file: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tcf7_refgenie_new.yaml
Creating stats summary...
Summary (n=11): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//PEPATAC_tcf7_stats_summary.tsv
Creating assets summary...
Summary (n=11): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//PEPATAC_tcf7_assets_summary.tsv
Creating summary plots...
Project level library complexity plot already exists.
Project library complexity plot: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_libComplexity.pdf

Successfully produced project summary plots.

Consensus peak set (mm10): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_consensusPeaks.narrowPeak

Calculating mm10 peak counts for 10 samples...

Counts table: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_peaks_coverage.tsv

Warning message:
In PEPATACr::peakCounts(sample_table, summary_dir, argv$results, :
  Peak coverage files are not derived from a singular reference peak set.
Command completed. Elapsed time: 0:00:28. Running peak memory: 1.089GB.
PID: 45017; Command: Rscript; Return code: 0; Memory used: 1.089GB

Pipeline completed. Epilogue
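The warning text suggests that the per-sample coverage files do not all describe the same peak intervals. How PEPATACr makes that decision is not shown in this log, so the following is only a sketch of an independent check. It assumes the coverage files are BED-like, with chrom, start, and end in the first three whitespace-separated columns; the function names are hypothetical.

```python
from pathlib import Path


def peak_coordinates(bed_path):
    """Return the (chrom, start, end) triples of a BED-like file as a frozenset."""
    coords = set()
    with open(bed_path) as handle:
        for line in handle:
            fields = line.split()
            # Skip blank lines, comments, and anything without three columns.
            if len(fields) < 3 or fields[0].startswith("#"):
                continue
            try:
                coords.add((fields[0], int(fields[1]), int(fields[2])))
            except ValueError:
                # Header-like line whose start/end are not integers.
                continue
    return frozenset(coords)


def group_by_peak_set(bed_paths):
    """Group files by their coordinate set; one group means one shared peak set."""
    groups = {}
    for path in bed_paths:
        groups.setdefault(peak_coordinates(path), []).append(Path(path).name)
    return list(groups.values())
```

If `group_by_peak_set` returns more than one group for the _ref_peaks_coverage.bed files, the samples were not all quantified against the same reference peaks, which would be consistent with the warning.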

yaml file

name: PEPATAC_tcf7

pep_version: 2.0.0
sample_table: tcf7.csv

looper:
  output_dir: "${ANALYSIS}/processed/"
  pipeline_interfaces: ["${ANALYSIS}/tools/pepatac/project_pipeline_interface.yaml"]

sample_modifiers:
  append:
    pipeline_interfaces: ["${ANALYSIS}/tools/pepatac/sample_pipeline_interface.yaml"]
  derive:
    attributes: [read1, read2]
    sources:
      # Obtain tutorial data from http://big.databio.org/pepatac/ then set
      # path to your local saved files
      R1: "${ANALYSIS}/fastq/{sample_name}_R1_001.fastq.gz"
      R2: "${ANALYSIS}/fastq/{sample_name}_R2_001.fastq.gz"
  imply: