Hello,
Thank you for the great package, I've enjoyed using it. Here is my issue:
I have successfully run the sample-level pipeline to generate peak coverage files for each sample, and then re-ran it with the reference peak BED file set to the consensus peak output. This generated _ref_peaks_coverage.bed files in the peak_calling folder of each sample in the results_pipeline directory. However, when I now run the project-level pipeline to produce the final counts table for the consensus peaks, it ignores the _ref_peaks_coverage.bed files and builds the counts table from all sample peaks combined rather than from the consensus peaks. It also gives the warning shown in the log below. I understand this warning comes from the PEPATACr.R script when the _ref_peaks_coverage.bed files are not found, but I do not understand why this is happening, because those files are definitely there.
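One way to double-check both points (that the files exist under the results folder, and that they all list the same peaks) is a small script along these lines. This is only a sketch and does not reproduce what PEPATACr does internally: the helper names are made up for illustration, the recursive search assumes nothing about folder names beyond the `_ref_peaks_coverage.bed` suffix, and the comparison assumes the first three columns of each file are chrom, start and end, as in standard BED.

```python
from pathlib import Path


def find_ref_coverage_files(results_dir):
    """Return every *_ref_peaks_coverage.bed file found below results_dir."""
    return sorted(Path(results_dir).rglob("*_ref_peaks_coverage.bed"))


def peak_coordinates(bed_path):
    """Read the (chrom, start, end) columns of a BED-like file as a set."""
    coords = set()
    with open(bed_path) as handle:
        for line in handle:
            # Skip comment and header lines that some BED files carry.
            if line.startswith(("#", "track", "browser")):
                continue
            fields = line.split()
            if len(fields) >= 3:
                coords.add((fields[0], fields[1], fields[2]))
    return coords


def share_one_peak_set(bed_paths):
    """True if every file lists exactly the same peak coordinates."""
    coordinate_sets = [peak_coordinates(path) for path in bed_paths]
    if not coordinate_sets:
        return True  # nothing to compare
    return all(s == coordinate_sets[0] for s in coordinate_sets[1:])


def report(results_dir):
    """Print the files found and whether they agree on one peak set."""
    files = find_ref_coverage_files(results_dir)
    print(f"Found {len(files)} reference coverage file(s)")
    for path in files:
        print(f"  {path}  ({len(peak_coordinates(path))} peaks)")
    print("Same peak set in all files:", share_one_peak_set(files))
```

Calling `report()` with the same path that is passed to `-r` would list one file per sample if they are all present, and show whether any sample was counted against a different set of peaks.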
Any insight you could give on how to address this would be much appreciated. Thank you!
Below I am copying both the log file and my yaml file:
log file
Pipeline run code and environment:
/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tools/pepatac/pipelines/pepatac_collator.py --config tcf7_refgenie_new.yaml -O /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/ -P 1 -M 16000 -n PEPATAC_tcf7 -r /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline
Version log:
/home/labs/amit/kathleen/miniconda3/envs/pepatac/lib/python3.9/site-packages/pypiper
/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/tools/pepatac/pipelines
Arguments passed to pipeline:
config_file: tcf7_refgenie_new.yaml
cores: 1
cutoff: 2
dirty: False
force_follow: False
logdev: False
mem: 16000
min_olap: 1
min_score: 5
name: PEPATAC_tcf7
new_start: False
normalized: False
output_parent: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/
poverlap: False
recover: False
results: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/results_pipeline
silent: False
skip_consensus: False
skip_table: False
testmode: False
verbosity: None
Target to produce:
/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_libComplexity.pdf
/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_*_consensusPeaks.narrowPeak
/home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed/summary/PEPATAC_tcf7_peaks_coverage.tsv
Successfully produced project summary plots.
Consensus peak set (mm10): /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_consensusPeaks.narrowPeak
Calculating mm10 peak counts for 10 samples...
Counts table: /home/labs/amit/kathleen/Tcf7_ATAC_analysis/tcf7_pepatac/processed//summary/PEPATAC_tcf7_mm10_peaks_coverage.tsv
Warning message:
In PEPATACr::peakCounts(sample_table, summary_dir, argv$results, :
  Peak coverage files are not derived from a singular reference peak set.
Command completed. Elapsed time: 0:00:28. Running peak memory: 1.089GB.
PID: 45017; Command: Rscript; Return code: 0; Memory used: 1.089GB
Pipeline completed. Epilogue
yaml file
name: PEPATAC_tcf7
pep_version: 2.0.0
sample_table: tcf7.csv
looper:
  output_dir: "${ANALYSIS}/processed/"
  pipeline_interfaces: ["${ANALYSIS}/tools/pepatac/project_pipeline_interface.yaml"]
sample_modifiers:
  append:
    pipeline_interfaces: ["${ANALYSIS}/tools/pepatac/sample_pipeline_interface.yaml"]
  derive:
    attributes: [read1, read2]
    sources:
      # Obtain tutorial data from http://big.databio.org/pepatac/ then set
  imply: