databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Unable to generate count table with reference peaks #219

Open gbloeb opened 2 years ago

gbloeb commented 2 years ago

I ran a project with reference peaks, but I am unable to generate a count table for the reference peaks. When the project is run:

1) Peaks are still called for each sample individually.
2) Each sample/peak_calling_mm10 directory contains both GLIS3_Ctrl_1_S2_ref_peaks_coverage.bed, corresponding to the reference peaks, and GLIS3_Ctrl_1_S2_peaks_coverage.bed.gz, corresponding to the called peaks.

When I run the project processing pipeline, consensus peaks are still generated, and the count table is built from the consensus peaks rather than the reference peaks, with this warning:

Warning message: In PEPATACr::peakCounts(sample_table, summary_dir, argv$results, : Peak coverage files are not derived from a singular reference peak set.
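As a possible stopgap while the project-level pipeline keeps falling back to consensus peaks, the per-sample `*_ref_peaks_coverage.bed` files mentioned above could be merged into a count table by hand. The following is only a minimal sketch, not part of PEPATAC or PEPATACr. It assumes that each coverage file is tab-delimited, that its first three columns are chromosome, start, and end, and that the read count is in the last column; those layout details are assumptions and should be checked against a real file first. The function names `read_coverage`, `build_count_table`, and `write_table` are hypothetical.

```python
import csv
from pathlib import Path

# File name suffix as reported in this issue.
REF_SUFFIX = "_ref_peaks_coverage.bed"


def read_coverage(path, count_col=-1):
    """Return {(chrom, start, end): count_string} for one coverage file.

    ASSUMPTION: tab-delimited, columns 1-3 are chrom/start/end, and the
    read count sits in `count_col` (the last column by default).
    """
    counts = {}
    with open(path, newline="") as handle:
        for row in csv.reader(handle, delimiter="\t"):
            if len(row) < 4 or row[0].startswith("#"):
                continue  # skip blank, comment, or too-short lines
            counts[(row[0], int(row[1]), int(row[2]))] = row[count_col]
    return counts


def build_count_table(results_dir, genome="mm10", count_col=-1):
    """Merge every sample's reference-peak coverage file into one table.

    ASSUMPTION: results_dir holds one directory per sample, each with a
    peak_calling_<genome> subdirectory, as described in this issue.
    """
    pattern = f"*/peak_calling_{genome}/*{REF_SUFFIX}"
    per_sample = {}
    for path in sorted(Path(results_dir).glob(pattern)):
        sample = path.name[: -len(REF_SUFFIX)]
        per_sample[sample] = read_coverage(path, count_col)

    # Union of all intervals, sorted by chromosome name, then coordinates.
    peaks = sorted(set().union(*per_sample.values()))
    samples = sorted(per_sample)
    header = ["chr", "start", "end"] + samples
    rows = [
        [chrom, str(start), str(end)]
        # An interval missing from a sample's file is reported as "0".
        + [per_sample[s].get((chrom, start, end), "0") for s in samples]
        for chrom, start, end in peaks
    ]
    return header, rows


def write_table(header, rows, out_path):
    """Write the merged table as a tab-separated file."""
    with open(out_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Calling `build_count_table()` on the `results_pipeline` directory and passing the result to `write_table()` would give one row per reference peak and one column per sample. If every sample was scored against the same reference file, no "0" fill-ins should be needed.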

My config:

# This project config file describes your project. See looper docs for details.
name: GLIS3_ATAC_nolambda_qe-7_sh-30_peaks # The name that summary files will be prefaced with

pep_version: 2.0.0
sample_table: annotation_onlyGLIS3.csv  # sheet listing all samples in the project

looper:  # relative paths are relative to this config file
  output_dir: ~/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results
  pipeline_interfaces: ~/pepatac/project_pipeline_interface.yaml  # PATH to the directory where looper will find the pipeline repository.

sample_modifiers:
  append:
    pipeline_interfaces: ~/pepatac/sample_pipeline_interface.yaml
  derive:
    attributes: [read1, read2]
    sources:
      R1: "~/group/bulk_atac/220126_GLIS_ATAC_IMCD3/fastq/{sample_name}_R1_001.fastq.gz"
      R2: "~/group/bulk_atac/220126_GLIS_ATAC_IMCD3/fastq/{sample_name}_R2_001.fastq.gz"
  imply:
    - if:
        organism: ["mouse"]
      then:
        genome: mm10
        prealignment_names: ["mouse_chrM2x"]
        genome_size: "2.3e9"
        frip_ref_peaks: ~/group/bulk_atac/220126_GLIS_ATAC_IMCD3/comb_peak_call/GLIS_doxpctrl_shift.bed_nolambda_q0.0000001_sh-30_peaks.narrowPeak

a sample log file: PEPATAC_log.md

project log file: `### Pipeline run code and environment:

Version log:

Arguments passed to pipeline:


Target to produce: /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/summary/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_libComplexity.pdf,/wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/summary/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_*_consensusPeaks.narrowPeak,/wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/summary/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_peaks_coverage.tsv

Rscript /wynton/protected/home/reiter/gloeb/pepatac/tools/PEPATAC_summarizer.R /wynton/protected/home/reiter/gloeb/pepatac/220126_GLIS_ATAC_IMCD3/config_GLIS_doxpctrl_shift.bed_nolambda_q0.0000001_sh-30_peaks.yaml /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/results_pipeline 2 5 1 (9429)

Loading config file: /wynton/protected/home/reiter/gloeb/pepatac/220126_GLIS_ATAC_IMCD3/config_GLIS_doxpctrl_shift.bed_nolambda_q0.0000001_sh-30_peaks.yaml
Creating stats summary...
Summary (n=4): /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_stats_summary.tsv
Creating assets summary...
Summary (n=4): /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_assets_summary.tsv
Creating summary plots...
4 of 4 library complexity files available.
INFO: Found real counts for GLIS3_Ctrl_1_S2 - Total (M): 126.745044 Unique (M): 108.93747
INFO: Found real counts for GLIS3_Ctrl_2_S7 - Total (M): 111.712142 Unique (M): 96.955292
INFO: Found real counts for GLIS3_Dox_1_S3 - Total (M): 221.935342 Unique (M): 183.353884
INFO: Found real counts for GLIS3_Dox_2_S8 - Total (M): 116.520356 Unique (M): 105.904826

WARNING: y-max value changed from default 139.24586665 to the max real data 201.6892724
Successfully produced project summary plots.

Calculating mm10 consensus peak set from 4 samples...
Consensus peak set: /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/summary/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_mm10_consensusPeaks.narrowPeak

Calculating mm10 peak counts for 4 samples...
Counts table: /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/summary/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_mm10_peaks_coverage.tsv

Counts table: /wynton/protected/home/reiter/gloeb/group/bulk_atac/220126_GLIS_ATAC_IMCD3/GLIS3_nolambda_qe-7_sh-30_peaks/pepatac_results/summary/GLIS3_ATAC_nolambda_qe-7_sh-30_peaks_mm10_peaks_coverage.tsv

Warning message:
In PEPATACr::peakCounts(sample_table, summary_dir, argv$results, : Peak coverage files are not derived from a singular reference peak set.
Command completed. Elapsed time: 0:00:57. Running peak memory: 0.91GB.
PID: 9429; Command: Rscript; Return code: 0; Memory used: 0.91GB

Pipeline completed. Epilogue

`
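The warning in the log says the coverage files being merged are "not derived from a singular reference peak set". One way to see which files disagree is to compare the intervals they contain. The sketch below is not the check that PEPATACr performs, whose logic is not shown in this thread. It assumes tab-delimited files whose first three columns are chromosome, start, and end, and it reads `.gz` files transparently. The function names `read_intervals` and `compare_peak_sets` are hypothetical.

```python
import gzip


def read_intervals(path):
    """Return the set of (chrom, start, end) string triples in a BED-like file.

    ASSUMPTION: tab-delimited, with the interval in the first three columns.
    """
    opener = gzip.open if str(path).endswith(".gz") else open
    intervals = set()
    with opener(path, "rt") as handle:
        for line in handle:
            fields = line.rstrip("\n").split("\t")
            if len(fields) >= 3 and not line.startswith("#"):
                intervals.add((fields[0], fields[1], fields[2]))
    return intervals


def compare_peak_sets(paths):
    """Map each path to the number of intervals not shared with the first file.

    A value of 0 means the file covers exactly the same intervals as the
    first file in `paths`; any other value counts the intervals present in
    one of the two files but not the other.
    """
    paths = [str(p) for p in paths]
    if not paths:
        return {}
    reference = read_intervals(paths[0])
    return {p: len(reference ^ read_intervals(p)) for p in paths}
```

Running `compare_peak_sets()` once over all `*_ref_peaks_coverage.bed` files and once over all `*_peaks_coverage.bed.gz` files should show whether only the per-sample called-peak files differ from one another, which is what the description in this issue would lead one to expect.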

Kange2014 commented 1 year ago

Does anyone have an update on this issue? I am encountering the same problem. Thanks.

ljmills commented 1 year ago

I am also having this issue.

zhongzheng1999 commented 5 months ago

The issue seems to persist; I am running into it as well. @donaldcampbelljr, could you help resolve this? Thanks!

zhongzheng1999 commented 5 months ago

@ljmills Did you ever find a solution? Thanks!