databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Unable to create html reports using looper report when using newest looper (>1.5.0) #256

Closed donaldcampbelljr closed 10 months ago

donaldcampbelljr commented 1 year ago

The current tutorial claims that html reports can be created using looper report. However, with Looper 1.5.0 and greater, pipestat configuration is required to use that function. The tutorial does not mention this.

I have a new branch to update the documents. However, I noticed that, even after configuring looper to use pipestat, the generated html report is blank.

Next steps:

donaldcampbelljr commented 1 year ago

This appears to be related to a pipestat issue when using a JSON schema as the output schema: https://github.com/pepkit/pipestat/issues/119

donaldcampbelljr commented 1 year ago

Additional info:

PEPATAC results are reported to a stats.yaml via pypiper -> pipestat.

looper report then uses pipestat to read this stats.yaml as its results file during report generation.

It appears that complex objects such as files and images are reported as a single flattened string, so their paths cannot be read when constructing the HTML file.

Examples:

- Pypiper reports library_complexity like so: > `Library complexity` QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png PEPATAC _RES_

- Library complexity after being retrieved using pipestat.retrieve_one:
('Library complexity', 'QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png PEPATAC')

Where this is occurring:

In PEPATAC https://github.com/databio/pepatac/blob/9ee0b6c1251b1addae8265c12f16cbfeae76d489/pipelines/pepatac.py#L1482-L1486

and then in Pypiper https://github.com/databio/pypiper/blob/8aaede5d75f374fd475573dfd875e3d742c581ea/pypiper/manager.py#L1683-L1692

Potential Solution

donaldcampbelljr commented 1 year ago

Current output of pipeline in form of a stats.yaml file:

PEPATAC:
  project: {}
  sample:
    DEFAULT_SAMPLE_NAME:
      File_mb: 27
      pipestat_created_time: '2023-11-17 15:24:41'
      pipestat_modified_time: '2023-11-17 15:29:14'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0
      FastQC report r1: fastq/tutorial1_R1_trim_fastqc.html FastQC report r1 None
        PEPATAC
      FastQC report r2: fastq/tutorial1_R2_trim_fastqc.html FastQC report r2 None
        PEPATAC
      Aligned_reads_rCRSd: 99360.0
      Alignment_rate_rCRSd: 9.94
      Mapped_reads: '900577'
      QC_filtered_reads: 3835
      Aligned_reads: '896742'
      Alignment_rate: 89.67
      Total_efficiency: 89.67
      Mitochondrial_reads: 18
      NRF: 1.0
      PBC1: 1.0
      PBC2: 448366.0
      Unmapped_reads: 63
      Duplicate_reads: '0'
      Dedup_aligned_reads: 896742.0
      Dedup_alignment_rate: 89.67
      Dedup_total_efficiency: 89.67
      NFR_frac: 0.3593
      mono_frac: 0.2362
      di_frac: 0.0647
      tri_frac: 0.0014
      poly_frac: 0.0013
      Read_length: 42
      Genome_size: 3099922541
      Library complexity: QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png
        PEPATAC
      Frac_exp_unique_at_10M: 0.9585
      TSS_score: 14.2
      TSS enrichment: QC_hg38/tutorial1_TSS_enrichment.pdf TSS enrichment QC_hg38/tutorial1_TSS_enrichment.png
        PEPATAC
      Fragment distribution: QC_hg38/tutorial1_fragLenDistribution.pdf Fragment distribution
        QC_hg38/tutorial1_fragLenDistribution.png PEPATAC
      Peak_count: 549875
      FRiP: 0.93
      Peak chromosome distribution: QC_hg38/tutorial1_peak_chromosome_distribution.pdf
        Peak chromosome distribution QC_hg38/tutorial1_peak_chromosome_distribution.png
        PEPATAC
      TSS distance distribution: QC_hg38/tutorial1_peak_TSS_distribution.pdf TSS distance
        distribution QC_hg38/tutorial1_peak_TSS_distribution.png PEPATAC
      Peak partition distribution: QC_hg38/tutorial1_peak_genomic_distribution.pdf
        Peak partition distribution QC_hg38/tutorial1_peak_genomic_distribution.png
        PEPATAC
      cFRiF: QC_hg38/tutorial1_cFRiF.pdf cFRiF QC_hg38/tutorial1_cFRiF.png PEPATAC
      FRiF: QC_hg38/tutorial1_FRiF.pdf FRiF QC_hg38/tutorial1_FRiF.png PEPATAC
      Time: 0:04:33
      Success: 11-17-15:29:14

vs. the previous method, which output a stats.tsv and an objects.tsv (the object rows are shown here):

FastQC report r1    fastq/tutorial1_R1_trim_fastqc.html FastQC report r1    None    PEPATAC
FastQC report r2    fastq/tutorial1_R2_trim_fastqc.html FastQC report r2    None    PEPATAC
Library complexity  QC_hg38/tutorial1_preseq_plot.pdf   Library complexity  QC_hg38/tutorial1_preseq_plot.png   PEPATAC
TSS enrichment  QC_hg38/tutorial1_TSS_enrichment.pdf    TSS enrichment  QC_hg38/tutorial1_TSS_enrichment.png    PEPATAC
Fragment distribution   QC_hg38/tutorial1_fragLenDistribution.pdf   Fragment distribution   QC_hg38/tutorial1_fragLenDistribution.png   PEPATAC
Peak chromosome distribution    QC_hg38/tutorial1_peak_chromosome_distribution.pdf  Peak chromosome distribution    QC_hg38/tutorial1_peak_chromosome_distribution.png  PEPATAC
TSS distance distribution   QC_hg38/tutorial1_peak_TSS_distribution.pdf TSS distance distribution   QC_hg38/tutorial1_peak_TSS_distribution.png PEPATAC
Peak partition distribution QC_hg38/tutorial1_peak_genomic_distribution.pdf Peak partition distribution QC_hg38/tutorial1_peak_genomic_distribution.png PEPATAC
cFRiF   QC_hg38/tutorial1_cFRiF.pdf cFRiF   QC_hg38/tutorial1_cFRiF.png PEPATAC
FRiF    QC_hg38/tutorial1_FRiF.pdf  FRiF    QC_hg38/tutorial1_FRiF.png  PEPATAC

These changes occurred in pypiper v0.13.0 when the reporting backend was transitioned to pipestat.
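
To make the difference between the two formats concrete, here is a minimal sketch that turns one row of the old objects.tsv into a nested mapping with separate path, title, and thumbnail fields. The column order (key, path, title, thumbnail path, annotation) is inferred from the rows shown in this comment, the file is assumed to be tab-separated, and `legacy_row_to_object` is a hypothetical helper, not part of pypiper or pipestat.

```python
from typing import Dict, Optional, Tuple


def legacy_row_to_object(row: str) -> Tuple[str, Dict[str, Optional[str]]]:
    """Convert one tab-separated objects.tsv row into (key, value dict)."""
    fields = [f.strip() for f in row.rstrip("\n").split("\t")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 tab-separated fields, got {len(fields)}")
    key, path, title, thumbnail, annotation = fields
    value = {
        "path": path,
        # The legacy rows contain the literal string "None" when there is no thumbnail.
        "thumbnail_path": None if thumbnail == "None" else thumbnail,
        "title": title,
        "annotation": annotation,
    }
    return key, value


row = (
    "TSS enrichment\tQC_hg38/tutorial1_TSS_enrichment.pdf\tTSS enrichment"
    "\tQC_hg38/tutorial1_TSS_enrichment.png\tPEPATAC"
)
key, value = legacy_row_to_object(row)
print(key, value)
```

With the fields separated like this, a report builder can look up `value["path"]` directly instead of guessing where the path ends inside one long string.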

donaldcampbelljr commented 1 year ago

Related to this issue: looper is unable to resolve pipestat's results.yaml from the looper config file unless it is entered as an absolute path after pypiper has created it.

This is because pypiper will create a stats.yaml results file if pipestat_results_file is not given to pypiper as an input parameter.

The current workaround is to add the actual path after the pipeline has run, so that looper report and looper link can use it.

name: PEPATAC_tutorial
pep_config: tutorial_refgenie_project_config.yaml

output_dir: "${TUTORIAL}/processed/"
pipeline_interfaces:
  sample: ["${TUTORIAL}/tools/pepatac/sample_pipeline_interface.yaml"]
  project: ["${TUTORIAL}/tools/pepatac/project_pipeline_interface.yaml"]

pipestat:
  results_file_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/stats.yaml
  #results_file_path: "${TUTORIAL}/processed/results_pipeline/{sample_name}/stats.yaml" # This does not work
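
Until the templated path is resolved automatically, the absolute path can be computed by hand before pasting it into the config. The sketch below shows one way to do that, assuming `TUTORIAL` is set in the environment; `resolve_results_file` is a hypothetical helper and not a looper or pipestat function, and the `/tmp/pepatac_tutorial` default is an example value only.

```python
import os


def resolve_results_file(template: str, sample_name: str) -> str:
    """Expand the {sample_name} placeholder and ${VAR} environment variables."""
    # str.format would choke on "${TUTORIAL}", so substitute the placeholder literally.
    with_sample = template.replace("{sample_name}", sample_name)
    expanded = os.path.expandvars(with_sample)
    return os.path.abspath(expanded)


os.environ.setdefault("TUTORIAL", "/tmp/pepatac_tutorial")  # example value only
template = "${TUTORIAL}/processed/results_pipeline/{sample_name}/stats.yaml"
print(resolve_results_file(template, "tutorial1"))
```

The printed path is what would go into `results_file_path` for that one sample; it does not make the `{sample_name}` template itself work in the config.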

donaldcampbelljr commented 1 year ago

Changed pypiper's report_object to create a values dict that conforms to a pipestat output_schema for complex objects: https://github.com/databio/pypiper/commit/1a677dad34ffe77dd14cff1034d02e9fde09c117

An example of PEPATAC reported results after the change:

TSS enrichment:
  path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.pdf
  thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.png
  title: TSS enrichment
  annotation: PEPATAC

vs before:

TSS enrichment: QC_hg38/tutorial1_TSS_enrichment.pdf TSS enrichment QC_hg38/tutorial1_TSS_enrichment.png
        PEPATAC

The results file output should now match the output_schema and allow for proper report generation. However, the object pages are still showing up blank.

donaldcampbelljr commented 1 year ago

I found the issue: the key being reported did not match the output schema for the complex types. For example, PEPATAC reports "Library complexity" but the output_schema's key is library_complexity.

I manually edited the stats.yaml file and confirmed that this does solve the issue.
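
The mismatch is mechanical, so it can be illustrated with a small sketch. The helper `to_schema_key` is hypothetical and only shows one possible convention (lowercase, whitespace to underscores); it is not how pypiper, pipestat, or PEPATAC derive keys.

```python
import re


def to_schema_key(reported_key: str) -> str:
    """Map a human-readable result name to a snake_case schema key."""
    return re.sub(r"\s+", "_", reported_key.strip()).lower()


schema_keys = {"library_complexity"}
reported = "Library complexity"

print(reported in schema_keys)                 # False: exact match fails
print(to_schema_key(reported) in schema_keys)  # True: normalized key matches
```

Either side can be changed to close the gap: the pipeline can report the schema's key, or the schema can declare the key the pipeline already reports.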

donaldcampbelljr commented 1 year ago

There are also many items reported by the pipeline that are not in the output schema and, therefore, these results are not captured in the html report.
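
One way to find such results is to compare the keys in a sample's results against the keys declared in the schema. The sketch below uses small in-memory dictionaries in place of the parsed YAML files; the schema layout (`properties.samples.properties`) is assumed to follow the PEPATAC output schema, the `peaks.narrowPeak` value is illustrative, and `unreported_in_schema` is a hypothetical name.

```python
from typing import Any, Dict, List


def unreported_in_schema(results: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Return result keys that have no entry in the schema's sample properties."""
    declared = set(
        schema.get("properties", {}).get("samples", {}).get("properties", {})
    )
    # Keys prefixed with "pipestat_" look like bookkeeping timestamps; skip them.
    return sorted(
        k for k in results if k not in declared and not k.startswith("pipestat_")
    )


schema = {"properties": {"samples": {"properties": {"peak_file": {"type": "string"}}}}}
results = {
    "peak_file": "peaks.narrowPeak",  # illustrative value
    "FRiP": 0.93,
    "TSS_score": 14.2,
    "pipestat_created_time": "2023-11-27 13:54:39",
}

print(unreported_in_schema(results, schema))  # ['FRiP', 'TSS_score']
```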

donaldcampbelljr commented 1 year ago

This will work in PR #255:

However, it is not 100% because:

Example results YAML:

PEPATAC:
  project: {}
  sample:
    tutorial1:
      File_mb: 27
      pipestat_created_time: '2023-11-27 13:54:39'
      pipestat_modified_time: '2023-11-27 13:59:04'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0
      FastQC report r1:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/fastq/tutorial1_R1_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r1
        annotation: PEPATAC
      FastQC report r2:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/fastq/tutorial1_R2_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r2
        annotation: PEPATAC
      Aligned_reads_rCRSd: 99360.0
      Alignment_rate_rCRSd: 9.94
      Mapped_reads: '900577'
      QC_filtered_reads: 3835
      Aligned_reads: '896742'
      Alignment_rate: 89.67
      Total_efficiency: 89.67
      Mitochondrial_reads: 18
      NRF: 1.0
      PBC1: 1.0
      PBC2: 448366.0
      Unmapped_reads: 63
      Duplicate_reads: '0'
      Dedup_aligned_reads: 896742.0
      Dedup_alignment_rate: 89.67
      Dedup_total_efficiency: 89.67
      NFR_frac: 0.3593
      mono_frac: 0.2362
      di_frac: 0.0647
      tri_frac: 0.0014
      poly_frac: 0.0013
      Read_length: 42
      Genome_size: 3099922541
      Library complexity:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_preseq_plot.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_preseq_plot.png
        title: Library complexity
        annotation: PEPATAC
      Frac_exp_unique_at_10M: 0.9585
      TSS_score: 14.2
      TSS enrichment:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_TSS_enrichment.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_TSS_enrichment.png
        title: TSS enrichment
        annotation: PEPATAC
      Fragment distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_fragLenDistribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_fragLenDistribution.png
        title: Fragment distribution
        annotation: PEPATAC
      Peak_count: 549875
      FRiP: 0.93
      Peak chromosome distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_chromosome_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_chromosome_distribution.png
        title: Peak chromosome distribution
        annotation: PEPATAC
      TSS distance distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_TSS_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_TSS_distribution.png
        title: TSS distance distribution
        annotation: PEPATAC
      Peak partition distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_genomic_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_genomic_distribution.png
        title: Peak partition distribution
        annotation: PEPATAC
      cFRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_cFRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_cFRiF.png
        title: cFRiF
        annotation: PEPATAC
      FRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_FRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_FRiF.png
        title: FRiF
        annotation: PEPATAC
      Time: 0:04:25
      Success: 11-27-13:59:04
    tutorial2:
      File_mb: 27
      pipestat_created_time: '2023-11-27 13:59:05'
      pipestat_modified_time: '2023-11-27 14:03:28'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0
      FastQC report r1:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/fastq/tutorial2_R1_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r1
        annotation: PEPATAC
      FastQC report r2:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/fastq/tutorial2_R2_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r2
        annotation: PEPATAC
      Aligned_reads_rCRSd: 100556.0
      Alignment_rate_rCRSd: 10.06
      Mapped_reads: '899373'
      QC_filtered_reads: 4021
      Aligned_reads: '895352'
      Alignment_rate: 89.54
      Total_efficiency: 89.54
      Mitochondrial_reads: 30
      NRF: 1.0
      PBC1: 1.0
      PBC2: 447669.0
      Unmapped_reads: 71
      Duplicate_reads: '0'
      Dedup_aligned_reads: 895352.0
      Dedup_alignment_rate: 89.54
      Dedup_total_efficiency: 89.54
      NFR_frac: 0.3602
      mono_frac: 0.2354
      di_frac: 0.0643
      tri_frac: 0.0015
      poly_frac: 0.0014
      Read_length: 42
      Genome_size: 3099922541
      TSS_score: 12.8
      TSS enrichment:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.png
        title: TSS enrichment
        annotation: PEPATAC
      Fragment distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_fragLenDistribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_fragLenDistribution.png
        title: Fragment distribution
        annotation: PEPATAC
      Peak_count: 548852
      FRiP: 0.93
      Peak chromosome distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_chromosome_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_chromosome_distribution.png
        title: Peak chromosome distribution
        annotation: PEPATAC
      TSS distance distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_TSS_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_TSS_distribution.png
        title: TSS distance distribution
        annotation: PEPATAC
      Peak partition distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_genomic_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_genomic_distribution.png
        title: Peak partition distribution
        annotation: PEPATAC
      cFRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_cFRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_cFRiF.png
        title: cFRiF
        annotation: PEPATAC
      FRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_FRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_FRiF.png
        title: FRiF
        annotation: PEPATAC
      Time: 0:04:23
      Success: 11-27-14:03:28

PEPATAC Output Schema

title: An example Pipestat output schema
description: objects produced by PEPATAC pipeline.
type: object
properties:
  pipeline_name: PEPATAC
  samples:
    type: object
    properties:
      smooth_bw:
        type: string
        description: "Smoothed signal track"
      exact_bw:
        type: string
        description: "Nucleotide-resolution signal track"
      aligned_bam:
        type: string
        description: "Coordinate sorted deduplicated, aligned BAM file"
      peak_file:
        type: string
        description: "Sample peak file"
      coverage_file:
        type: string
        description: "Sample peak coverage table"
      summits_bed:
        type: string
        description: "Peak summit file"
  project:
    type: object
    properties:
      alignment_percent_file:
        title: "Alignment percent file"
        description: "Plots percent of total alignment to all pre-alignments and primary genome."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      alignment_raw_file:
        title: "Alignment raw file"
        description: "Plots raw alignment rates to all pre-alignments and primary genome."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      tss_file:
        title: "TSS enrichment file"
        description: "Plots TSS scores for each sample."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      library_complexity_file:
        title: "Library complexity file"
        description: "Plots each sample's library complexity on a single plot."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      consensus_peaks_file:
        title: "Consensus peak file"
        description: "A set of consensus peaks across samples."
        type: object
        object_type: file
        properties:
          path:
            type: string
          title:
            type: string
          thumbnail_path:
            type: string
        required:
          - path
          - title
      counts_table:
        title: "Project peak coverage file"
        description: "Project peak coverages: chr_start_end X sample"
        type: object
        object_type: file
        properties:
          path:
            type: string
          title:
            type: string
          thumbnail_path:
            type: string
        required:
          - path
          - title
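
As a rough illustration of how a reported value relates to one of these schema entries, the sketch below checks a value against an entry's `required` list. It is a hand-rolled check for illustration only, not pipestat's validation logic; `missing_or_invalid` is a hypothetical name, and the entry simply reuses the shape of the image entries in the schema.

```python
from typing import Any, Dict, List

image_entry = {
    "type": "object",
    "object_type": "image",
    "properties": {
        "path": {"type": "string"},
        "thumbnail_path": {"type": "string"},
        "title": {"type": "string"},
    },
    "required": ["path", "thumbnail_path", "title"],
}


def missing_or_invalid(value: Any, entry: Dict[str, Any]) -> List[str]:
    """List required fields that are absent or not strings in a reported value."""
    if not isinstance(value, dict):
        # A flattened string has no fields at all, so every required one is missing.
        return list(entry["required"])
    return [f for f in entry["required"] if not isinstance(value.get(f), str)]


flattened = (
    "QC_hg38/tutorial1_TSS_enrichment.pdf TSS enrichment "
    "QC_hg38/tutorial1_TSS_enrichment.png PEPATAC"
)
structured = {
    "path": "QC_hg38/tutorial1_TSS_enrichment.pdf",
    "thumbnail_path": "QC_hg38/tutorial1_TSS_enrichment.png",
    "title": "TSS enrichment",
}

print(missing_or_invalid(flattened, image_entry))   # ['path', 'thumbnail_path', 'title']
print(missing_or_invalid(structured, image_entry))  # []
```

Under this check, a `thumbnail_path: null` value, as reported for the FastQC results, would also be flagged against an entry that lists `thumbnail_path` as a required string.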

donaldcampbelljr commented 1 year ago

Ok, I've added the reported outputs to the output schema in dev_test_pipestat and it now works much better when building the report.