Closed donaldcampbelljr closed 10 months ago
This appears to be related to a pipestat issue when using a JSON schema as the output schema: https://github.com/pepkit/pipestat/issues/119
PEPATAC results are reported to a `stats.yaml` via pypiper -> pipestat. `looper report` then uses pipestat to read this `stats.yaml` as its results file during report generation.
It appears that complex objects such as files and images are reported in a way that prevents their paths from being read when constructing the HTML file.
- Pypiper reports `library_complexity` like so:
> `Library complexity` QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png PEPATAC _RES_
- Library complexity after being retrieved using `pipestat.retrieve_one`:
('Library complexity', 'QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png PEPATAC')
and then in Pypiper: https://github.com/databio/pypiper/blob/8aaede5d75f374fd475573dfd875e3d742c581ea/pypiper/manager.py#L1683-L1692
For `looper report`, which now uses `pipestat summarize`, we must ensure the output schemas are aligned and that they are pipestat compatible, i.e. JSON schema.
Use `report_result` instead of `report_object`, because it will allow for reporting of complex (nested) data types per the updated output schema (file, image):
https://github.com/databio/pypiper/blob/8aaede5d75f374fd475573dfd875e3d742c581ea/pypiper/manager.py#L1585-L1613
Current output of the pipeline in the form of a `stats.yaml` file:
PEPATAC:
  project: {}
  sample:
    DEFAULT_SAMPLE_NAME:
      File_mb: 27
      pipestat_created_time: '2023-11-17 15:24:41'
      pipestat_modified_time: '2023-11-17 15:29:14'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0
      FastQC report r1: fastq/tutorial1_R1_trim_fastqc.html FastQC report r1 None
        PEPATAC
      FastQC report r2: fastq/tutorial1_R2_trim_fastqc.html FastQC report r2 None
        PEPATAC
      Aligned_reads_rCRSd: 99360.0
      Alignment_rate_rCRSd: 9.94
      Mapped_reads: '900577'
      QC_filtered_reads: 3835
      Aligned_reads: '896742'
      Alignment_rate: 89.67
      Total_efficiency: 89.67
      Mitochondrial_reads: 18
      NRF: 1.0
      PBC1: 1.0
      PBC2: 448366.0
      Unmapped_reads: 63
      Duplicate_reads: '0'
      Dedup_aligned_reads: 896742.0
      Dedup_alignment_rate: 89.67
      Dedup_total_efficiency: 89.67
      NFR_frac: 0.3593
      mono_frac: 0.2362
      di_frac: 0.0647
      tri_frac: 0.0014
      poly_frac: 0.0013
      Read_length: 42
      Genome_size: 3099922541
      Library complexity: QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png
        PEPATAC
      Frac_exp_unique_at_10M: 0.9585
      TSS_score: 14.2
      TSS enrichment: QC_hg38/tutorial1_TSS_enrichment.pdf TSS enrichment QC_hg38/tutorial1_TSS_enrichment.png
        PEPATAC
      Fragment distribution: QC_hg38/tutorial1_fragLenDistribution.pdf Fragment distribution
        QC_hg38/tutorial1_fragLenDistribution.png PEPATAC
      Peak_count: 549875
      FRiP: 0.93
      Peak chromosome distribution: QC_hg38/tutorial1_peak_chromosome_distribution.pdf
        Peak chromosome distribution QC_hg38/tutorial1_peak_chromosome_distribution.png
        PEPATAC
      TSS distance distribution: QC_hg38/tutorial1_peak_TSS_distribution.pdf TSS distance
        distribution QC_hg38/tutorial1_peak_TSS_distribution.png PEPATAC
      Peak partition distribution: QC_hg38/tutorial1_peak_genomic_distribution.pdf
        Peak partition distribution QC_hg38/tutorial1_peak_genomic_distribution.png
        PEPATAC
      cFRiF: QC_hg38/tutorial1_cFRiF.pdf cFRiF QC_hg38/tutorial1_cFRiF.png PEPATAC
      FRiF: QC_hg38/tutorial1_FRiF.pdf FRiF QC_hg38/tutorial1_FRiF.png PEPATAC
      Time: 0:04:33
      Success: 11-17-15:29:14
vs. the previous method, which output a `stats.tsv` and an `objects.tsv`:
FastQC report r1 fastq/tutorial1_R1_trim_fastqc.html FastQC report r1 None PEPATAC
FastQC report r2 fastq/tutorial1_R2_trim_fastqc.html FastQC report r2 None PEPATAC
Library complexity QC_hg38/tutorial1_preseq_plot.pdf Library complexity QC_hg38/tutorial1_preseq_plot.png PEPATAC
TSS enrichment QC_hg38/tutorial1_TSS_enrichment.pdf TSS enrichment QC_hg38/tutorial1_TSS_enrichment.png PEPATAC
Fragment distribution QC_hg38/tutorial1_fragLenDistribution.pdf Fragment distribution QC_hg38/tutorial1_fragLenDistribution.png PEPATAC
Peak chromosome distribution QC_hg38/tutorial1_peak_chromosome_distribution.pdf Peak chromosome distribution QC_hg38/tutorial1_peak_chromosome_distribution.png PEPATAC
TSS distance distribution QC_hg38/tutorial1_peak_TSS_distribution.pdf TSS distance distribution QC_hg38/tutorial1_peak_TSS_distribution.png PEPATAC
Peak partition distribution QC_hg38/tutorial1_peak_genomic_distribution.pdf Peak partition distribution QC_hg38/tutorial1_peak_genomic_distribution.png PEPATAC
cFRiF QC_hg38/tutorial1_cFRiF.pdf cFRiF QC_hg38/tutorial1_cFRiF.png PEPATAC
FRiF QC_hg38/tutorial1_FRiF.pdf FRiF QC_hg38/tutorial1_FRiF.png PEPATAC
These changes occurred in pypiper v0.13.0 when the reporting backend was transitioned to pipestat.
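To illustrate why the space-joined value is hard to consume downstream, here is a minimal sketch. It assumes the legacy `objects.tsv` row was tab-delimited (as the extension suggests) and that the `stats.yaml` value is the same fields joined by single spaces; the field labels in the comments are illustrative, not pypiper's names.

```python
# Fields of one reported object, as they appear in the examples above.
fields = [
    "QC_hg38/tutorial1_preseq_plot.pdf",  # file path
    "Library complexity",                 # link text (contains a space)
    "QC_hg38/tutorial1_preseq_plot.png",  # thumbnail path
    "PEPATAC",                            # annotation
]

# Assumed legacy form: tab-delimited, so the fields split back cleanly.
tsv_value = "\t".join(fields)
recovered_from_tsv = tsv_value.split("\t")

# Form seen in stats.yaml: one space-joined string.
yaml_value = " ".join(fields)
recovered_from_yaml = yaml_value.split(" ")

print(len(recovered_from_tsv))   # number of fields recovered from the TSV form
print(len(recovered_from_yaml))  # number of tokens recovered from the YAML form
```

Because the link text itself contains a space, splitting the YAML value on whitespace does not give back the original four fields, which is consistent with the report code being unable to locate the path.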
Related to this issue: I am unable to resolve pipestat's `results.yaml` in the looper config file without entering it as an absolute path after it has been created by pypiper.
This is because pypiper will create a `stats.yaml` results file if `pipestat_results_file` is not given to pypiper as an input parameter.
The current workaround is to add the actual path after the pipeline has run, simply so that it can be used for `looper report` and `looper link` functionality.
name: PEPATAC_tutorial
pep_config: tutorial_refgenie_project_config.yaml
output_dir: "${TUTORIAL}/processed/"
pipeline_interfaces:
  sample: ["${TUTORIAL}/tools/pepatac/sample_pipeline_interface.yaml"]
  project: ["${TUTORIAL}/tools/pepatac/project_pipeline_interface.yaml"]
pipestat:
  results_file_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/stats.yaml
  #results_file_path: "${TUTORIAL}/processed/results_pipeline/{sample_name}/stats.yaml" # This does not work
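For reference, this is the kind of resolution one would expect from the commented-out template. The sketch below is not looper's or pipestat's implementation; `resolve_results_path` is a hypothetical helper that expands environment variables with `os.path.expandvars` and then fills `{sample_name}` with `str.format`. The value assigned to `TUTORIAL` is a placeholder.

```python
import os


def resolve_results_path(template: str, sample_name: str) -> str:
    """Hypothetical helper: expand ${VARS} first, then fill {sample_name}."""
    expanded = os.path.expandvars(template)
    return expanded.format(sample_name=sample_name)


# Placeholder value only; TUTORIAL would normally be set in the shell.
os.environ["TUTORIAL"] = "/path/to/tutorial"

template = "${TUTORIAL}/processed/results_pipeline/{sample_name}/stats.yaml"
resolved = resolve_results_path(template, "tutorial1")
print(resolved)
```

Note that the order matters in this sketch: if `TUTORIAL` were unset, `${TUTORIAL}` would survive expansion and `str.format` would then fail on the leftover braces.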
Changed pypiper's `report_object` to create a values dict that conforms with a pipestat `output_schema` for complex objects:
https://github.com/databio/pypiper/commit/1a677dad34ffe77dd14cff1034d02e9fde09c117
An example of PEPATAC reported results after the change:
TSS enrichment:
  path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.pdf
  thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.png
  title: TSS enrichment
  annotation: PEPATAC
vs. before:
TSS enrichment: QC_hg38/tutorial1_TSS_enrichment.pdf TSS enrichment QC_hg38/tutorial1_TSS_enrichment.png
  PEPATAC
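The shape of that change can be sketched as follows. This is not the code from the linked commit; `build_object_value` is a hypothetical helper that only shows how the four positional fields map onto the `path` / `thumbnail_path` / `title` / `annotation` keys seen above, with relative paths joined onto an assumed output folder.

```python
import os
from typing import Optional


def build_object_value(
    outfolder: str,
    filename: str,
    title: str,
    thumbnail: Optional[str],
    annotation: str,
) -> dict:
    """Hypothetical helper: map positional fields to a nested value dict."""

    def absolute(relative: Optional[str]) -> Optional[str]:
        # Leave a missing thumbnail as None (null in YAML).
        if relative is None:
            return None
        return os.path.join(outfolder, relative)

    return {
        "path": absolute(filename),
        "thumbnail_path": absolute(thumbnail),
        "title": title,
        "annotation": annotation,
    }


value = build_object_value(
    "/path/to/results_pipeline/tutorial1",  # placeholder output folder
    "QC_hg38/tutorial1_TSS_enrichment.pdf",
    "TSS enrichment",
    "QC_hg38/tutorial1_TSS_enrichment.png",
    "PEPATAC",
)
print(value)
```

Keeping each field under its own key means the report code no longer has to guess where the title ends and the thumbnail path begins.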
The results file output should now match the output_schema and allow for proper report generation. However, the object pages are still showing up blank:
I found the issue: the key being reported did not match the output schema for the complex types, e.g. PEPATAC reports "Library complexity" but the output_schema's key is `library_complexity`.
I manually edited the `stats.yaml` file and confirmed that this does solve the issue:
There are also many items reported by the pipeline that are not in the output schema and, therefore, these results are not captured in the html report.
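One quick way to spot both problems (mismatched keys and results missing from the schema) is to compare the reported keys against the schema keys. The sketch below is illustrative only: the `normalize` rule (lowercase, spaces to underscores) is an assumption about how the two naming styles relate, both function names are hypothetical, and the key lists are abbreviated examples rather than the real schema.

```python
def normalize(key: str) -> str:
    """Assumed rule: 'Library complexity' -> 'library_complexity'."""
    return key.strip().lower().replace(" ", "_")


def unmatched_keys(reported, schema_keys):
    """Map each reported key that is not literally in the schema to the
    schema key it matches after normalization, or None if there is none."""
    schema = set(schema_keys)
    result = {}
    for key in reported:
        if key in schema:
            continue
        candidate = normalize(key)
        result[key] = candidate if candidate in schema else None
    return result


reported = ["Library complexity", "Peak_count", "FRiP"]
schema_keys = ["library_complexity", "FRiP"]  # abbreviated, illustrative

for key, suggestion in unmatched_keys(reported, schema_keys).items():
    print(f"{key!r}: {suggestion or 'not in schema'}")
```

Keys with a suggestion only need renaming on one side; keys with no suggestion would have to be added to the output schema before they can appear in the report.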
This will work in PR #255:
However, it is not 100% because:
PEPATAC:
  project: {}
  sample:
    tutorial1:
      File_mb: 27
      pipestat_created_time: '2023-11-27 13:54:39'
      pipestat_modified_time: '2023-11-27 13:59:04'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0
      FastQC report r1:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/fastq/tutorial1_R1_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r1
        annotation: PEPATAC
      FastQC report r2:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/fastq/tutorial1_R2_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r2
        annotation: PEPATAC
      Aligned_reads_rCRSd: 99360.0
      Alignment_rate_rCRSd: 9.94
      Mapped_reads: '900577'
      QC_filtered_reads: 3835
      Aligned_reads: '896742'
      Alignment_rate: 89.67
      Total_efficiency: 89.67
      Mitochondrial_reads: 18
      NRF: 1.0
      PBC1: 1.0
      PBC2: 448366.0
      Unmapped_reads: 63
      Duplicate_reads: '0'
      Dedup_aligned_reads: 896742.0
      Dedup_alignment_rate: 89.67
      Dedup_total_efficiency: 89.67
      NFR_frac: 0.3593
      mono_frac: 0.2362
      di_frac: 0.0647
      tri_frac: 0.0014
      poly_frac: 0.0013
      Read_length: 42
      Genome_size: 3099922541
      Library complexity:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_preseq_plot.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_preseq_plot.png
        title: Library complexity
        annotation: PEPATAC
      Frac_exp_unique_at_10M: 0.9585
      TSS_score: 14.2
      TSS enrichment:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_TSS_enrichment.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_TSS_enrichment.png
        title: TSS enrichment
        annotation: PEPATAC
      Fragment distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_fragLenDistribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_fragLenDistribution.png
        title: Fragment distribution
        annotation: PEPATAC
      Peak_count: 549875
      FRiP: 0.93
      Peak chromosome distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_chromosome_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_chromosome_distribution.png
        title: Peak chromosome distribution
        annotation: PEPATAC
      TSS distance distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_TSS_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_TSS_distribution.png
        title: TSS distance distribution
        annotation: PEPATAC
      Peak partition distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_genomic_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_peak_genomic_distribution.png
        title: Peak partition distribution
        annotation: PEPATAC
      cFRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_cFRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_cFRiF.png
        title: cFRiF
        annotation: PEPATAC
      FRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_FRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/QC_hg38/tutorial1_FRiF.png
        title: FRiF
        annotation: PEPATAC
      Time: 0:04:25
      Success: 11-27-13:59:04
    tutorial2:
      File_mb: 27
      pipestat_created_time: '2023-11-27 13:59:05'
      pipestat_modified_time: '2023-11-27 14:03:28'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      Trim_loss_rate: 0.0
      FastQC report r1:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/fastq/tutorial2_R1_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r1
        annotation: PEPATAC
      FastQC report r2:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/fastq/tutorial2_R2_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r2
        annotation: PEPATAC
      Aligned_reads_rCRSd: 100556.0
      Alignment_rate_rCRSd: 10.06
      Mapped_reads: '899373'
      QC_filtered_reads: 4021
      Aligned_reads: '895352'
      Alignment_rate: 89.54
      Total_efficiency: 89.54
      Mitochondrial_reads: 30
      NRF: 1.0
      PBC1: 1.0
      PBC2: 447669.0
      Unmapped_reads: 71
      Duplicate_reads: '0'
      Dedup_aligned_reads: 895352.0
      Dedup_alignment_rate: 89.54
      Dedup_total_efficiency: 89.54
      NFR_frac: 0.3602
      mono_frac: 0.2354
      di_frac: 0.0643
      tri_frac: 0.0015
      poly_frac: 0.0014
      Read_length: 42
      Genome_size: 3099922541
      TSS_score: 12.8
      TSS enrichment:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_TSS_enrichment.png
        title: TSS enrichment
        annotation: PEPATAC
      Fragment distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_fragLenDistribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_fragLenDistribution.png
        title: Fragment distribution
        annotation: PEPATAC
      Peak_count: 548852
      FRiP: 0.93
      Peak chromosome distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_chromosome_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_chromosome_distribution.png
        title: Peak chromosome distribution
        annotation: PEPATAC
      TSS distance distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_TSS_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_TSS_distribution.png
        title: TSS distance distribution
        annotation: PEPATAC
      Peak partition distribution:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_genomic_distribution.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_peak_genomic_distribution.png
        title: Peak partition distribution
        annotation: PEPATAC
      cFRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_cFRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_cFRiF.png
        title: cFRiF
        annotation: PEPATAC
      FRiF:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_FRiF.pdf
        thumbnail_path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/QC_hg38/tutorial2_FRiF.png
        title: FRiF
        annotation: PEPATAC
      Time: 0:04:23
      Success: 11-27-14:03:28
title: An example Pipestat output schema
description: objects produced by PEPATAC pipeline.
type: object
properties:
  pipeline_name: PEPATAC
  samples:
    type: object
    properties:
      smooth_bw:
        type: string
        description: "Smoothed signal track"
      exact_bw:
        type: string
        description: "Nucleotide-resolution signal track"
      aligned_bam:
        type: string
        description: "Coordinate sorted deduplicated, aligned BAM file"
      peak_file:
        type: string
        description: "Sample peak file"
      coverage_file:
        type: string
        description: "Sample peak coverage table"
      summits_bed:
        type: string
        description: "Peak summit file"
  project:
    type: object
    properties:
      alignment_percent_file:
        title: "Alignment percent file"
        description: "Plots percent of total alignment to all pre-alignments and primary genome."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      alignment_raw_file:
        title: "Alignment raw file"
        description: "Plots raw alignment rates to all pre-alignments and primary genome."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      tss_file:
        title: "TSS enrichment file"
        description: "Plots TSS scores for each sample."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      library_complexity_file:
        title: "Library complexity file"
        description: "Plots each sample's library complexity on a single plot."
        type: object
        object_type: image
        properties:
          path:
            type: string
          thumbnail_path:
            type: string
          title:
            type: string
        required:
          - path
          - thumbnail_path
          - title
      consensus_peaks_file:
        title: "consesus peak file"
        description: "A set of consensus peaks across samples."
        type: object
        object_type: file
        properties:
          path:
            type: string
          title:
            type: string
          thumbnail_path:
            type: string
        required:
          - path
          - title
      counts_table:
        title: "Project peak coverage file"
        description: "Project peak coverages: chr_start_end X sample"
        type: object
        object_type: file
        properties:
          path:
            type: string
          title:
            type: string
          thumbnail_path:
            type: string
        required:
          - path
          - title
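As a rough check of a reported value against one of the `image` entries above, one can test for the `required` keys directly. This is a stdlib-only sketch, not pipestat's validation; `missing_required` is a hypothetical helper and the schema fragment is re-typed by hand from the listing.

```python
# Hand-copied fragment of one image entry from the schema above.
image_entry = {
    "type": "object",
    "object_type": "image",
    "required": ["path", "thumbnail_path", "title"],
}


def missing_required(value, entry):
    """Hypothetical helper: list required keys absent from a reported value.
    A plain string (the pre-fix format) is missing every required key."""
    if not isinstance(value, dict):
        return list(entry["required"])
    return [key for key in entry["required"] if key not in value]


# Pre-fix format: one flat string.
old_style = (
    "QC_hg38/tutorial1_preseq_plot.pdf Library complexity "
    "QC_hg38/tutorial1_preseq_plot.png PEPATAC"
)
# Post-fix format: a nested mapping.
new_style = {
    "path": "QC_hg38/tutorial1_preseq_plot.pdf",
    "thumbnail_path": "QC_hg38/tutorial1_preseq_plot.png",
    "title": "Library complexity",
}

print(missing_required(old_style, image_entry))
print(missing_required(new_style, image_entry))
```

A check like this only covers key presence; type checks and nested validation would need a real JSON Schema validator.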
OK, I've added the reported outputs to the output schema in `dev_test_pipestat` and it now works much better when building the report.
The current tutorial claims that HTML reports can be created using `looper report`. However, with Looper 1.5.0 and greater, pipestat configuration is required to use that function. The tutorial does not mention this.
I have a new branch to update the documents. However, I noticed that, even after configuring looper to use pipestat, the generated HTML report is blank.
Next steps: