Closed donaldcampbelljr closed 10 months ago
Currently, per the tutorial, pepatac will create a separate stats.yaml
for each of the input samples.
results_pipeline
|__Tutorial1
|___stats.yaml
|__Tutorial2
|___stats.yaml
This is problematic for using pipestat in the looper_config file, which is necessary for looper report and looper link.
This is because we can currently only choose one pipestat results file in the looper config.
Spawning separate stats files is default pypiper behavior that can be overridden using the pipestat_results_file parameter.
This allows for specifying a single results file for the pipeline output:
PEPATAC:
  project: {}
  sample:
    tutorial1:
      File_mb: 27
      pipestat_created_time: '2023-11-20 16:56:32'
      pipestat_modified_time: '2023-11-20 16:56:44'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      FastQC report r1:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/fastq/tutorial1_R1_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r1
        annotation: PEPATAC
      FastQC report r2:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial1/fastq/tutorial1_R2_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r2
        annotation: PEPATAC
      Aligned_reads_rCRSd: 99360.0
      Alignment_rate_rCRSd: 9.94
    tutorial2:
      File_mb: 27
      pipestat_created_time: '2023-11-20 16:58:02'
      pipestat_modified_time: '2023-11-20 16:58:12'
      Read_type: paired
      Genome: hg38
      Raw_reads: '1000000'
      Fastq_reads: 1000000
      Trimmed_reads: 1000000
      FastQC report r1:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/fastq/tutorial2_R1_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r1
        annotation: PEPATAC
      FastQC report r2:
        path: /home/drc/pepatac_tutorial/tools/pepatac/examples/tutorial/home/drc/pepatac_tutorial/processed/results_pipeline/tutorial2/fastq/tutorial2_R2_trim_fastqc.html
        thumbnail_path: null
        title: FastQC report r2
        annotation: PEPATAC
      Aligned_reads_rCRSd: 100556.0
      Alignment_rate_rCRSd: 10.06
This works well until the pipeline attempts to retrieve a stat via pm.get_stat. When it attempts to retrieve a result from a file that contains more than one sample, it raises an error:
Missing stat 'Raw_reads'
Traceback (most recent call last):
  File "/home/drc/pepatac_tutorial//tools/pepatac/pipelines/pepatac.py", line 2784, in <module>
    sys.exit(main())
  File "/home/drc/pepatac_tutorial//tools/pepatac/pipelines/pepatac.py", line 1117, in main
    pm.run([cmd, cmd2], rmdup_bam, follow=check_alignment_genome)
  File "/home/drc/GITHUB/pepatac/pepatac/venv/lib/python3.10/site-packages/pypiper/manager.py", line 1093, in run
    call_follow()
  File "/home/drc/GITHUB/pepatac/pepatac/venv/lib/python3.10/site-packages/pypiper/manager.py", line 947, in call_follow
    follow()
  File "/home/drc/pepatac_tutorial//tools/pepatac/pipelines/pepatac.py", line 1106, in check_alignment_genome
    rr = float(pm.get_stat("Raw_reads"))
TypeError: float() argument must be a string or a real number, not 'NoneType'
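To illustrate the failure mode, here is a minimal sketch that reproduces the same TypeError on an abbreviated copy of the combined results above. It is not pypiper's actual code: flat_lookup and record_lookup are hypothetical helpers, and the assumption is simply that a lookup written for one sample per file does not descend into the per-sample level of the combined file.

```python
# Abbreviated copy of the combined results file, as a nested dict.
combined_results = {
    "PEPATAC": {
        "project": {},
        "sample": {
            "tutorial1": {"Raw_reads": "1000000", "Aligned_reads_rCRSd": 99360.0},
            "tutorial2": {"Raw_reads": "1000000", "Aligned_reads_rCRSd": 100556.0},
        },
    }
}


def flat_lookup(results, key):
    """Hypothetical lookup that assumes one sample per file.

    It searches only the top of the pipeline namespace, so it never
    sees keys nested under an individual sample.
    """
    return results["PEPATAC"].get(key)


def record_lookup(results, record, key):
    """Hypothetical record-aware lookup: select the sample first, then the key."""
    return results["PEPATAC"]["sample"].get(record, {}).get(key)


# The flat lookup finds nothing, and float(None) raises the TypeError seen above.
value = flat_lookup(combined_results, "Raw_reads")
flat_error = None
try:
    float(value)
except TypeError as err:
    flat_error = err

# Scoping the lookup to a single record returns the expected value.
raw_reads = float(record_lookup(combined_results, "tutorial1", "Raw_reads"))
```

The point of the sketch is only that the lookup has to know which sample it is asking about once several samples share one file.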
I believe the solution is to have pypiper instead use pipestat's retrieve_one. Perhaps get_stat can be a wrapper for this.
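A rough sketch of what such a wrapper could look like is below. Everything here is illustrative: FakePipestatManager is a stand-in so the example runs without pipestat installed, PipelineManagerSketch is not pypiper's real manager class, and retrieve_one is assumed to take a record identifier followed by a result identifier. The real pipestat method may raise a different exception for missing records or results.

```python
class FakePipestatManager:
    """Stand-in for a pipestat manager (illustration only)."""

    def __init__(self, results):
        # results maps record identifier -> {result identifier: value}
        self._results = results

    def retrieve_one(self, record_identifier, result_identifier=None):
        record = self._results[record_identifier]
        if result_identifier is None:
            return record
        return record[result_identifier]


class PipelineManagerSketch:
    """Hypothetical pipeline manager that delegates get_stat to retrieve_one."""

    def __init__(self, name, pipestat_manager):
        self.name = name  # assumed to double as the record identifier
        self.pipestat = pipestat_manager

    def get_stat(self, key):
        # Always scope the lookup to this pipeline run's own record.
        try:
            return self.pipestat.retrieve_one(self.name, key)
        except KeyError:
            # Stand-in error handling; mirrors the "Missing stat" message.
            print(f"Missing stat '{key}'")
            return None


backend = FakePipestatManager(
    {
        "tutorial1": {"Raw_reads": "1000000"},
        "tutorial2": {"Raw_reads": "1000000", "Alignment_rate_rCRSd": 10.06},
    }
)
pm = PipelineManagerSketch("tutorial2", backend)
rr = float(pm.get_stat("Raw_reads"))
```

Because the wrapper always passes its own record identifier, the caller in the pipeline (pm.get_stat("Raw_reads")) would not need to change.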
Solution was implemented in pypiper: https://github.com/databio/pypiper/issues/202#issuecomment-1828469016
Related, pipestat was modified to create subdirectories during result_file_path creation: https://github.com/pepkit/pipestat/commit/76d79d915ab90ab763c58d9ac74c80dcdfb0d74d
This is because pypiper will create a stats.yaml results file if pipestat_results_file is not given to pypiper as an input parameter.
The current workaround is to add the actual path after the pipeline has run, simply so that it can be used for looper report and looper link functionality.
Originally posted by @donaldcampbelljr in https://github.com/databio/pepatac/issues/256#issuecomment-1819628165