databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

TSS enrichment taking a long time to process #197

Closed: badusername21 closed this issue 2 years ago

badusername21 commented 3 years ago

Previously, we had not been running the TSS enrichment step when processing our samples. Since enabling it, every sample we run stalls at the TSS enrichment step and does not move past it, even after we give a single sample a couple of days.

Pipeline run code and environment:

Version log:

Arguments passed to pipeline:


Local input file: ../ATAC/305A_L000_R1_001.fastq.gz
Local input file: ../ATAC/305A_L000_R2_001.fastq.gz

File_mb 650 2 RES

Read_type paired PEPATAC RES

Genome GRCm38_with_viruses_and_spikes PEPATAC RES

Merge/link and fastq conversion: (06-24 13:11:55) elapsed: 0.0 TIME

Number of input file sets: 2
Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R1.fastq.gz

ln -sf /varidata/research/projects/immunograph/FASTQs/ATAC/305A_L000_R1_001.fastq.gz /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R1.fastq.gz (215206)


Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB.
PID: 215206; Command: ln; Return code: 0; Memory used: 0.0GB

Local input file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R1.fastq.gz'
Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R2.fastq.gz

ln -sf /varidata/research/projects/immunograph/FASTQs/ATAC/305A_L000_R2_001.fastq.gz /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R2.fastq.gz (215207)


Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB.
PID: 215207; Command: ln; Return code: 0; Memory used: 0.0GB

Local input file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R2.fastq.gz'
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1.fastq.gz,/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2.fastq.gz

ln -sf /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R1.fastq.gz /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1.fastq.gz (215208)


Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB.
PID: 215208; Command: ln; Return code: 0; Memory used: 0.0GB

ln -sf /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/raw/305A_R2.fastq.gz /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2.fastq.gz (215209)


Command completed. Elapsed time: 0:00:00. Running peak memory: 0GB.
PID: 215209; Command: ln; Return code: 0; Memory used: 0.0GB

Raw_reads 19963166 PEPATAC RES

Fastq_reads 19963166 PEPATAC RES

Adapter trimming: (06-24 13:12:21) elapsed: 26.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1_trim.fastq

skewer -f sanger -t 16 -m pe -x /varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/NexteraPE-PE.fa --quiet -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1.fastq.gz /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2.fastq.gz (120787)

.--. .-.
: .--': :.-.
`. `. : `'.' .--. .-..-..-. .--. .--.
_`, :: . `.' '_.': `; `; :' '_.': ..'
`.__.':_;:_;`.__.'`.__.__.'`.__.':_;
skewer v0.2.2 [April 4, 2016]
Parameters used:
-- 3' end adapter sequences in file (-x):   /varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/NexteraPE-PE.fa
A:  AGATGTGTATAAGAGACAG
B:  AGATGTGTATAAGAGACAG
C:  TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
D:  CTGTCTCTTATACACATCTGACGCTGCCGACGA
E:  GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
F:  CTGTCTCTTATACACATCTCCGAGCCCACGAGA
-- maximum error ratio allowed (-r):    0.100
-- maximum indel error ratio allowed (-d):  0.030
-- minimum read length allowed after trimming (-l): 18
-- file format (-f):        Sanger/Illumina 1.8+ FASTQ 
-- number of concurrent threads (-t):   16
Thu Jun 24 13:12:21 2021 >> started

Thu Jun 24 13:12:55 2021 >> done (34.586s)
9981583 read pairs processed; of these:
     87 ( 0.00%) short read pairs filtered out after trimming by size control
     67 ( 0.00%) empty read pairs filtered out after trimming by size control
9981429 (100.00%) read pairs available; of these:
 910421 ( 9.12%) trimmed read pairs available after processing
9071008 (90.88%) untrimmed read pairs available after processing
log has been saved to "/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A-trimmed.log".
Command completed. Elapsed time: 0:00:35. Running peak memory: 0.026GB.
PID: 120787; Command: skewer; Return code: 0; Memory used: 0.026GB

mv /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A-trimmed-pair1.fastq /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1_trim.fastq (120824)


Command completed. Elapsed time: 0:00:00. Running peak memory: 0.026GB.
PID: 120824; Command: mv; Return code: 0; Memory used: 0.0GB

mv /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A-trimmed-pair2.fastq /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2_trim.fastq (120825)


Command completed. Elapsed time: 0:00:00. Running peak memory: 0.026GB.
PID: 120825; Command: mv; Return code: 0; Memory used: 0.0GB

Evaluating read trimming

Trimmed_reads 19962858 PEPATAC RES

Trim_loss_rate 0.0 PEPATAC RES

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1_trim_fastqc.html

fastqc --noextract --outdir /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastqc /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1_trim.fastq (120830)

Started analysis of 305A_R1_trim.fastq
Approx 5% complete for 305A_R1_trim.fastq
Approx 10% complete for 305A_R1_trim.fastq
Approx 15% complete for 305A_R1_trim.fastq
Approx 20% complete for 305A_R1_trim.fastq
Approx 25% complete for 305A_R1_trim.fastq
Approx 30% complete for 305A_R1_trim.fastq
Approx 35% complete for 305A_R1_trim.fastq
Approx 40% complete for 305A_R1_trim.fastq
Approx 45% complete for 305A_R1_trim.fastq
Approx 50% complete for 305A_R1_trim.fastq
Approx 55% complete for 305A_R1_trim.fastq
Approx 60% complete for 305A_R1_trim.fastq
Approx 65% complete for 305A_R1_trim.fastq
Approx 70% complete for 305A_R1_trim.fastq
Approx 75% complete for 305A_R1_trim.fastq
Approx 80% complete for 305A_R1_trim.fastq
Approx 85% complete for 305A_R1_trim.fastq
Approx 90% complete for 305A_R1_trim.fastq
Approx 95% complete for 305A_R1_trim.fastq
Analysis complete for 305A_R1_trim.fastq

Command completed. Elapsed time: 0:00:28. Running peak memory: 0.185GB.
PID: 120830; Command: fastqc; Return code: 0; Memory used: 0.185GB

FastQC report r1 fastq/305A_R1_trim_fastqc.html FastQC report r1 None PEPATAC OBJ

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2_trim_fastqc.html

fastqc --noextract --outdir /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastqc /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2_trim.fastq (120885)

Started analysis of 305A_R2_trim.fastq
Approx 5% complete for 305A_R2_trim.fastq
Approx 10% complete for 305A_R2_trim.fastq
Approx 15% complete for 305A_R2_trim.fastq
Approx 20% complete for 305A_R2_trim.fastq
Approx 25% complete for 305A_R2_trim.fastq
Approx 30% complete for 305A_R2_trim.fastq
Approx 35% complete for 305A_R2_trim.fastq
Approx 40% complete for 305A_R2_trim.fastq
Approx 45% complete for 305A_R2_trim.fastq
Approx 50% complete for 305A_R2_trim.fastq
Approx 55% complete for 305A_R2_trim.fastq
Approx 60% complete for 305A_R2_trim.fastq
Approx 65% complete for 305A_R2_trim.fastq
Approx 70% complete for 305A_R2_trim.fastq
Approx 75% complete for 305A_R2_trim.fastq
Approx 80% complete for 305A_R2_trim.fastq
Approx 85% complete for 305A_R2_trim.fastq
Approx 90% complete for 305A_R2_trim.fastq
Approx 95% complete for 305A_R2_trim.fastq
Analysis complete for 305A_R2_trim.fastq

Command completed. Elapsed time: 0:00:27. Running peak memory: 0.185GB.
PID: 120885; Command: fastqc; Return code: 0; Memory used: 0.183GB

FastQC report r2 fastq/305A_R2_trim_fastqc.html FastQC report r2 None PEPATAC OBJ

Prealignments (06-24 13:13:52) elapsed: 91.0 TIME

You may use --prealignments to align to references before the genome alignment step. See docs.

Compress all unmapped read files (06-24 13:13:52) elapsed: 0.0 TIME

Map to genome (06-24 13:13:52) elapsed: 0.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam

bwa mem -t 16 -M /varidata/research/projects/immunograph/refgenie/alias/GRCm38_with_viruses_and_spikes/bwa_index/default/GRCm38_with_viruses_and_spikes.fa /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1_trim.fastq /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2_trim.fastq | samtools view -bS - -@ 1 | samtools sort - -@ 1 -T /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/tmpirf0cydw -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_temp.bam (120932,120933,120934)

[M::bwa_idx_load_from_disk] read 0 ALT contigs
[M::process] read 3183678 sequences (160000018 bp)...
[M::process] read 3185068 sequences (160000070 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (46, 1184763, 18, 31)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (36, 60, 180)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 468)
[M::mem_pestat] mean and std.dev: (103.00, 92.98)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 612)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (75, 145, 234)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 552)
[M::mem_pestat] mean and std.dev: (169.94, 116.48)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 711)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (92, 257, 707)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1937)
[M::mem_pestat] mean and std.dev: (451.18, 549.15)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2648)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (54, 109, 242)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 618)
[M::mem_pestat] mean and std.dev: (162.52, 169.21)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 839)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 3183678 reads in 781.427 CPU sec, 49.588 real sec
[M::process] read 3184252 sequences (160000010 bp)...
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (49, 1183562, 9, 36)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (36, 66, 122)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 294)
[M::mem_pestat] mean and std.dev: (72.28, 49.97)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 380)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (74, 143, 232)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 548)
[M::mem_pestat] mean and std.dev: (167.78, 115.01)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 706)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (52, 115, 237)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 607)
[M::mem_pestat] mean and std.dev: (153.15, 150.78)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 792)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RR
[M::mem_process_seqs] Processed 3185068 reads in 803.480 CPU sec, 50.791 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (42, 1186338, 17, 31)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (40, 82, 218)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 574)
[M::mem_pestat] mean and std.dev: (128.00, 127.49)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 752)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (75, 145, 234)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 552)
[M::mem_pestat] mean and std.dev: (169.52, 116.43)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 711)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (165, 388, 471)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1083)
[M::mem_pestat] mean and std.dev: (286.33, 215.57)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1389)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (104, 184, 272)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 608)
[M::mem_pestat] mean and std.dev: (171.35, 108.10)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 776)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::process] read 3184392 sequences (160000032 bp)...
[M::mem_process_seqs] Processed 3184252 reads in 802.856 CPU sec, 51.017 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (40, 1182276, 12, 29)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (47, 96, 202)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 512)
[M::mem_pestat] mean and std.dev: (111.81, 77.90)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 667)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (74, 142, 229)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 539)
[M::mem_pestat] mean and std.dev: (166.22, 112.89)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 694)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (35, 372, 2021)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 5993)
[M::mem_pestat] mean and std.dev: (777.75, 1133.30)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 7979)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (50, 104, 199)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 497)
[M::mem_pestat] mean and std.dev: (136.56, 123.58)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 646)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::process] read 3184150 sequences (160000096 bp)...
[M::mem_process_seqs] Processed 3184392 reads in 799.537 CPU sec, 50.975 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (50, 1182392, 14, 38)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (50, 76, 210)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 530)
[M::mem_pestat] mean and std.dev: (118.40, 91.00)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 690)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (74, 143, 230)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 542)
[M::mem_pestat] mean and std.dev: (166.83, 113.72)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 698)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (77, 388, 448)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 1190)
[M::mem_pestat] mean and std.dev: (260.86, 199.67)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 1561)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (52, 105, 280)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 736)
[M::mem_pestat] mean and std.dev: (163.26, 166.58)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 964)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::process] read 3183488 sequences (160000064 bp)...
[M::mem_process_seqs] Processed 3184150 reads in 802.605 CPU sec, 51.509 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (50, 1184026, 13, 21)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (57, 107, 192)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 462)
[M::mem_pestat] mean and std.dev: (127.83, 94.04)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 597)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (75, 145, 233)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 549)
[M::mem_pestat] mean and std.dev: (168.94, 115.44)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 707)
[M::mem_pestat] analyzing insert size distribution for orientation RF...
[M::mem_pestat] (25, 50, 75) percentile: (180, 309, 803)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 2049)
[M::mem_pestat] mean and std.dev: (475.38, 396.13)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 2672)
[M::mem_pestat] analyzing insert size distribution for orientation RR...
[M::mem_pestat] (25, 50, 75) percentile: (55, 106, 177)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 421)
[M::mem_pestat] mean and std.dev: (111.55, 66.35)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 543)
[M::mem_pestat] skip orientation FF
[M::mem_pestat] skip orientation RF
[M::mem_pestat] skip orientation RR
[M::process] read 857830 sequences (43089612 bp)...
[M::mem_process_seqs] Processed 3183488 reads in 795.390 CPU sec, 50.817 real sec
[M::mem_pestat] # candidate unique pairs for (FF, FR, RF, RR): (22, 317946, 3, 9)
[M::mem_pestat] analyzing insert size distribution for orientation FF...
[M::mem_pestat] (25, 50, 75) percentile: (40, 94, 222)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 586)
[M::mem_pestat] mean and std.dev: (126.00, 99.92)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 768)
[M::mem_pestat] analyzing insert size distribution for orientation FR...
[M::mem_pestat] (25, 50, 75) percentile: (74, 140, 228)
[M::mem_pestat] low and high boundaries for computing mean and std.dev: (1, 536)
[M::mem_pestat] mean and std.dev: (164.95, 112.34)
[M::mem_pestat] low and high boundaries for proper pairs: (1, 690)
[M::mem_pestat] skip orientation RF as there are not enough pairs
[M::mem_pestat] skip orientation RR as there are not enough pairs
[M::mem_pestat] skip orientation FF
[M::mem_process_seqs] Processed 857830 reads in 219.052 CPU sec, 14.124 real sec
[main] Version: 0.7.17-r1188
[main] CMD: bwa mem -t 16 -M /varidata/research/projects/immunograph/refgenie/alias/GRCm38_with_viruses_and_spikes/bwa_index/default/GRCm38_with_viruses_and_spikes.fa /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R1_trim.fastq /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/fastq/305A_R2_trim.fastq
[main] Real time: 346.808 sec; CPU: 5010.916 sec
[bam_sort_core] merging from 5 files and 1 in-memory blocks...

Command completed. Elapsed time: 0:07:34. Running peak memory: 13.199GB.
PID: 120932; Command: bwa; Return code: 0; Memory used: 13.199GB
PID: 120933; Command: samtools; Return code: 0; Memory used: 0.003GB
PID: 120934; Command: samtools; Return code: 0; Memory used: 0.889GB

samtools view -b -q 10 -@ 16 -U /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_fail_qc.bam -f 2 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_temp.bam > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam (121458)


Command completed. Elapsed time: 0:00:30. Running peak memory: 13.199GB.
PID: 121458; Command: samtools; Return code: 0; Memory used: 0.035GB

Mapped_reads 19815395 PEPATAC RES

QC_filtered_reads 2264644 PEPATAC RES

Aligned_reads 17550751 PEPATAC RES

Alignment_rate 87.92 PEPATAC RES

Total_efficiency 87.92 PEPATAC RES

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam.bai

samtools index /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_temp.bam (121515)


Command completed. Elapsed time: 0:00:12. Running peak memory: 13.199GB.
PID: 121515; Command: samtools; Return code: 0; Memory used: 0.008GB

samtools index /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam (121518)


Command completed. Elapsed time: 0:00:11. Running peak memory: 13.199GB.
PID: 121518; Command: samtools; Return code: 0; Memory used: 0.009GB

Missing stat 'Mitochondrial_reads'

samtools idxstats /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_temp.bam | grep -we 'chrM' -we 'ChrM' -we 'ChrMT' -we 'chrMT' -we 'M' -we 'MT' -we 'rCRSd'| cut -f 3

Mitochondrial_reads 1772654 PEPATAC RES

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_noMT.bam

samtools idxstats /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam | cut -f 1-2 | awk '{print $1, 0, $2}' | grep -vwe 'chrM' -vwe 'ChrM' -vwe 'ChrMT' -vwe 'chrMT' -vwe 'M' -vwe 'MT' -vwe 'rCRSd' > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/chr_sizes.bed (121537,121538,121539,121540)


Command completed. Elapsed time: 0:00:00. Running peak memory: 13.199GB.
PID: 121537; Command: samtools; Return code: 0; Memory used: 0.0GB
PID: 121539; Command: awk; Return code: 0; Memory used: 0.0GB
PID: 121538; Command: cut; Return code: 0; Memory used: 0.0GB
PID: 121540; Command: grep; Return code: 0; Memory used: 0.0GB

samtools view -L /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/chr_sizes.bed -b -@ 16 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_noMT.bam (121542)


Command completed. Elapsed time: 0:00:16. Running peak memory: 13.199GB.
PID: 121542; Command: samtools; Return code: 0; Memory used: 0.028GB

mv /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_noMT.bam /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam (121564)


Command completed. Elapsed time: 0:00:00. Running peak memory: 13.199GB.
PID: 121564; Command: mv; Return code: 0; Memory used: 0.0GB

samtools index /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam (121565)


Command completed. Elapsed time: 0:00:10. Running peak memory: 13.199GB.
PID: 121565; Command: samtools; Return code: 0; Memory used: 0.008GB

Calculate NRF, PBC1, and PBC2 (06-24 13:23:08) elapsed: 556.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_bamQC.tsv

/varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/bamQC.py -i /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam -c 16 -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_bamQC.tsv (121582)

Registering input file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam'
Temporary files will be stored in: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/tmp_305A_sort_stcq0toa'
Processing with 16 cores...
Merging 58 files into output file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_bamQC.tsv'

Command completed. Elapsed time: 0:00:08. Running peak memory: 13.199GB.
PID: 121582; Command: /varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/bamQC.py; Return code: 0; Memory used: 2.983GB

awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "NRF") c=i } getline; print $c }' /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_bamQC.tsv

awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "PBC1") c=i } getline; print $c }' /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_bamQC.tsv

awk '{ for (i=1; i<=NF; ++i) { if ($i ~ "PBC2") c=i } getline; print $c }' /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_bamQC.tsv

NRF 0.7 PEPATAC RES

PBC1 0.83 PEPATAC RES

PBC2 5.87 PEPATAC RES

Missing stat 'Unmapped_reads'

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_unmap.bam

samtools view -b -@ 16 -f 12 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_temp.bam > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_unmap.bam (121646)


Command completed. Elapsed time: 0:00:02. Running peak memory: 13.199GB.
PID: 121646; Command: samtools; Return code: 0; Memory used: 0.008GB

samtools view -c -f 4 -@ 16 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_temp.bam

Unmapped_reads 147606 PEPATAC RES

Remove duplicates and produce signal tracks (06-24 13:23:20) elapsed: 12.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam

samtools sort -n -@ 4 -T /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/tmp05jujci2 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam | samtools view -h - -@ 4 | samblaster -r --ignoreUnmated 2> /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_dedup_metrics_log.txt | samtools view -b - -@ 4 | samtools sort - -@ 4 -T /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/tmp05jujci2 -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam (121684,121685,121691,121692,121694)

[bam_sort_core] merging from 4 files and 4 in-memory blocks...
[bam_sort_core] merging from 0 files and 4 in-memory blocks...

Command completed. Elapsed time: 0:01:22. Running peak memory: 13.199GB.
PID: 121692; Command: samtools; Return code: 0; Memory used: 0.005GB
PID: 121684; Command: samtools; Return code: 0; Memory used: 3.705GB
PID: 121691; Command: samblaster; Return code: 0; Memory used: 0.07GB
PID: 121685; Command: samtools; Return code: 0; Memory used: 0.004GB
PID: 121694; Command: samtools; Return code: 0; Memory used: 3.178GB

samtools index /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam (121785)


Command completed. Elapsed time: 0:00:09. Running peak memory: 13.199GB.
PID: 121785; Command: samtools; Return code: 0; Memory used: 0.009GB

grep 'Removed' /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_dedup_metrics_log.txt | tr -s ' ' | cut -f 3 -d ' '

Duplicate_reads 1357368 PEPATAC RES

Dedup_aligned_reads 16193383.0 PEPATAC RES

Dedup_alignment_rate 81.12 PEPATAC RES

Dedup_total_efficiency 81.12 PEPATAC RES

Calculate distribution of reads across nucleosomes (06-24 13:24:50) elapsed: 91.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_NFR.bam

samtools view -h /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam | awk '(substr($0,1,1)=="@" || ($9>= -100 && $9<=100))' | samtools view -b > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_NFR.bam (121790,121791,121792)


Command completed. Elapsed time: 0:00:27. Running peak memory: 13.199GB.
PID: 121791; Command: awk; Return code: 0; Memory used: 0.001GB
PID: 121790; Command: samtools; Return code: 0; Memory used: 0.003GB
PID: 121792; Command: samtools; Return code: 0; Memory used: 0.013GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_mono.bam

samtools view -h /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam | awk '(substr($0,1,1)=="@" || ($9>= 180 && $9<=247) || ($9<=-180 && $9>=-247))' | samtools view -b > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_mono.bam (121820,121821,121822)


Command completed. Elapsed time: 0:00:17. Running peak memory: 13.199GB.
PID: 121820; Command: samtools; Return code: 0; Memory used: 0.003GB
PID: 121822; Command: samtools; Return code: 0; Memory used: 0.013GB
PID: 121821; Command: awk; Return code: 0; Memory used: 0.001GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_di.bam

samtools view -h /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam | awk '(substr($0,1,1)=="@" || ($9>= 315 && $9<=473) || ($9<=-315 && $9>=-473))' | samtools view -b > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_di.bam (121826,121827,121828)


Command completed. Elapsed time: 0:00:17. Running peak memory: 13.199GB.
PID: 121826; Command: samtools; Return code: 0; Memory used: 0.003GB
PID: 121828; Command: samtools; Return code: 0; Memory used: 0.013GB
PID: 121827; Command: awk; Return code: 0; Memory used: 0.001GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_tri.bam

samtools view -h /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam | awk '(substr($0,1,1)=="@" || ($9>= 558 && $9<=615) || ($9<=-558 && $9>=-615))' | samtools view -b > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_tri.bam (121844,121845,121846)


Command completed. Elapsed time: 0:00:19. Running peak memory: 13.199GB.
PID: 121844; Command: samtools; Return code: 0; Memory used: 0.003GB
PID: 121846; Command: samtools; Return code: 0; Memory used: 0.013GB
PID: 121845; Command: awk; Return code: 0; Memory used: 0.001GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_poly.bam

samtools view -h /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam | awk '(substr($0,1,1)=="@" || ($9>= 615 || $9<=-615))' | samtools view -b > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_poly.bam (121865,121866,121867)


Command completed. Elapsed time: 0:00:17. Running peak memory: 13.199GB.
PID: 121865; Command: samtools; Return code: 0; Memory used: 0.003GB
PID: 121867; Command: samtools; Return code: 0; Memory used: 0.011GB
PID: 121866; Command: awk; Return code: 0; Memory used: 0.001GB
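
For reference, the five awk filters above select reads by the signed template length in SAM column 9 (`$9`). The following is a minimal Python sketch of the same binning, using the thresholds copied from the logged commands. The names `BINS`, `classify`, and `bin_counts` are hypothetical and not part of PEPATAC. Note that, as the filters are written, an absolute length of exactly 615 matches both the tri and poly bins, and lengths in the gaps (for example 101 to 179) match none.

```python
from collections import Counter

# Inclusive bounds on |TLEN|, copied from the awk filters in the log.
# An upper bound of None means the bin is open-ended.
BINS = {
    "NFR": (0, 100),
    "mono": (180, 247),
    "di": (315, 473),
    "tri": (558, 615),
    "poly": (615, None),
}


def classify(tlen):
    """Return the names of every bin whose range contains |tlen|."""
    size = abs(tlen)
    return [
        name
        for name, (low, high) in BINS.items()
        if size >= low and (high is None or size <= high)
    ]


def bin_counts(tlens):
    """Count how many template lengths fall into each bin."""
    counts = Counter()
    for tlen in tlens:
        counts.update(classify(tlen))
    return counts
```

Calling `bin_counts` on the column-9 values of a BAM file would give per-bin read counts comparable to the `samtools view -c` calls that follow; how the pipeline turns those counts into the reported fractions is not shown in this log.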

samtools view -c /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_NFR.bam

samtools view -c /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_mono.bam

samtools view -c /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_di.bam

samtools view -c /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_tri.bam

samtools view -c /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_poly.bam

NFR_frac 0.2893 PEPATAC RES

mono_frac 0.1446 PEPATAC RES

di_frac 0.1007 PEPATAC RES

tri_frac 0.0088 PEPATAC RES

poly_frac 0.0057 PEPATAC RES

Missing stat 'Read_length'

samtools stats /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam | grep '^SN' | cut -f 2- | grep 'maximum length:' | cut -f 2-

Read_length 51 PEPATAC RES

Missing stat 'Genome_size'

awk '{sum+=$2} END {printf "%.0f", sum}' /varidata/research/projects/immunograph/refgenie/alias/GRCm38_with_viruses_and_spikes/fasta/default/GRCm38_with_viruses_and_spikes.chrom.sizes

Genome_size 2731587773 PEPATAC RES

Calculate library complexity (06-24 13:26:58) elapsed: 128.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_out.txt

preseq c_curve -v -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_out.txt -B /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam (121903)

BAM_INPUT
TOTAL READS     = 8058033
COUNTS_SUM      = 8058033
DISTINCT READS  = 6.36302e+06
DISTINCT COUNTS = 151
MAX COUNT       = 887
COUNTS OF 1     = 5.09398e+06
OBSERVED COUNTS (888)
1   5093979
2   1002422
3   195431
4   44954
5   13707
6   5571
7   2629
8   1400
9   829
10  504
11  311
12  208
13  162
14  136
15  71
16  75
17  46
18  37
19  35
20  30
21  22
22  30
23  19
24  17
25  17
26  14
27  9
28  15
29  10
30  12
31  6
32  12
33  13
34  13
35  12
36  8
37  5
38  8
39  8
40  7
41  5
42  13
43  4
44  6
45  6
46  5
47  4
48  3
49  1
50  2
51  3
52  6
53  2
54  2
55  4
56  2
57  4
58  3
59  3
60  7
61  1
63  3
64  5
65  3
66  7
67  3
68  1
69  1
70  1
71  3
75  1
76  1
77  2
78  1
79  1
80  1
82  1
84  2
85  2
87  2
88  1
89  1
90  3
94  7
97  3
98  2
100 2
101 2
102 2
103 2
104 1
106 3
107 2
108 1
110 1
112 2
113 1
117 2
118 1
121 1
122 4
124 1
126 2
127 1
131 1
132 2
137 1
139 1
140 1
141 1
143 1
147 1
148 3
149 1
150 1
152 2
158 1
160 2
162 2
166 1
172 1
177 1
182 2
193 1
207 1
209 1
229 1
231 1
240 1
241 1
251 1
267 1
286 1
295 1
306 1
360 1
365 1
370 1
378 1
389 1
403 1
410 1
422 1
436 1
446 1
479 1
483 1
591 1
610 1
657 1
887 1

sample size: 1000000
sample size: 2000000
sample size: 3000000
sample size: 4000000
sample size: 5000000
sample size: 6000000
sample size: 7000000
sample size: 8000000

Command completed. Elapsed time: 0:00:41. Running peak memory: 13.199GB.
PID: 121903; Command: preseq; Return code: 0; Memory used: 0.004GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_yield.txt

preseq lc_extrap -v -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_yield.txt -B /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam (121934)

BAM_INPUT
TOTAL READS     = 8058033
DISTINCT READS  = 6.36302e+06
DISTINCT COUNTS = 151
MAX COUNT       = 887
COUNTS OF 1     = 5.09398e+06
MAX TERMS       = 60
OBSERVED COUNTS (888)
1   5093979
2   1002422
3   195431
4   44954
5   13707
6   5571
7   2629
8   1400
9   829
10  504
11  311
12  208
13  162
14  136
15  71
16  75
17  46
18  37
19  35
20  30
21  22
22  30
23  19
24  17
25  17
26  14
27  9
28  15
29  10
30  12
31  6
32  12
33  13
34  13
35  12
36  8
37  5
38  8
39  8
40  7
41  5
42  13
43  4
44  6
45  6
46  5
47  4
48  3
49  1
50  2
51  3
52  6
53  2
54  2
55  4
56  2
57  4
58  3
59  3
60  7
61  1
63  3
64  5
65  3
66  7
67  3
68  1
69  1
70  1
71  3
75  1
76  1
77  2
78  1
79  1
80  1
82  1
84  2
85  2
87  2
88  1
89  1
90  3
94  7
97  3
98  2
100 2
101 2
102 2
103 2
104 1
106 3
107 2
108 1
110 1
112 2
113 1
117 2
118 1
121 1
122 4
124 1
126 2
127 1
131 1
132 2
137 1
139 1
140 1
141 1
143 1
147 1
148 3
149 1
150 1
152 2
158 1
160 2
162 2
166 1
172 1
177 1
182 2
193 1
207 1
209 1
229 1
231 1
240 1
241 1
251 1
267 1
286 1
295 1
306 1
360 1
365 1
370 1
378 1
389 1
403 1
410 1
422 1
436 1
446 1
479 1
483 1
591 1
610 1
657 1
887 1

[ESTIMATING YIELD CURVE]
[BOOTSTRAPPING HISTOGRAM]
..._............._............_................... .........................................__... .........
[COMPUTING CONFIDENCE INTERVALS]
[WRITING OUTPUT]

Command completed. Elapsed time: 0:00:42. Running peak memory: 13.199GB.
PID: 121934; Command: preseq; Return code: 0; Memory used: 0.004GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_counts.txt

echo '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_yield.txt '$(samtools view -c -F 4 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort.bam)' '$(samtools view -c -F 4 /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam) > /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_counts.txt (121958)


Command completed. Elapsed time: 0:00:18. Running peak memory: 13.199GB.
PID: 121958; Command: echo; Return code: 0; Memory used: 0.005GB

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_plot.pdf,/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_plot.png

Rscript /varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/PEPATAC.R preseq -i /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_yield.txt -r /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_counts.txt -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_plot (121977)

INFO: Found real counts for 305A - Total (M): 16.114986 Unique (M): 13.400248

Library complexity plot completed!

Command completed. Elapsed time: 0:00:07. Running peak memory: 13.199GB.
PID: 121977; Command: Rscript; Return code: 0; Memory used: 0.21GB

Library complexity QC_GRCm38_with_viruses_and_spikes/305A_preseq_plot.pdf Library complexity QC_GRCm38_with_viruses_and_spikes/305A_preseq_plot.png PEPATAC OBJ

Missing stat 'Frac_exp_unique_at_10M'

grep -w '^10000000' /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_preseq_yield.txt | awk '{print $2}'

Frac_exp_unique_at_10M 0.7535 PEPATAC RES

Produce signal tracks (06-24 13:28:46) elapsed: 108.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes_exact/305A_exact.bw,/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_smooth.bw

/varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/bamSitesToWig.py -i /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam -c /varidata/research/projects/immunograph/refgenie/alias/GRCm38_with_viruses_and_spikes/fasta/default/GRCm38_with_viruses_and_spikes.chrom.sizes -e /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes_exact -b /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes_exact/305A_shift.bed -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes_exact/305A_exact.bw -w /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_smooth.bw -m atac -p 10 --variable-step --scale 17550751.0 (121993)

Cutting parallel chroms in half to accommodate two tracks.
Registering input file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam'
Temporary files will be stored in: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes_exact/tmp_305A_sort_dedup_cuttrace_walycpqz'
Processing with 5 cores...
Reduce step (merge files)...
Merging 58 files into output file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes_exact/305A_exact.bw'
Merging 58 files into output file: '/varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_smooth.bw'

Command completed. Elapsed time: 0:04:55. Running peak memory: 13.199GB.
PID: 121993; Command: /varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/bamSitesToWig.py; Return code: 0; Memory used: 4.611GB

Calculate TSS enrichment (06-24 13:33:41) elapsed: 295.0 TIME

Target to produce: /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_TSS_enrichment.txt

/varidata/research/projects/immunograph/FASTQs/pepatac_results/pepatac/tools/pyTssEnrichment.py -a /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/aligned_GRCm38_with_viruses_and_spikes/305A_sort_dedup.bam -b /varidata/research/projects/immunograph/refgenie/alias/GRCm38_with_viruses_and_spikes/refgene_anno/default/GRCm38_with_viruses_and_spikes_TSS.bed -p ends -c 16 -z -v -s 6 -o /varidata/research/projects/immunograph/FASTQs/pepatac_results/new_processed/results_pipeline/305A/QC_GRCm38_with_viruses_and_spikes/305A_TSS_enrichment.txt (122818)

jpsmith5 commented 2 years ago

Hey @badusername21,

Have you tested this again with a more recent release? I'm sorry this slipped past me. Typically that step is quite rapid, taking seconds rather than minutes. If you are still troubleshooting this late in the game, give it a go with the most recent release and see if the issue persists. Happy to reopen the issue if that's the case.
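
If it helps to tell a stalled step apart from a merely slow one, one option is to rerun the logged `pyTssEnrichment.py` command on its own under a time limit. Below is a minimal sketch, assuming Python 3. `run_with_timeout` is a hypothetical helper and not part of PEPATAC, and the demo command is only a stand-in for the real one from the log.

```python
import subprocess
import sys
import time


def run_with_timeout(cmd, timeout_s):
    """Run cmd (a list of arguments) with a wall-clock limit.

    Returns (finished, elapsed_seconds, return_code); return_code is
    None when the limit was hit and the child process was killed.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(cmd, timeout=timeout_s)
        return True, time.monotonic() - start, proc.returncode
    except subprocess.TimeoutExpired:
        return False, time.monotonic() - start, None


if __name__ == "__main__":
    # Stand-in command; replace the list with the pyTssEnrichment.py
    # call and arguments copied from your own pipeline log.
    demo_cmd = [sys.executable, "-c", "print('ok')"]
    print(run_with_timeout(demo_cmd, timeout_s=60))
```

If the real command does not finish within a few minutes on a single sample, that would suggest a hang rather than slow processing, and the elapsed time is useful to include when reporting back.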