databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Error while running the test pipeline locally #285

Closed sunta3iouxos closed 1 month ago

sunta3iouxos commented 1 month ago

Hi all, could you please help me with the following command:

pipelines/pepatac.py --single-or-paired paired   --prealignment-index /scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/.   --genome hg38   --genome-index /scratch/tgeorgom/refgenie/alias/hg38/bowtie2_index/default/.   --chrom-sizes /scratch/tgeorgom/refgenie/alias/hg38/fasta/default/hg38.chrom.sizes   --sample-name test1   --input examples/data/test1_r1.fastq.gz   --input2 examples/data/test1_r2.fastq.gz   --genome-size hs   -O /scratch/tgeorgom/pepatac_test

The error appears while FastQC is parsing test1_R2_trim.fastq.gz, I assume, with the following output:

> `FastQC report r2`    {'path': '/scratch/tgeorgom/pepatac_test/test1/fastqc/test1_R2_trim_fastqc.html', 'thumbnail_path': None, 'title': 'FastQC report r2', 'annotation': 'PEPATAC'} _RES_

### Prealignments (09-25 14:37:27) elapsed: 13.0 _TIME_

Traceback (most recent call last):
  File "/home/tgeorgom/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/home/tgeorgom/pepatac/pipelines/pepatac.py", line 949, in main
    genome, genome_index = prealignment.split('=')
ValueError: not enough values to unpack (expected 2, got 1)
Starting cleanup: 0 files; 3 conditional files for cleanup

Conditional flag found: []

These conditional files were left in place:

- /scratch/tgeorgom/pepatac_test/test1/fastq/test1*.fastq
- /scratch/tgeorgom/pepatac_test/test1/fastq/*.fastq
- /scratch/tgeorgom/pepatac_test/test1/fastq/*.log

### Pipeline failed at:  (09-25 14:37:27) elapsed: 0.0 _TIME_

Total time: 0:00:17
Failure reason: Pipeline failure. See details above.
Exception ignored in atexit callback: <bound method PipelineManager._exit_handler of <pypiper.manager.PipelineManager object at 0x2b9021f81ea0>>
Traceback (most recent call last):
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 2165, in _exit_handler
    self.fail_pipeline(Exception("Pipeline failure. See details above."))
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
Exception: Pipeline failure. See details above.

Both FASTQ files are there:

-rw-r--r-- 1 tgeorgom uniuser 612K Aug 23 12:34 test1_r1.fastq.gz
-rw-r--r-- 1 tgeorgom uniuser 622K Aug 23 12:34 test1_r2.fastq.gz

The trimmed files have been created but are not gzipped:

total 4.3M
-rw-r--r-- 1 tgeorgom uniuser 2.2M Sep 25 14:37 test1_R1_trim.fastq
-rw-r--r-- 1 tgeorgom uniuser 2.2M Sep 25 14:37 test1_R2_trim.fastq
-rw-r--r-- 1 tgeorgom uniuser 2.5K Sep 25 14:37 test1-trimmed.log

The FastQC reports for the trimmed files are also there:

total 2.2M
-rw-r--r-- 1 tgeorgom uniuser 686K Sep 25 14:37 test1_R2_trim_fastqc.html
-rw-r--r-- 1 tgeorgom uniuser 360K Sep 25 14:37 test1_R2_trim_fastqc.zip
-rw-r--r-- 1 tgeorgom uniuser 700K Sep 25 14:37 test1_R1_trim_fastqc.html
-rw-r--r-- 1 tgeorgom uniuser 370K Sep 25 14:37 test1_R1_trim_fastqc.zip
sunta3iouxos commented 1 month ago

Here is the whole output:

Using default schema: pipelines/pipestat_output_schema.yaml
No pipestat output schema was supplied to PipestatManager.
Initializing results file '/scratch/tgeorgom/pepatac_test/test1/stats.yaml'
### Pipeline run code and environment:

*          Command: `pipelines/pepatac.py --single-or-paired paired --prealignment-index /scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/. --genome hg38 --genome-index /scratch/tgeorgom/refgenie/alias/hg38/bowtie2_index/default/. --chrom-sizes /scratch/tgeorgom/refgenie/alias/hg38/fasta/default/hg38.chrom.sizes --sample-name test1 --input examples/data/test1_r1.fastq.gz --input2 examples/data/test1_r2.fastq.gz --genome-size hs -O /scratch/tgeorgom/pepatac_test`
*     Compute host: `cheops1`
*      Working dir: `/home/tgeorgom/pepatac`
*        Outfolder: `/scratch/tgeorgom/pepatac_test/test1/`
*         Log file: `/scratch/tgeorgom/pepatac_test/test1/PEPATAC_log.md`
*       Start time:  (09-25 14:37:12) elapsed: 2.0 _TIME_

### Version log:

*   Python version: `3.10.14`
*      Pypiper dir: `/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper`
*  Pypiper version: `0.14.2`
*     Pipeline dir: `/home/tgeorgom/pepatac/pipelines`
* Pipeline version: `0.11.3`
*    Pipeline hash: `82f0685e4d98d71d6d2fc5acfc9b995877c91648`
*  Pipeline branch: `* master`
*    Pipeline date: `2024-06-05 14:59:51 -0400`
*    Pipeline diff: `1 file changed, 21 insertions(+), 21 deletions(-)`

### Arguments passed to pipeline:

*           `TSS_name`:  `None`
*            `aligner`:  `bowtie2`
*          `anno_name`:  `None`
*          `blacklist`:  `None`
*        `chrom_sizes`:  `/scratch/tgeorgom/refgenie/alias/hg38/fasta/default/hg38.chrom.sizes`
*        `config_file`:  `pepatac.yaml`
*              `cores`:  `1`
*       `deduplicator`:  `samblaster`
*              `dirty`:  `False`
*             `extend`:  `250`
*              `fasta`:  `None`
*       `force_follow`:  `False`
*     `frip_ref_peaks`:  `None`
*    `genome_assembly`:  `hg38`
*       `genome_index`:  `/scratch/tgeorgom/refgenie/alias/hg38/bowtie2_index/default/.`
*        `genome_size`:  `hs`
*              `input`:  `['examples/data/test1_r1.fastq.gz']`
*             `input2`:  `['examples/data/test1_r2.fastq.gz']`
*               `keep`:  `False`
*               `lite`:  `False`
*             `logdev`:  `False`
*                `mem`:  `4000`
*              `motif`:  `False`
*          `new_start`:  `False`
*            `no_fifo`:  `False`
*           `no_scale`:  `False`
*      `output_parent`:  `/scratch/tgeorgom/pepatac_test`
*         `paired_end`:  `True`
*        `peak_caller`:  `macs3`
*          `peak_type`:  `fixed`
*      `pipeline_name`:  `None`
* `prealignment_index`:  `['/scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/.']`
* `prealignment_names`:  `[]`
*         `prioritize`:  `False`
*            `recover`:  `False`
*        `sample_name`:  `test1`
*        `search_file`:  `None`
*             `silent`:  `False`
*   `single_or_paired`:  `paired`
*             `skipqc`:  `False`
*                `sob`:  `False`
*           `testmode`:  `False`
*            `trimmer`:  `skewer`
*          `verbosity`:  `None`

### Initialized Pipestat Object:

* PipestatManager (default_pipeline_name)
* Backend: File
*  - results: /scratch/tgeorgom/pepatac_test/test1/stats.yaml
*  - status: /scratch/tgeorgom/pepatac_test/test1
* Multiple Pipelines Allowed: False
* Pipeline name: default_pipeline_name
* Pipeline type: sample
* Status Schema key: None
* Results formatter: default_formatter
* Results schema source: None
* Status schema source: None
* Records count: 2
* Sample name: DEFAULT_SAMPLE_NAME

----------------------------------------

Local input file: examples/data/test1_r1.fastq.gz
Local input file: examples/data/test1_r2.fastq.gz

> `Read_type`   paired  _RES_

> `Genome`      hg38    _RES_

### Merge/link and fastq conversion:  (09-25 14:37:13) elapsed: 0.0 _TIME_

Number of input file sets: 2
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz`

> `ln -sf /home/tgeorgom/pepatac/examples/data/test1_r1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz` (27394)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.002GB.
  PID: 27394;   Command: ln;    Return code: 0; Memory used: 0.002GB

Local input file: '/scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz'
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz`

> `ln -sf /home/tgeorgom/pepatac/examples/data/test1_r2.fastq.gz /scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz` (27426)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.002GB.
  PID: 27426;   Command: ln;    Return code: 0; Memory used: 0.002GB

Local input file: '/scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz'
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz`,`/scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz`

> `ln -sf /scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz` (27457)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.004GB.
  PID: 27457;   Command: ln;    Return code: 0; Memory used: 0.004GB

> `ln -sf /scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz` (27496)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.004GB.
  PID: 27496;   Command: ln;    Return code: 0; Memory used: 0.003GB

### Adapter trimming:  (09-25 14:37:14) elapsed: 2.0 _TIME_

Target to produce: `/scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_trim.fastq`

> `skewer -f sanger -t 1 -m pe -x /home/tgeorgom/pepatac/tools/NexteraPE-PE.fa --quiet -o /scratch/tgeorgom/pepatac_test/test1/fastq/test1 /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz` (27946)
<pre>
WARNING: Skipping mount /var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
.--. .-.
: .--': :.-.
`. `. : `'.' .--. .-..-..-. .--. .--.
_`, :: . `.' '_.': `; `; :' '_.': ..'
`.__.':_;:_;`.__.'`.__.__.'`.__.':_;
skewer v0.2.2 [April 4, 2016]
Parameters used:
-- 3' end adapter sequences in file (-x):       /home/tgeorgom/pepatac/tools/NexteraPE-PE.fa
A:      AGATGTGTATAAGAGACAG
B:      AGATGTGTATAAGAGACAG
C:      TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG
D:      CTGTCTCTTATACACATCTGACGCTGCCGACGA
E:      GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG
F:      CTGTCTCTTATACACATCTCCGAGCCCACGAGA
-- maximum error ratio allowed (-r):    0.100
-- maximum indel error ratio allowed (-d):      0.030
-- minimum read length allowed after trimming (-l):     18
-- file format (-f):            Sanger/Illumina 1.8+ FASTQ
Wed Sep 25 14:37:15 2024 >> started

Wed Sep 25 14:37:16 2024 >> done (0.987s)
12500 read pairs processed; of these:
    0 ( 0.00%) short read pairs filtered out after trimming by size control
    0 ( 0.00%) empty read pairs filtered out after trimming by size control
12500 (100.00%) read pairs available; of these:
 2848 (22.78%) trimmed read pairs available after processing
 9652 (77.22%) untrimmed read pairs available after processing
log has been saved to "/scratch/tgeorgom/pepatac_test/test1/fastq/test1-trimmed.log".
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.029GB.
  PID: 27946;   Command: skewer;        Return code: 0; Memory used: 0.029GB

> `mv /scratch/tgeorgom/pepatac_test/test1/fastq/test1-trimmed-pair1.fastq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_trim.fastq` (27996)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.029GB.
  PID: 27996;   Command: mv;    Return code: 0; Memory used: 0.0GB

> `mv /scratch/tgeorgom/pepatac_test/test1/fastq/test1-trimmed-pair2.fastq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2_trim.fastq` (27997)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.029GB.
  PID: 27997;   Command: mv;    Return code: 0; Memory used: 0.0GB

Evaluating read trimming

> `Trimmed_reads`       25000   _RES_

> `Trim_loss_rate`      0.0     _RES_
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/fastqc/test1_R1_trim_fastqc.html`

> `fastqc --noextract --outdir /scratch/tgeorgom/pepatac_test/test1/fastqc /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_trim.fastq` (28097)
<pre>
WARNING: Skipping mount /var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
Started analysis of test1_R1_trim.fastq
Approx 5% complete for test1_R1_trim.fastq
Approx 15% complete for test1_R1_trim.fastq
Approx 20% complete for test1_R1_trim.fastq
Approx 30% complete for test1_R1_trim.fastq
Approx 40% complete for test1_R1_trim.fastq
Approx 45% complete for test1_R1_trim.fastq
Approx 55% complete for test1_R1_trim.fastq
Approx 60% complete for test1_R1_trim.fastq
Approx 70% complete for test1_R1_trim.fastq
Approx 80% complete for test1_R1_trim.fastq
Approx 85% complete for test1_R1_trim.fastq
Approx 95% complete for test1_R1_trim.fastq
Analysis complete for test1_R1_trim.fastq
</pre>
Command completed. Elapsed time: 0:00:07. Running peak memory: 0.056GB.
  PID: 28097;   Command: fastqc;        Return code: 0; Memory used: 0.056GB

> `FastQC report r1`    {'path': '/scratch/tgeorgom/pepatac_test/test1/fastqc/test1_R1_trim_fastqc.html', 'thumbnail_path': None, 'title': 'FastQC report r1', 'annotation': 'PEPATAC'} _RES_
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/fastqc/test1_R2_trim_fastqc.html`

> `fastqc --noextract --outdir /scratch/tgeorgom/pepatac_test/test1/fastqc /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2_trim.fastq` (28210)
<pre>
WARNING: Skipping mount /var/apptainer/mnt/session/etc/resolv.conf [files]: /etc/resolv.conf doesn't exist in container
Started analysis of test1_R2_trim.fastq
Approx 5% complete for test1_R2_trim.fastq
Approx 15% complete for test1_R2_trim.fastq
Approx 20% complete for test1_R2_trim.fastq
Approx 30% complete for test1_R2_trim.fastq
Approx 40% complete for test1_R2_trim.fastq
Approx 45% complete for test1_R2_trim.fastq
Approx 55% complete for test1_R2_trim.fastq
Approx 60% complete for test1_R2_trim.fastq
Approx 70% complete for test1_R2_trim.fastq
Approx 80% complete for test1_R2_trim.fastq
Approx 85% complete for test1_R2_trim.fastq
Approx 95% complete for test1_R2_trim.fastq
Analysis complete for test1_R2_trim.fastq
</pre>
Command completed. Elapsed time: 0:00:04. Running peak memory: 0.064GB.
  PID: 28210;   Command: fastqc;        Return code: 0; Memory used: 0.064GB

> `FastQC report r2`    {'path': '/scratch/tgeorgom/pepatac_test/test1/fastqc/test1_R2_trim_fastqc.html', 'thumbnail_path': None, 'title': 'FastQC report r2', 'annotation': 'PEPATAC'} _RES_

### Prealignments (09-25 14:37:27) elapsed: 13.0 _TIME_

Traceback (most recent call last):
  File "/home/tgeorgom/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/home/tgeorgom/pepatac/pipelines/pepatac.py", line 949, in main
    genome, genome_index = prealignment.split('=')
ValueError: not enough values to unpack (expected 2, got 1)
Starting cleanup: 0 files; 3 conditional files for cleanup

Conditional flag found: []

These conditional files were left in place:

- /scratch/tgeorgom/pepatac_test/test1/fastq/test1*.fastq
- /scratch/tgeorgom/pepatac_test/test1/fastq/*.fastq
- /scratch/tgeorgom/pepatac_test/test1/fastq/*.log

### Pipeline failed at:  (09-25 14:37:27) elapsed: 0.0 _TIME_

Total time: 0:00:17
Failure reason: Pipeline failure. See details above.
Exception ignored in atexit callback: <bound method PipelineManager._exit_handler of <pypiper.manager.PipelineManager object at 0x2b9021f81ea0>>
Traceback (most recent call last):
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 2165, in _exit_handler
    self.fail_pipeline(Exception("Pipeline failure. See details above."))
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
Exception: Pipeline failure. See details above.
nsheff commented 1 month ago

The line is here: https://github.com/databio/pepatac/blob/82f0685e4d98d71d6d2fc5acfc9b995877c91648/pipelines/pepatac.py#L949

You are not passing the prealignment-index correctly.

Each prealignment entry needs to have an = in it, separating the genome name from the index path.

The docs say this:

--prealignment-index PREALIGNMENT_INDEX [PREALIGNMENT_INDEX ...] Space-delimited list of prealignment genome name and index files delimited by an equals sign to align to before primary alignment. e.g. rCRSd=/path/to/bowtie2_index/.

So maybe try: --prealignment-index rCRSd=/scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/.
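
For anyone hitting the same traceback, here is a minimal sketch of why the unpacking on that line fails when the = is missing, plus one way the value could be validated up front. The helper name `parse_prealignment` is hypothetical and not part of pepatac; only the `split('=')` behaviour is taken from the traceback above.

```python
def parse_prealignment(spec: str) -> tuple[str, str]:
    """Split a 'NAME=INDEX_PATH' string into (name, index_path).

    Hypothetical helper for illustration; pepatac itself unpacks
    prealignment.split('=') directly, as shown in the traceback.
    """
    # partition() splits on the first '=' only and always returns 3 parts.
    name, sep, index_path = spec.partition("=")
    if not sep or not name or not index_path:
        raise ValueError(
            "Expected NAME=INDEX_PATH "
            f"(e.g. rCRSd=/path/to/bowtie2_index/.), got: {spec!r}"
        )
    return name, index_path


# The value as originally passed: a bare path with no genome name.
bad = "/scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/."
# The corrected value: genome name, '=', then the index path.
good = "rCRSd=" + bad

# Without '=', split() returns a one-element list, so unpacking it into
# two names raises "not enough values to unpack (expected 2, got 1)".
print(len(bad.split("=")))

# With '=', the name and the index path come back as a pair.
print(parse_prealignment(good))
```

A check along these lines would turn the bare ValueError into a message that points at the expected NAME=INDEX_PATH form.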

sunta3iouxos commented 1 month ago

That fixed it, but I got another error, which I will post if I do not find a solution.

nsheff commented 1 month ago

great!