databio / pepatac

A modular, containerized pipeline for ATAC-seq data processing
http://pepatac.databio.org
BSD 2-Clause "Simplified" License

Unable to access jarfile PE #290

Open sunta3iouxos opened 2 weeks ago

sunta3iouxos commented 2 weeks ago

Hi there, could you please help me with this? The pipeline seems to be stuck at the trimming step, apparently while calling java. I activated the /scratch/tgeorgom/bulker/bulker_crates/databio/pepatac/1.0.12 image.

This is the log file.

### Pipeline run code and environment:

*          Command: `pipelines/pepatac.py --single-or-paired paired --prealignment-index rCRSd=/scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/. --genome hg38 --genome-index /scratch/tgeorgom/refgenie/alias/hg38/bowtie2_index/default/. --chrom-sizes /scratch/tgeorgom/refgenie/alias/hg38/fasta/default/hg38.chrom.sizes --sample-name test1 --input examples/data/test1_r1.fastq.gz --input2 examples/data/test1_r2.fastq.gz --genome-size hs --trimmer trimmomatic -O /scratch/tgeorgom/pepatac_test`
*     Compute host: `cheops1`
*      Working dir: `/home/tgeorgom/pepatac`
*        Outfolder: `/scratch/tgeorgom/pepatac_test/test1/`
*         Log file: `/scratch/tgeorgom/pepatac_test/test1/PEPATAC_log.md`
*       Start time:  (10-07 11:29:19) elapsed: 1.0 _TIME_

### Version log:

*   Python version: `3.10.14`
*      Pypiper dir: `/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper`
*  Pypiper version: `0.14.2`
*     Pipeline dir: `/home/tgeorgom/pepatac/pipelines`
* Pipeline version: `0.11.3`
*    Pipeline hash: `82f0685e4d98d71d6d2fc5acfc9b995877c91648`
*  Pipeline branch: `* master`
*    Pipeline date: `2024-06-05 14:59:51 -0400`
*    Pipeline diff: `1 file changed, 21 insertions(+), 21 deletions(-)`

### Arguments passed to pipeline:

*           `TSS_name`:  `None`
*            `aligner`:  `bowtie2`
*          `anno_name`:  `None`
*          `blacklist`:  `None`
*        `chrom_sizes`:  `/scratch/tgeorgom/refgenie/alias/hg38/fasta/default/hg38.chrom.sizes`
*        `config_file`:  `pepatac.yaml`
*              `cores`:  `1`
*       `deduplicator`:  `samblaster`
*              `dirty`:  `False`
*             `extend`:  `250`
*              `fasta`:  `None`
*       `force_follow`:  `False`
*     `frip_ref_peaks`:  `None`
*    `genome_assembly`:  `hg38`
*       `genome_index`:  `/scratch/tgeorgom/refgenie/alias/hg38/bowtie2_index/default/.`
*        `genome_size`:  `hs`
*              `input`:  `['examples/data/test1_r1.fastq.gz']`
*             `input2`:  `['examples/data/test1_r2.fastq.gz']`
*               `keep`:  `False`
*               `lite`:  `False`
*             `logdev`:  `False`
*                `mem`:  `4000`
*              `motif`:  `False`
*          `new_start`:  `False`
*            `no_fifo`:  `False`
*           `no_scale`:  `False`
*      `output_parent`:  `/scratch/tgeorgom/pepatac_test`
*         `paired_end`:  `True`
*        `peak_caller`:  `macs3`
*          `peak_type`:  `fixed`
*      `pipeline_name`:  `None`
* `prealignment_index`:  `['rCRSd=/scratch/tgeorgom/refgenie/alias/rCRSd/bowtie2_index/default/.']`
* `prealignment_names`:  `[]`
*         `prioritize`:  `False`
*            `recover`:  `False`
*        `sample_name`:  `test1`
*        `search_file`:  `None`
*             `silent`:  `False`
*   `single_or_paired`:  `paired`
*             `skipqc`:  `False`
*                `sob`:  `False`
*           `testmode`:  `False`
*            `trimmer`:  `trimmomatic`
*          `verbosity`:  `None`

### Initialized Pipestat Object:

* PipestatManager (default_pipeline_name)
* Backend: File
*  - results: /scratch/tgeorgom/pepatac_test/test1/stats.yaml
*  - status: /scratch/tgeorgom/pepatac_test/test1
* Multiple Pipelines Allowed: False
* Pipeline name: default_pipeline_name
* Pipeline type: sample
* Status Schema key: None
* Results formatter: default_formatter
* Results schema source: None
* Status schema source: None
* Records count: 2
* Sample name: DEFAULT_SAMPLE_NAME

----------------------------------------

Local input file: examples/data/test1_r1.fastq.gz
Local input file: examples/data/test1_r2.fastq.gz

> `Read_type`   paired  _RES_

> `Genome`      hg38    _RES_
### Merge/link and fastq conversion:  (10-07 11:29:19) elapsed: 0.0 _TIME_

Number of input file sets: 2
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz`

> `ln -sf /home/tgeorgom/pepatac/examples/data/test1_r1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz` (12730)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.003GB.
  PID: 12730;   Command: ln;    Return code: 0; Memory used: 0.003GB

Local input file: '/scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz'
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz`

> `ln -sf /home/tgeorgom/pepatac/examples/data/test1_r2.fastq.gz /scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz` (12797)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.009GB.
  PID: 12797;   Command: ln;    Return code: 0; Memory used: 0.009GB

Local input file: '/scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz'
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Found .fastq.gz file
Found .fq.gz file; no conversion necessary
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz`,`/scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz`

> `ln -sf /scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz` (12853)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.009GB.
  PID: 12853;   Command: ln;    Return code: 0; Memory used: 0.009GB

> `ln -sf /scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz` (12902)
<pre>
</pre>
Command completed. Elapsed time: 0:00:00. Running peak memory: 0.029GB.
  PID: 12902;   Command: ln;    Return code: 0; Memory used: 0.029GB

### Adapter trimming:  (10-07 11:29:21) elapsed: 2.0 _TIME_

trimmomatic local_input_files: ['/scratch/tgeorgom/pepatac_test/test1/raw/test1_R1.fastq.gz', '/scratch/tgeorgom/pepatac_test/test1/raw/test1_R2.fastq.gz']
Target to produce: `/scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_trim.fastq`

> `java -Xmx4000M -jar ${TRIMMOMATIC} PE -threads 1 /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_trim.fastq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_unpaired.fq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2_trim.fastq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2_unpaired.fq ILLUMINACLIP:/home/tgeorgom/pepatac/tools/NexteraPE-PE.fa:2:30:10` (13461)
<pre>
Error: Unable to access jarfile PE
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.031GB.
  PID: 13461;   Command: java;  Return code: 1; Memory used: 0.031GB

Starting cleanup: 0 files; 1 conditional files for cleanup

Conditional flag found: []

These conditional files were left in place:

- /scratch/tgeorgom/pepatac_test/test1/fastq/test1*.fastq

### Pipeline failed at:  (10-07 11:29:22) elapsed: 1.0 _TIME_

Total time: 0:00:04
Failure reason: Subprocess returned nonzero result. Check above output for details
Traceback (most recent call last):
  File "/home/tgeorgom/pepatac/pipelines/pepatac.py", line 2779, in <module>
    sys.exit(main())
  File "/home/tgeorgom/pepatac/pipelines/pepatac.py", line 914, in main
    pm.run(trim_cmd, trimmed_fastq, follow=check_trim)
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 1049, in run
    process_return_code, local_maxmem = self.callprint(
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 1316, in callprint
    self._triage_error(SubprocessError(msg), nofail)
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 2539, in _triage_error
    self.fail_pipeline(e)
  File "/home/tgeorgom/miniforge3/lib/python3.10/site-packages/pypiper/manager.py", line 2009, in fail_pipeline
    raise exc
pypiper.exceptions.SubprocessError: Subprocess returned nonzero result. Check above output for details

For more context, this is my /bulker_config.yaml file:

bulker:
  volumes:
   - $HOME
   - /scratch/tgeorgom/
  envvars:
   - DISPLAY
  registry_url: http://hub.bulker.io/
  shell_path: ${SHELL}
  shell_rc: $HOME/.bashrc
  rcfile: templates/start.sh
  rcfile_strict: templates/start_strict.sh
  default_crate_folder: /scratch/tgeorgom/bulker/bulker_crates
  singularity_image_folder: /scratch/tgeorgom/bulker/simages
  container_engine: singularity
  default_namespace: bulker
  executable_template: templates/singularity_executable.jinja2
  shell_template: templates/singularity_shell.jinja2
  build_template: templates/singularity_build.jinja2
  crates:
    bulker:
      demo:
        default: /scratch/tgeorgom/bulker/bulker_crates/bulker/demo/default
      pi:
        default: /scratch/tgeorgom/bulker/bulker_crates/bulker/pi/default
      alpine:
        default: /scratch/tgeorgom/bulker/bulker_crates/bulker/alpine/default
      coreutils:
        default: /scratch/tgeorgom/bulker/bulker_crates/bulker/coreutils/default
    databio:
      pepatac:
        1.0.7: /scratch/tgeorgom/bulker/bulker_crates/databio/pepatac/1.0.7
        1.0.10: /scratch/tgeorgom/bulker/bulker_crates/databio/pepatac/1.0.10
        1.0.12: /scratch/tgeorgom/bulker/bulker_crates/databio/pepatac/1.0.12
donaldcampbelljr commented 2 weeks ago

Hi,

java -Xmx4000M -jar ${TRIMMOMATIC} PE may offer some clues.

Does setting the env variable TRIMMOMATIC to the trimmomatic jar file solve the issue?

Could you also make sure the jar file is executable?
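To illustrate why the error names `PE` rather than a jar file, here is a minimal sketch that approximates how a shell expands an unquoted `${TRIMMOMATIC}`. The helper `expand_like_shell` and the placeholder jar path are assumptions made for illustration; neither is part of PEPATAC or pypiper, and the simulation only covers this simple case.

```python
import shlex
from string import Template


class _EmptyIfUnset(dict):
    """Mapping that returns "" for missing keys, like an unset shell variable."""

    def __missing__(self, key):
        return ""


def expand_like_shell(command, env):
    """Approximate unquoted ${VAR} expansion followed by word splitting.

    Hypothetical helper for illustration only.
    """
    expanded = Template(command).substitute(_EmptyIfUnset(env))
    return shlex.split(expanded)


COMMAND = "java -Xmx4000M -jar ${TRIMMOMATIC} PE -threads 1"

# TRIMMOMATIC unset: the empty word disappears and PE lands in the jar-file slot.
unset_argv = expand_like_shell(COMMAND, {})
print(unset_argv[unset_argv.index("-jar") + 1])

# TRIMMOMATIC set (placeholder path): java receives the jar path instead.
set_argv = expand_like_shell(COMMAND, {"TRIMMOMATIC": "/path/to/trimmomatic.jar"})
print(set_argv[set_argv.index("-jar") + 1])
```

With the variable empty, the argument after `-jar` is `PE`, which matches the `Unable to access jarfile PE` message in the log.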

sunta3iouxos commented 2 weeks ago

Thank you for the quick response. After activating the pepatac environment, echo ${TRIMMOMATIC} gives an empty response. Maybe it is not properly set in bulker? Anyway, this is what I did. I activated the environment and set:

TRIMMOMATIC=/scratch/tgeorgom/bulker/bulker_crates/databio/pepatac/1.0.12/trimmomatic

and got the same error:

> `java -Xmx4000M -jar ${TRIMMOMATIC} PE -threads 1 /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2.fastq.gz /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_trim.fastq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R1_unpaired.fq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2_trim.fastq /scratch/tgeorgom/pepatac_test/test1/fastq/test1_R2_unpaired.fq ILLUMINACLIP:/home/tgeorgom/pepatac/tools/NexteraPE-PE.fa:2:30:10` (25453)
<pre>
Error: Unable to access jarfile PE
</pre>
Command completed. Elapsed time: 0:00:01. Running peak memory: 0.031GB.
  PID: 25453;   Command: java;  Return code: 1; Memory used: 0.031GB
donaldcampbelljr commented 2 weeks ago

OK, I see. As a short-term solution, you'll need to download the trimmomatic jar file and set your environment variable to the location of that jar file, not to the directory within the bulker crate as you have above. Apologies that this is confusing.

If trimmomatic continues to give you issues, you could also attempt to use skewer. I believe it is faster than trimmomatic.

donaldcampbelljr commented 2 weeks ago

I just realized that it's not populating the variable in your second attempt, so the solution above may not work.
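One possible reason the variable stays empty, offered as an assumption that this thread does not confirm, is that a plain `TRIMMOMATIC=...` assignment in a shell is not passed on to child processes unless it is exported. The sketch below demonstrates that general shell behavior with a placeholder path; it says nothing about how bulker itself handles environment variables.

```python
import os
import subprocess

JAR = "/path/to/trimmomatic.jar"  # placeholder path, not a real location
# A child shell that prints whatever TRIMMOMATIC it inherited.
CHILD = "sh -c 'printf %s \"$TRIMMOMATIC\"'"


def seen_by_child(setup):
    """Run `setup` in a clean sh, then report TRIMMOMATIC as a child sees it."""
    clean_env = {"PATH": os.environ.get("PATH", "/usr/bin:/bin")}
    result = subprocess.run(
        ["sh", "-c", f"{setup}; {CHILD}"],
        capture_output=True,
        text=True,
        check=True,
        env=clean_env,
    )
    return result.stdout


# Plain assignment: the variable stays local to the shell that set it.
plain = seen_by_child(f"TRIMMOMATIC={JAR}")
# Exported assignment: the variable is inherited by child processes.
exported = seen_by_child(f"export TRIMMOMATIC={JAR}")

print(repr(plain))
print(repr(exported))
```

If this is what happened here, `export TRIMMOMATIC=...` before launching the pipeline would be the thing to try.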

sunta3iouxos commented 2 weeks ago

skewer works. Could you please state the options you are using for skewer? I had some issues with skewer in the past that had to do with quality trimming and a couple of other things. Overall, it was not that reliable, although very fast. I use fastp without issues, with these options: "--trim_poly_g --trim_poly_x -Q -L --correction".

donaldcampbelljr commented 2 weeks ago

Yes, and I believe it should default to skewer without adjusting the pipeline if no other trimmer is set.

Checking the pipeline interface:

{% if sample.trimmer is defined %} --trimmer { sample.trimmer } {% else %} --trimmer "skewer" {% endif %}
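As a rough plain-Python equivalent of that template conditional, the sketch below returns the `--trimmer` arguments for a sample. The function `trimmer_args` and the use of a dict for the sample are hypothetical stand-ins chosen for illustration; the real logic lives in the template above.

```python
DEFAULT_TRIMMER = "skewer"  # default named in the template above


def trimmer_args(sample):
    """Mirror the template: use sample["trimmer"] if defined, else skewer.

    Hypothetical helper; `sample` is a plain dict standing in for the
    sample object that the pipeline interface sees.
    """
    return ["--trimmer", sample.get("trimmer", DEFAULT_TRIMMER)]


# No trimmer attribute on the sample: falls back to skewer.
print(trimmer_args({}))
# Trimmer attribute set explicitly: that value is passed through.
print(trimmer_args({"trimmer": "trimmomatic"}))
```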

From my recent tutorial run using the native install (which defaults to skewer): https://pepatac.databio.org/en/latest/detailed-install/

I see this skewer command from my PEPATAC_log.md

### Adapter trimming:  (10-07 11:55:35) elapsed: 0.0 _TIME_

Target to produce: `/home/drc/PEPATAC_OCT_2024/processed/results_pipeline/tutorial1/fastq/tutorial1_R1_trim.fastq`  

> `skewer -f sanger -t 4 -m pe -x /home/drc/PEPATAC_OCT_2024//tools/pepatac/tools/NexteraPE-PE.fa --quiet -o /home/drc/PEPATAC_OCT_2024/processed/results_pipeline/tutorial1/fastq/tutorial1 /home/drc/PEPATAC_OCT_2024/processed/results_pipeline/tutorial1/fastq/tutorial1_R1.fastq.gz /home/drc/PEPATAC_OCT_2024/processed/results_pipeline/tutorial1/fastq/tutorial1_R2.fastq.gz` (595646)

Is this helpful?

nsheff commented 2 weeks ago

Isn't skewer the default trimmer?

sunta3iouxos commented 1 week ago

> Isn't skewer the default trimmer?

It is, but I prefer trimmomatic!

nsheff commented 1 week ago

You can use trimmomatic if you want, but because it's Java, it's not working with Bulker at the moment. You basically have to put the JAR somewhere and point that ENV var to it, so the pipeline will use the local one instead of the one within bulker. Sorry for the inconvenience; I think I didn't quite finish getting bulker working with Java tools, unfortunately.
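Before rerunning the pipeline with a local JAR, a small pre-flight check along these lines may help confirm that `TRIMMOMATIC` points at an actual jar file rather than a directory or wrapper. This is a sketch under stated assumptions: `check_trimmomatic` is a hypothetical helper, not part of PEPATAC, and the `.jar` suffix test is only a heuristic.

```python
import os


def check_trimmomatic(env):
    """Return a list of problems with the TRIMMOMATIC setting (empty if none).

    Hypothetical helper for illustration; `env` is any mapping such as os.environ.
    """
    problems = []
    jar = env.get("TRIMMOMATIC", "")
    if not jar:
        problems.append("TRIMMOMATIC is unset or empty")
    elif not os.path.isfile(jar):
        problems.append(f"{jar} is not a file")
    elif not jar.endswith(".jar"):
        problems.append(f"{jar} does not look like a JAR file")
    elif not os.access(jar, os.R_OK):
        problems.append(f"{jar} is not readable")
    return problems


# Report any problems found in the current environment.
for problem in check_trimmomatic(os.environ):
    print("Problem:", problem)
```

Running it in the same shell that launches `pipelines/pepatac.py` shows what that process would inherit.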