epi2me-labs / wf-pore-c

How to deal with too many pair files? #46

Closed gotouerina closed 2 months ago

gotouerina commented 8 months ago

ERROR ~ Error executing process > 'POREC:merge_pairs (1)'

Caused by: Process POREC:merge_pairs (1) terminated with an error exit status (1)

Command executed:

# pass a quoted glob, pairtools will do its own globbing

pairtools merge -o "null.pairs.gz" --concatenate 'to_merge/*'

Command exit status: 1

Command output: (empty)

Command error:
  Traceback (most recent call last):
    File "/home/epi2melabs/conda/bin/pairtools", line 11, in <module>
      sys.exit(cli())
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1157, in __call__
      return self.main(*args, **kwargs)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1078, in main
      rv = self.invoke(ctx)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1688, in invoke
      return _process_result(sub_ctx.command.invoke(sub_ctx))
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 1434, in invoke
      return ctx.invoke(self.callback, **ctx.params)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/click/core.py", line 783, in invoke
      return __callback(*args, **kwargs)
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pairtools/cli/merge.py", line 134, in merge
      merge_py(
    File "/home/epi2melabs/conda/lib/python3.8/site-packages/pairtools/cli/merge.py", line 254, in merge_py
      subprocess.check_call(command, shell=True, stdout=outstream)
    File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 359, in check_call
      retcode = call(*popenargs, **kwargs)
    File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 340, in call
      with Popen(*popenargs, **kwargs) as p:
    File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 858, in __init__
      self._execute_child(args, executable, preexec_fn, close_fds,
    File "/home/epi2melabs/conda/lib/python3.8/subprocess.py", line 1720, in _execute_child
      raise child_exception_type(errno_num, err_msg, err_filename)
  OSError: [Errno 7] Argument list too long: '/bin/sh'

Work dir: /opt/synData/liusy/Amblysomus/Anchor/work/e8/a484105ce907a8664e52dd499250e2

Tip: when you have fixed the problem you can continue the execution adding the option -resume to the run command line

WARN: Killing running tasks (2) -- Check '.nextflow.log' file for details
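For readers landing here from a search: the traceback above ends in `OSError: [Errno 7]`, which is E2BIG. Because pairtools runs its command with `shell=True`, the whole command travels as one string to `/bin/sh -c`. The sketch below is a rough way to reason about that, not a description of what pairtools builds internally. The 128 KiB per-string cap assumes Linux with 4 KiB pages, and the `cat` command and file paths are hypothetical stand-ins.

```python
import errno

# Assumption: Linux with 4 KiB pages, where a single argument string is
# capped at 32 pages. Other systems may use a different limit.
ASSUMED_MAX_ARG_STRLEN = 32 * 4096


def is_arg_list_too_long(exc: OSError) -> bool:
    """Return True when an OSError is E2BIG ("Argument list too long")."""
    return exc.errno == errno.E2BIG


def shell_command_fits(command: str, limit: int = ASSUMED_MAX_ARG_STRLEN) -> bool:
    """Check whether a command string fits as the single argument of `sh -c`."""
    # +1 accounts for the terminating NUL byte counted with the string.
    return len(command.encode()) + 1 <= limit


def fake_command(n_files: int) -> str:
    """Build a hypothetical command whose length grows with every input path."""
    paths = [f"/hypothetical/work/to_merge/chunk_{i:05d}.pairs.gz" for i in range(n_files)]
    return "cat " + " ".join(paths)


fits_3500 = shell_command_fits(fake_command(3500))
fits_1000 = shell_command_fits(fake_command(1000))
print("3500 files fit:", fits_3500)
print("1000 files fit:", fits_1000)
```

Under these assumptions a few thousand long paths are enough to overflow one command string, which is why fewer, larger chunks help.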

sarahjeeeze commented 8 months ago

Hi, thanks for letting us know. To help us recreate the error, do you know how many pairtools files you have? We have seen this error before but thought we had fixed it. In the meantime, you could try increasing the chunk size.

JudithR commented 8 months ago

Hi, I encountered a similar problem today with merge_pairs_stats:

The pipeline was run with 20 threads and a chunk size of 100 (although the config file says 10000). Edit: I have 98196 stats.txt files. The input fastq for the pipeline has 9819509 reads.
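The file count is consistent with the read count and chunk size. A minimal sketch of the arithmetic, assuming one stats file per chunk and a single input (the function name is made up for illustration):

```python
def expected_chunks(n_reads: int, chunk_size: int) -> int:
    """Number of chunks needed to hold n_reads, rounding up (integer math)."""
    return (n_reads + chunk_size - 1) // chunk_size


n_reads = 9819509  # read count reported in this thread
for chunk_size in (100, 1000, 10000, 20000):
    print(chunk_size, expected_chunks(n_reads, chunk_size))
```

With a chunk size of 100 this gives 98196, matching the number of stats.txt files. A chunk size of 20000 brings it down to 491.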

Are the subsequent merge commands (bam, etc.) likely to suffer from the same problem? Is it possible to run these commands manually, maybe using xargs?
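One way to do the manual route is an xargs-style two-pass merge: merge the files in batches, then merge the batch outputs. The sketch below only builds the command lines and runs nothing. The `-o` and `--merge` flags are the ones in the failing command below. The batch size, the output names, and the idea that merged stats files can be merged again are assumptions to check against the pairtools documentation.

```python
from typing import Iterator, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def stats_merge_commands(files: Sequence[str], batch_size: int,
                         prefix: str = "batch") -> List[List[str]]:
    """Build one `pairtools stats --merge` argv per batch of input files."""
    commands = []
    for i, batch in enumerate(batched(files, batch_size)):
        out = f"{prefix}_{i:04d}.stats.txt"  # hypothetical output name
        commands.append(["pairtools", "stats", "-o", out, "--merge", *batch])
    return commands


# Hypothetical file names standing in for the 98196 stats files.
files = [f"to_merge/src{i:05d}.stats.txt" for i in range(98196)]
first_pass = stats_merge_commands(files, batch_size=1000)

# Second pass: merge the first-pass outputs (argv index 3 is the -o value).
outputs = [cmd[3] for cmd in first_pass]
second_pass = stats_merge_commands(outputs, batch_size=1000, prefix="final")
print(len(first_pass), len(second_pass))
```

Each argv list could then be passed to `subprocess.run(cmd, check=True)` inside the task's work directory. Passing a list rather than one shell string avoids the single-string limit, though the total size of all arguments is still capped by the OS.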

Judith

Error message

Feb-01 18:40:47.610 [Task monitor] DEBUG nextflow.processor.TaskProcessor - Handling unexpected condition for
  task: name=POREC:merge_pairs_stats (1); work-dir=$WORKDIR/poreC/work/bf/be849ddf3bf1fc910d5d7cda16b2da
  error [nextflow.exception.ProcessFailedException]: Process `POREC:merge_pairs_stats (1)` terminated with an error exit status (126)
Feb-01 18:40:47.624 [Task monitor] ERROR nextflow.processor.TaskProcessor - Error executing process > 'POREC:merge_pairs_stats (1)'

Caused by:
  Process `POREC:merge_pairs_stats (1)` terminated with an error exit status (126)

Command executed:

  pairtools stats -o "null.pairs.stats.txt" --merge  to_merge/src*.stats.txt

Command exit status:
  126

Command output:
  (empty)

Command error:
  WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
  .command.sh: line 2: /home/epi2melabs/conda/bin/pairtools: Argument list too long

Work dir:
  /$WORKDIR/poreC/work/bf/be849ddf3bf1fc910d5d7cda16b2da
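In this second failure the shell expands the unquoted glob itself, so every matching file becomes its own argument and the total size of the argument list is what overflows. A rough sketch for estimating that size against the system limit follows. The file names are hypothetical, and the real accounting also includes the environment and per-argument pointers, so treat the number as a lower bound.

```python
import os
from typing import Iterable, Optional


def argv_bytes(args: Iterable[str]) -> int:
    """Lower-bound estimate of argv size: each encoded string plus its NUL."""
    return sum(len(os.fsencode(a)) + 1 for a in args)


def arg_max() -> Optional[int]:
    """Ask the OS for ARG_MAX; returns None where sysconf does not expose it."""
    try:
        return os.sysconf("SC_ARG_MAX")
    except (AttributeError, ValueError, OSError):
        return None


# Hypothetical names standing in for what to_merge/src*.stats.txt expands to.
names = [f"to_merge/src{i:05d}.stats.txt" for i in range(98196)]
needed = argv_bytes(names)
limit = arg_max()
print("estimated argv bytes:", needed)
print("ARG_MAX on this system:", limit)
if limit is not None:
    print("fits:", needed <= limit)
```

The limit differs between systems and can depend on the stack size limit, so it is worth checking on the machine or container that runs the task.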

sarahjeeeze commented 7 months ago

I need to update the docs to make this clearer, but a chunk size of 100 is really low. Perhaps try the default of 20000 and the latest branch, which has some computational improvements that should make it run much more smoothly.

gotouerina commented 7 months ago

Hi, we have about 3500 files.

sarahjeeeze commented 7 months ago

Hi, did you try increasing the chunk size? You could use the default of 20,000.

JudithR commented 7 months ago

Hi, I tried three different chunk sizes: 100, 1000 and 10000. The largest chunk size works in terms of command length, but fails due to out-of-memory errors. The OOM is not caused by the memory limits of the server, but by the memory limits in the pipeline's task definitions. To be fair, I'm running the Nextflow pipeline on a system without a scheduler. I adjusted the memory allowances and managed to finish the pipeline.

Judith

gotouerina commented 7 months ago

I tried again, and I think a chunk size of 20000 works. It is better to keep the number of files below 1000.

sarahjeeeze commented 7 months ago

Hi, thanks for the feedback, I was able to recreate the error and will add a fix.

sarahjeeeze commented 2 months ago

Closing as we fixed this in version 0.2.0