Error - pileup_model model

jainy commented 2 years ago

Hi!

I ran pb-CpG-tools on the haplotagged BAM with methylation data. With --pileup_mode count, it worked fine and generated the expected output. But when I tried --pileup_mode model, I got the following error.

Chunking regions for multiprocessing.
Running multiprocessing on 6,362 chunks.
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/locus/home/jathomas/miniconda3/envs/cpg/lib/python3.9/threading.py", line 980, in _bootstrap_inner
    self.run()
  File "/locus/home/jathomas/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/process.py", line 323, in run
    self.terminate_broken(cause)
  File "/locus/home/jathomas/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/process.py", line 458, in terminate_broken
    work_item.future.set_exception(bpe)
  File "/locus/home/jathomas/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 549, in set_exception
    raise InvalidStateError('{}: {!r}'.format(self._state, self))
concurrent.futures._base.InvalidStateError: CANCELLED: <Future at 0x2b37b240d580 state=cancelled>
Traceback (most recent call last):
  File "/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 1152, in <module>
    main()
  File "/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 1140, in main
    bed_results = run_all_pileup_processing(regions_to_process, args.threads)
  File "pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 915, in run_all_pileup_processing
    bed_result = future.result()
  File "/locus/home/jathomas/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/locus/home/jathomas/miniconda3/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

I am also attaching the log file (bed-aligned_bam_to_cpg_scores.log) generated.

bed-aligned_bam_to_cpg_scores.log

Please let me know if you need any other info.

Thank you for your help!

Best, Jainy

dhspence commented 2 years ago

I am getting the same error using the model option. Any advice would be appreciated.

ctsa commented 2 years ago

Thanks for reporting this issue, @jainy and @dhspence. There have been a number of recent changes relevant to error reporting. Can you confirm whether you're using the latest GitHub version (c23686e)? If not, would you be able to reproduce the issue and logs on the latest version?

dhspence commented 2 years ago

Yes, I was using that version.

RhettRautsaw commented 2 years ago

Hi @ctsa, I just wanted to let you know that I am also getting this error and have the latest version.

Chunking regions for multiprocessing.
Running multiprocessing on 6,166 chunks.
Traceback (most recent call last):
  File "/home/r/rautsaw/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 1152, in <module>
    main()
  File "/home/r/rautsaw/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 1140, in main
    bed_results = run_all_pileup_processing(regions_to_process, args.threads)
  File "/home/r/rautsaw/bin/pb-CpG-tools/aligned_bam_to_cpg_scores.py", line 915, in run_all_pileup_processing
    bed_result = future.result()
  File "/home/r/rautsaw/.conda/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 439, in result
    return self.__get_result()
  File "/home/r/rautsaw/.conda/envs/cpg/lib/python3.9/concurrent/futures/_base.py", line 391, in __get_result
    raise self._exception
concurrent.futures.process.BrokenProcessPool: A process in the process pool was terminated abruptly while the future was running or pending.

pb-cpg_error.log

I haven't tried the count method yet. I'll be playing with it more next week, so I'll post here if I figure out a solution.

Rhett

PengNi commented 2 years ago

I am also getting the error concurrent.futures.process.BrokenProcessPool occasionally when using the model option. I am using the latest version.
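
For context, `BrokenProcessPool` is what `concurrent.futures` raises from `future.result()` when a worker process exits abruptly, so the traceback alone does not say why the worker died. Below is a minimal sketch of wrapping that call to surface a more actionable message. `result_or_hint` is a hypothetical helper, not part of pb-CpG-tools, and the hint about threads is only a guess at one possible cause.

```python
from concurrent.futures import Future
from concurrent.futures.process import BrokenProcessPool


def result_or_hint(future: Future):
    """Return future.result(), re-raising a broken pool with a hint.

    Hypothetical helper for illustration; not part of pb-CpG-tools.
    """
    try:
        return future.result()
    except BrokenProcessPool as err:
        # The pool only knows that a worker vanished, not why it did.
        raise RuntimeError(
            "A worker process died abruptly (possibly killed by the OS); "
            "consider retrying with a lower --threads value."
        ) from err
```

The original exception stays attached as `__cause__`, so the full traceback is still available for debugging.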

RhettRautsaw commented 2 years ago

Hello to everyone having this error!

I have often had problems with multiprocessing and concurrent.futures in Python. So I've forked this repository and attempted to fix this issue by using GNU-Parallel instead. To do this, I had to break the script into two pieces. You still just call one script, but it calls the other script internally for parallelization.

I've run it and benchmarked it with the example data listed on this page, and it appears to be working! It even appears to be faster! This wasn't my intention, but I'll take it! I'm currently running it with my actual data, so we will see how it goes, but I wanted to post here in case other people want to test what I've done. The installation is the same, except that you also need to install GNU-Parallel.

I've also renamed the scripts and made it Breaking Bad themed...because why not, I guess.

You can find my forked repository here: https://github.com/RhettRautsaw/Walter

Hope this helps!

Rhett Rautsaw

fritzsedlazeck commented 2 years ago

Hey @ctsa, do you have a workaround? We could use the version from @RhettRautsaw, but I assume that will not be maintained by PacBio. Thanks, Fritz

ctsa commented 2 years ago

All, Thanks for the reports and @RhettRautsaw for the workaround.

This has been a tricky issue since there's been little ability to reproduce it on our end and limited bandwidth for any major reengineering of this script. I can share some possible directions to look into:

1. Given the high memory use, this may be a memory exhaustion issue for some folks (these might be OOM kills?). Limiting parallelization should solve the problem if this is what's happening.

2. It's also possible there's some consequential version wiggle. First of all, please check that you're setting up the conda environment as specified below, to ensure you're using the tested Python and package versions:

   https://github.com/PacificBiosciences/pb-CpG-tools#environment

   If you're already doing this and still having an issue, perhaps we need to control the patch release version as well? It is possible that some variation between Python 3.9.0 and 3.9.13 is having an impact here. The current conda env I see on internal test is Python 3.9.12.

3. Finally, there might be a stability issue tied to OS/arch. If it's something like the Linux version, we might be able to go so far as dockerizing it to stabilize this (which would also take care of possible issue 2 above), but at this point there's not much to indicate this could be a factor.
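
To check the patch-release possibility raised above, a small sketch like the following compares the running interpreter against the 3.9.12 version mentioned for the internal test environment. `TESTED_VERSION` and `describe_python_version` are illustrative names, not part of pb-CpG-tools.

```python
import sys

# Version reported above for the internal test environment.
TESTED_VERSION = (3, 9, 12)


def describe_python_version(current=None, tested=TESTED_VERSION):
    """Describe how an interpreter version compares with the tested one.

    Illustrative helper; defaults to the running interpreter.
    """
    current = tuple(sys.version_info[:3]) if current is None else tuple(current)
    if current == tested:
        return "match"
    if current[:2] == tested[:2]:
        # Same 3.9 series, but a different patch release.
        return "patch differs"
    return "minor or major differs"
```

Running `describe_python_version()` inside the activated conda environment shows whether only the patch release differs from the tested setup.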

ctsa commented 2 years ago

@jainy @dhspence @RhettRautsaw @PengNi @fritzsedlazeck

I started experimenting with a compiled version of this logic. There is a pre-release build for x86_64 Linux here:

https://github.com/PacificBiosciences/pb-CpG-tools/releases/tag/binary-v2.0.0

I'd be curious for those of you who have run into issues with the python impl, if this addresses the problem.

PengNi commented 2 years ago

Hi all, FWIW, in my tests the Python version seems to work well when fewer threads are used. On a machine with 40 processors and 256 GB RAM:

With --threads 40, the Python version is likely to hit this error.
With --threads 30, I haven't hit this error so far.
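
If memory is the limiting factor, as these numbers may hint, one rough way to pick a `--threads` value is to cap it by a memory budget. The sketch below assumes a per-worker memory figure that you would have to measure yourself; `cap_threads` and the 8 GB value are illustrative assumptions, not measurements of pb-CpG-tools.

```python
import os


def cap_threads(total_mem_gb, per_worker_gb, cpu_count=None):
    """Return a worker count limited by both CPUs and a memory budget.

    per_worker_gb is an assumed peak memory per worker; measure it on
    your own data before relying on the result.
    """
    if per_worker_gb <= 0:
        raise ValueError("per_worker_gb must be positive")
    cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
    by_memory = int(total_mem_gb // per_worker_gb)
    # Always allow at least one worker, never more than the CPU count.
    return max(1, min(cpus, by_memory))


# Hypothetical 8 GB per worker on the 40-processor, 256 GB machine above.
print(cap_threads(total_mem_gb=256, per_worker_gb=8, cpu_count=40))  # 32
```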

xiaoyunguo commented 1 year ago

I also have issues with the model pileup mode, but with a different error message. Changing the number of threads didn't help. Pileup mode count works fine. The complaint was an index out of range; any help or suggestions would be greatly appreciated.

  saved_model.ParseFromString(file_content)
Exception thrown in worker process 24886: list index (0) out of range
/home/xiaoyun/miniconda3/envs/cpg/lib/python3.9/site-packages/tensorflow/python/saved_model/loader_impl.py:105: RuntimeWarning: Unexpected end-group tag: Not all data was converted
  saved_model.ParseFromString(file_content)

ctsa commented 1 year ago

@jainy @dhspence @RhettRautsaw @PengNi @fritzsedlazeck @xiaoyunguo and others on this thread,

The experimental binary version of the pileup tool has been further updated to a fully static and portable binary for 64-bit linux, and to near feature parity with the python script. Today we have just released this as an official replacement for the python script to help address the many types of issues reported above. You should also find that it is substantially faster and easier to use.

The new compiled binary v2.1.0 (or newer) release can be found here:

https://github.com/PacificBiosciences/pb-CpG-tools/releases/latest

If you're still interested, can you give this a try instead?

ctsa commented 1 year ago

Closing as no longer applicable. Please open a new issue if you're having any trouble with v2.
