daler / pybedtools

Python wrapper -- and more -- for BEDTools (bioinformatics tools for "genome arithmetic")
http://daler.github.io/pybedtools

'generator raised StopIteration' error when running 'randomstats' with multiple processes #377

Open tparket opened 1 year ago

tparket commented 1 year ago

Hi,

First of all - thank you for your amazing work. pybedtools has been super useful for my research so far and I am very grateful.

I'm trying to run `randomstats` with the following arguments:

```python
results_dict = a.randomstats(b, iterations=1000, new=True, genome_fn=chromsizes_fn, processes=4, shuffle_kwargs={"chrom": True}, intersect_kwargs={"f": 1})
```

```
---------------------------------------------------------------------------
StopIteration                             Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in parallel_apply(self, iterations, func, func_args, func_kwargs, processes, _orig_pool)
   2932             for it in range(iterations):
-> 2933                 yield func(*func_args, **func_kwargs)
   2934             raise StopIteration

~/.local/lib/python3.7/site-packages/pybedtools/stats.py in random_intersection(x, y, genome_fn, shuffle_kwargs, intersect_kwargs)
     16     result = len(zz)
---> 17     helpers.close_or_delete(z, zz)
     18     return result

~/.local/lib/python3.7/site-packages/pybedtools/helpers.py in close_or_delete(*args)
    547         if hasattr(x.fn, "throw"):
--> 548             x.fn.throw(StopIteration)
    549

StopIteration:

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in randomstats(self, other, iterations, new, genome_fn, include_distribution, **kwargs)
   2846             )
   2847             distribution = self._randomintersection(
-> 2848                 other, iterations=iterations, genome_fn=genome_fn, **kwargs
   2849             )
   2850

~/.local/lib/python3.7/site-packages/pybedtools/bedtool.py in _randomintersection(self, other, iterations, genome_fn, intersect_kwargs, _orig_pool, shuffle_kwargs, processes)
   3038                 ),
   3039                 processes=processes,
-> 3040                 _orig_pool=_orig_pool,
   3041             )
   3042         )

RuntimeError: generator raised StopIteration
```

The thing is that when I remove the `processes` argument, `randomstats` works just fine, but every time I run it with `processes` (even with a value of 1), I get the error above.

Other relevant data:

- `a` and `b` are both `BedTool` objects generated from a DataFrame. A regular `a.intersect(b, f=1)` works perfectly.
- `chromsizes_fn` is the name of a genome file generated from a dict with `chromsizes_fn = pybedtools.chromsizes_to_file(chromsizes_dic, fn=temp_genome.name)`. I tried both `fn=False` and `fn=temp_genome.name`.
- I tried running it both with `new=True` and without it. It crashed both times.

I would really appreciate your help. I'm planning to run `randomstats` on a large number of files, with at least 1000 iterations each time, and being able to use multiprocessing would make that feasible.
daler commented 1 year ago

Great to hear you find pybedtools useful.

Can you provide an example of the files you're using for a and b so I can test locally?

tparket commented 1 year ago

Thanks for getting back to me so soon. Please find the files* attached.

Archive.zip

*These are not the original files, but randomly generated intervals. Nevertheless, I'm getting the same error.

igoronzy commented 11 months ago

I'm getting the same error. Have there been any updates to fix this issue?

bentyeh commented 11 months ago

Bumping this. It might be a Python versioning issue.

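Prior to Python 3.7, a `StopIteration` raised inside a generator (here, `parallel_apply()`) would simply have signaled the end of iteration. Starting in Python 3.7 (PEP 479), a `StopIteration` that escapes a generator's body is converted into a `RuntimeError`: see https://docs.python.org/3/library/exceptions.html#StopIteration. This covers both the explicit `raise StopIteration` at the end of `parallel_apply()` and the `StopIteration` thrown in `close_or_delete()`, which, as the traceback shows, propagates up through `random_intersection()` into the `parallel_apply()` generator.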
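A minimal, self-contained sketch (plain Python, no pybedtools needed, assuming Python 3.7 or later) of the two ways the traceback hits this rule: a generator that ends with `raise StopIteration`, and a generator that calls a helper which lets a `StopIteration` escape. `ends_with_raise()`, `hypothetical_cleanup()`, `calls_cleanup()` and `run()` are made-up names for illustration; `hypothetical_cleanup()` only stands in for something like `close_or_delete()`.

```python
def ends_with_raise():
    """Old pattern: a generator that ends by raising StopIteration itself."""
    yield 1
    raise StopIteration  # converted to RuntimeError on Python 3.7+


def hypothetical_cleanup():
    """Hypothetical stand-in for a helper (like close_or_delete) that lets a
    StopIteration escape to its caller."""
    raise StopIteration


def calls_cleanup():
    """A generator whose body calls a helper that raises StopIteration."""
    yield 1
    hypothetical_cleanup()  # the escaping StopIteration is converted too


def run(gen_func):
    """Exhaust a generator and describe the outcome."""
    try:
        return list(gen_func())
    except RuntimeError as err:
        # PEP 479 chains the original StopIteration as __cause__
        return f"RuntimeError: {err} (cause: {type(err.__cause__).__name__})"


if __name__ == "__main__":
    print(run(ends_with_raise))  # RuntimeError: generator raised StopIteration (cause: StopIteration)
    print(run(calls_cleanup))    # RuntimeError: generator raised StopIteration (cause: StopIteration)
```

The chained `__cause__` is what produces the "The above exception was the direct cause of the following exception" line in the traceback.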

A workaround that seems to work for now is to:

  1. Comment out these two lines in the `close_or_delete()` function in `helpers.py`:

    if hasattr(x.fn, "throw"):
        x.fn.throw(StopIteration)
  2. Replace both instances of `raise StopIteration` in `BedTool.parallel_apply()` with a plain `return`.
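The two workaround steps can be illustrated in isolation with a short sketch. `apply_n_times()`, `finish_generator()` and `counter()` are hypothetical names, not pybedtools functions. `apply_n_times()` is a simplified serial version of the "yield one result per iteration" pattern, ending with `return` instead of `raise StopIteration`. `finish_generator()` shows a swapped-in alternative to deleting the `throw` call outright: closing the generator with the standard `generator.close()` method. That alternative is an assumption and has not been tested against pybedtools.

```python
import types


def apply_n_times(func, iterations, *args, **kwargs):
    """Hypothetical, simplified serial version of the parallel_apply pattern:
    yield one result per iteration, then end the generator cleanly."""
    for _ in range(iterations):
        yield func(*args, **kwargs)
    return  # same as falling off the end; no StopIteration raised by hand


def finish_generator(obj):
    """Hypothetical cleanup helper: close a generator instead of throwing
    StopIteration into it. close() is a no-op on an exhausted generator and
    raises GeneratorExit inside a suspended one."""
    if isinstance(obj, types.GeneratorType):
        obj.close()


def counter():
    yield 1
    yield 2


if __name__ == "__main__":
    print(list(apply_n_times(len, 3, [10, 20])))  # [2, 2, 2]

    g = counter()
    next(g)              # generator is now suspended at its first yield
    finish_generator(g)  # closes it without raising anything to the caller
    print(list(g))       # [] (a closed generator yields nothing more)
```

Because `close()` handles both the suspended and the exhausted case silently, it avoids the situation in the traceback where a `StopIteration` leaks out of the cleanup helper and into an enclosing generator.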

I'm happy to submit a pull request, but this may be part of a larger issue of dealing with Python versions in pybedtools.