cgarciae / pypeln

Concurrent data pipelines in Python >>>
https://cgarciae.github.io/pypeln
MIT License

[Bug] Program hanging when input is throttling by maxsize and exception raised in stage #110

Open chengjinluo opened 1 year ago

chengjinluo commented 1 year ago

Describe the bug: Please refer to the minimal code example below.

After some investigation, I found that all process workers exit successfully, but the initial worker thread, which feeds the input numbers into the queue, is stuck on the blocking call to multiprocessing.Queue.put and therefore cannot receive the exception raised by stopit.
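To illustrate the mechanism, here is a standalone sketch (not pypeln internals; it uses a plain threading.Event in place of stopit's asynchronously raised exception): a thread blocked inside multiprocessing.Queue.put only gets a chance to react to a stop condition after the put returns, i.e. after a slot frees up.

```python
import multiprocessing as mp
import threading
import time

q = mp.Queue(maxsize=1)
stop = threading.Event()

def feeder():
    for item in range(3):
        if stop.is_set():      # only checked between put() calls
            return
        q.put(item)            # blocks indefinitely once the queue is full

t = threading.Thread(target=feeder, daemon=True)
t.start()
time.sleep(0.5)                # feeder has put 0 and is now blocked putting 1
stop.set()                     # setting the flag cannot wake the blocked put()
t.join(timeout=1.0)
alive_while_blocked = t.is_alive()   # True: still stuck inside q.put
q.get()                        # draining one item lets the put() return...
t.join(timeout=2.0)
alive_after_drain = t.is_alive()     # False: ...and the flag check exits the loop
q.get()                        # drain the remaining item for a clean shutdown
print(alive_while_blocked, alive_after_drain)  # True False
```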

Manually editing the use_thread default value to False in pypeln.process.worker.start_workers is a workaround: the initial worker is then started as a process instead of a thread, so it can be terminated by a signal.
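For context on why the workaround helps, a minimal sketch (not pypeln code; assumes a POSIX platform where the fork start method is available): unlike a thread, a process blocked inside Queue.put can simply be terminated with a signal.

```python
import multiprocessing as mp
import time

ctx = mp.get_context("fork")   # assumption: POSIX platform

def feeder(q):
    q.put(0)
    q.put(1)                   # blocks: the queue is already full

q = ctx.Queue(maxsize=1)
p = ctx.Process(target=feeder, args=(q,))
p.start()
time.sleep(0.5)                # feeder is now blocked inside put()
p.terminate()                  # a blocked process can be killed via SIGTERM
p.join(timeout=2.0)
terminated = not p.is_alive()
print(terminated)              # True
```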

It seems that pypeln.thread.Queue.IterableQueue has a different implementation: it retries the put operation with a timeout, so the thread is able to observe the exception raised by stopit.
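That retry-with-timeout pattern can be sketched like this (put_interruptible and should_stop are hypothetical names for illustration, not pypeln's actual API; multiprocessing.Queue.put raises the same queue.Full on timeout, so the pattern applies to the process queue as well):

```python
import queue
import threading

def put_interruptible(q, item, should_stop, timeout=0.1):
    """Retry put() with a short timeout instead of blocking forever, so the
    calling thread regularly returns to Python code where a pending stop
    condition (or an asynchronously raised exception) can be observed."""
    while True:
        if should_stop():
            raise RuntimeError("pipeline stopped")
        try:
            q.put(item, timeout=timeout)
            return
        except queue.Full:
            continue

q = queue.Queue(maxsize=1)
q.put("filler")                      # the queue is now full
stop = threading.Event()
result = {}

def worker():
    try:
        put_interruptible(q, "x", stop.is_set)
    except RuntimeError as exc:
        result["error"] = str(exc)

t = threading.Thread(target=worker)
t.start()
stop.set()                           # the worker notices this on its next retry
t.join(timeout=5.0)
print(result)                        # {'error': 'pipeline stopped'}
```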

Minimal code to reproduce

import pypeln as pl

def proc(x):
    print(x, flush=True)
    if x == 10:
        assert False  # raise an exception inside one worker

if __name__ == '__main__':
    # maxsize=4 throttles the input feeder; the program hangs instead of exiting
    stage = pl.process.map(proc, list(range(100)), maxsize=4, workers=4)
    list(stage)

Expected behavior: The program exits with the exception instead of hanging.

Library Info
os: ubuntu 22.04
python: 3.10.12
pypeln: 0.4.9
