imcom closed this issue 9 years ago.
Has anyone else run into this severe bug? Whenever fail(tup) is called, the calling bolt stops and never runs again.
It's a bit difficult to debug without seeing the code of the new bolt and the traceback of the error (if any).
My guess is that you are using SimpleBolt (because you copied and modified the code of exclamation_bolt from the example), but you are trying to fail a tuple yourself.
SimpleBolt will automatically ack any tuple that didn't trigger an exception during the execution of process_tuple():
https://github.com/Yelp/pyleus/blob/develop/pyleus/storm/bolt.py#L177
The fail() method does not stop the execution of process_tuple(); it just sends a message to let Storm know that the tuple has been failed:
https://github.com/Yelp/pyleus/blob/develop/pyleus/storm/bolt.py#L70
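To make the double acknowledgement concrete, here is a minimal self-contained simulation. It does not import pyleus: FakeSimpleBolt and FailingBolt are hypothetical stand-ins that only mimic the two behaviours described above, namely that fail() merely records a message and returns, and that a tuple is acked automatically once process_tuple() returns without raising.

```python
class FakeSimpleBolt(object):
    """Hypothetical stand-in for a SimpleBolt-style base class.

    Messages that would go to Storm are recorded in self.sent
    instead, so the resulting protocol can be inspected.
    """

    def __init__(self):
        self.sent = []  # (message, tuple) pairs, in send order

    def ack(self, tup):
        self.sent.append(("ack", tup))

    def fail(self, tup):
        # Only records a message; it does not raise or otherwise
        # interrupt process_tuple().
        self.sent.append(("fail", tup))

    def process_tuple(self, tup):
        raise NotImplementedError

    def handle(self, tup):
        # Auto-ack: any tuple whose processing did not raise is acked.
        self.process_tuple(tup)
        self.ack(tup)


class FailingBolt(FakeSimpleBolt):
    """Fails every tuple by hand: the problematic pattern."""

    def process_tuple(self, tup):
        self.fail(tup)
        # Execution simply continues here after fail() returns.


bolt = FailingBolt()
bolt.handle("tuple-1")
print(bolt.sent)  # [('fail', 'tuple-1'), ('ack', 'tuple-1')]
```

The same tuple ends up reported as both failed and acked.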
This means you are telling Storm that the tuple is both failed and acked. This is probably causing enough havoc that your bolt hangs indefinitely. If I am correct, inherit from Bolt instead of SimpleBolt and ack/fail tuples yourself.
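A sketch of the suggested alternative, again using a hypothetical stand-in base class (FakeBolt) instead of pyleus's Bolt so it runs without Storm. The point it illustrates is that the bolt acks or fails each tuple exactly once itself, returning right after fail(). The threshold MAX_WORD_LENGTH and the class names are made up for illustration, modelled loosely on a bolt that fails tuples by word length, and tuples are plain strings here (in pyleus the word would be read from the tuple's values).

```python
class FakeBolt(object):
    """Hypothetical stand-in for a base Bolt that never auto-acks."""

    def __init__(self):
        self.sent = []  # (message, tuple) pairs, in send order

    def ack(self, tup):
        self.sent.append(("ack", tup))

    def fail(self, tup):
        self.sent.append(("fail", tup))

    def process_tuple(self, tup):
        raise NotImplementedError


MAX_WORD_LENGTH = 5  # hypothetical threshold


class WordLengthBolt(FakeBolt):
    """Fails long words and acks the rest, each tuple exactly once."""

    def process_tuple(self, tup):
        word = tup
        if len(word) > MAX_WORD_LENGTH:
            self.fail(tup)
            return  # stop here so the tuple is not also acked
        # ... normal processing (e.g. emitting) would go here ...
        self.ack(tup)


bolt = WordLengthBolt()
for word in ["storm", "pyleus!!"]:
    bolt.process_tuple(word)
print(bolt.sent)  # [('ack', 'storm'), ('fail', 'pyleus!!')]
```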
As I said at the beginning, this is just a guess, though.
It turns out that Pyleus bolts are very sensitive to the receive/send buffer sizes. Since I added the options below to the topology, with the values shown, I have not seen any bolts hang:
topology.executor.receive.buffer.size: 16384
topology.executor.send.buffer.size: 16384
topology.transfer.buffer.size: 32
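Storm's configuration documentation from the 0.9.x era describes the two executor buffers as Disruptor queues whose size must be a power of 2, so a quick sanity check on values like these can be sketched as follows. The dictionary just restates the three options; the helper functions are hypothetical illustrations, not something Pyleus or Storm provides.

```python
# The three options and values listed above.
options = {
    "topology.executor.receive.buffer.size": 16384,
    "topology.executor.send.buffer.size": 16384,
    "topology.transfer.buffer.size": 32,
}

# Options documented (Storm 0.9.x) as requiring a power of 2.
POWER_OF_TWO_KEYS = (
    "topology.executor.receive.buffer.size",
    "topology.executor.send.buffer.size",
)


def is_power_of_two(n):
    """True for 1, 2, 4, 8, ...; uses the n & (n - 1) bit trick."""
    return n > 0 and (n & (n - 1)) == 0


def invalid_options(opts):
    """Return the keys whose values break the power-of-2 rule."""
    return [k for k in POWER_OF_TWO_KEYS
            if k in opts and not is_power_of_two(opts[k])]


print(invalid_options(options))  # []
```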
That said, @poros's guess is most likely also right, and there may be a deeper issue down the line: failing and acking the same tuple, or failing a tuple and then returning from process_tuple() in a SimpleBolt, may also cause strange behaviour.
I am testing exclamation_bolt with a small modification, as below:
I added a bolt after the existing one that fails the tuple depending on the word length.
What I observe is that when fail() fires, the bolt continues with the rest of the code and afterwards simply vanishes or stops. I never see the second bolt run again.
Here is the YAML definition:
Either I have the fail mechanism all wrong or this is a critical bug in Pyleus. Please help me work it out.
Thanks in advance.