Closed janpieper closed 6 years ago
While searching logs in our environments, I couldn't find such message. It will be hard to reproduce. Can you please give me any clue how to do it. What is order of "processing many of events per second"? It would be great if you could reproduce it in a test. State.should_receive = nil is set in few places and that is situation you are facing with. We've been processing with listeners something around 20k events today and didn't have any exceptions
We see this when pushing ~5.000 events in a short timespan to 11 persistent subscriptions, so in the order of 60.000 events in a few minutes flowing through the system.
This is also seemingly not bound to a specific event or anything, sometimes it happens - sometimes it doesn't.
We also saw a different behaviour that might be connected (not sure yet though): Sometimes Eventstore logs a heartbeat problem on one of our subscriptions which kind of smells like a full inbox in one of the subscription processes
The ArithmeticError
occurs after a command has failed with :aggregate_execution_failed
:
19:37:02.654 [warn] Example.ProcessManagers.ExampleProcessManager has requested to stop: :aggregate_execution_failed
19:37:02.655 [debug] Extreme event store subscription "process_manager_a" down due to: :shutdown
19:37:02.655 [warn] Subscription to EventStore is down. Will retry in 1000 ms.
19:37:02.655 [debug] Extreme event store subscription "projection_a" down due to: :shutdown
19:37:02.655 [debug] Extreme event store subscription "process_manager_b" down due to: :shutdown
19:37:02.655 [debug] Extreme event store subscription "projection_b" down due to: :shutdown
19:37:02.655 [debug] Extreme event store subscription "projection_c" down due to: :shutdown
19:37:02.655 [debug] Extreme event store subscription "process_manager_c" down due to: :shutdown
19:37:02.656 [debug] Extreme event store subscription "projection_d" down due to: :shutdown
19:37:02.656 [debug] Extreme event store subscription "projection_e" down due to: :shutdown
19:37:02.656 [debug] Extreme event store subscription "projection_f" down due to: :shutdown
19:37:02.656 [debug] Extreme event store subscription "projection_g" down due to: :shutdown
19:37:02.656 [info] Connecting Extreme to eventstore-service.dev:1113
19:37:02.656 [debug] Extreme event store subscription "projection_h" down due to: :shutdown
19:37:02.657 [debug] Locating aggregate process for `Example.Aggregates.Example` with UUID "example-1781b6ad-ddef-5e62-ac16-853eb6e49a5a"
19:37:02.659 [error] GenServer Commanded.EventStore.Adapters.Extreme.EventStore terminating
** (ArithmeticError) bad argument in arithmetic expression
(extreme) lib/extreme.ex:432: Extreme.slice_content/3
(extreme) lib/extreme.ex:425: Extreme.process_package/2
(extreme) lib/extreme.ex:407: Extreme.handle_info/2
(stdlib) gen_server.erl:616: :gen_server.try_dispatch/4
(stdlib) gen_server.erl:686: :gen_server.handle_msg/6
(stdlib) proc_lib.erl:247: :proc_lib.init_p_do_apply/3
Hi all,
@burmajam If I'm not mistaken, there is one code branch which is never executed and could be the reson why this occur.
If you check this line and look into code, you will see that such match {:unfinished_message, expected_message_length, data}
will never be generated in code by procees_message function. Also the 3rd function process_content
will take everything, BUT it resets state.should_receive
to nil, which is exactly what @janpieper described at the bottom of issue report.
2018-03-09 12:09:14.407 [error] We have unfinished message of length nil(2): <<36, 0>>
So there is still part of message but should_receive is nil.
This is issue where received tcp package was just part of protobuf message. But what is interesting here and odds are a bit low, is probably that extreme received partial message as TCP packages more than once in row, so after reset on line extreme.ex:453
it finally hit this buggy case.
We were able to reproduce the ArithmeticError
in our codebase and @timbuchwaldt created a PR which fixes the issue for us.
While processing many of events per second, we are randomly running into a
bad argument in arithmetic expression
error.We've used the
Logger.debug
from insideExtreme.slice_content
to see that data that is coming in: