esl / MongooseIM

MongooseIM is Erlang Solutions' robust, scalable and efficient XMPP server, aimed at large installations. Specifically designed for enterprise purposes, it is fault-tolerant and can utilise the resources of multiple clustered machines.
Other
1.64k stars 422 forks source link

GraphQL SSE crashes on large message stanza payload #4208

Open stsisnevich-hj opened 4 months ago

stsisnevich-hj commented 4 months ago

MongooseIM version: 6.2.0-44-g07a168ed1 Installed from: Helm Chart using mongooseim/mongooseim:latest image Erlang/OTP version: Erlang/OTP 26 [erts-14.1.1] [source] [64-bit] [smp:8:1] [ds:8:1:10] [async-threads:1] [jit:ns]

Context

I am currently running MongooseIM in a Kubernetes cluster (EKS AWS) using the official Helm chart from ESL/MongooseHelm to provide a chat feature for my application. The client chat binding utilizes WebSocket to interact with chat. To make the chat interactive, a system bot is running and interacting with the chat using c2s XML stream, consuming and responding to user messages. Recently, I started a migration to GraphQL API to unify the communication approach on both sides and simplify the client implementation by achieving segregation between the consuming of messages and actions in the chat.

Issue

I have encountered a problem since migrating to GraphQL which is related to the SSE stream in MongooseIM. The problem occurs when broadcasting a large message stanza payload (~2000 characters) via GraphQL SSE stream. This action leads to a crash in the SSE stream which leads to the closing of EventSource, but the user's session becomes stuck in a live state (cannot be kicked) for an indefinite period of time. Subsequent reconnection attempts do not resolve the issue, and the SSE does not work (no messages broadcasted) for the affected user until the server is restarted. Meanwhile, WS and c2s don't crash on such messages and continue to work when SSE stream is broken.

Additional Information:

Attachments:

mongooseim.json.1

{"when":"2024-01-11T12:09:41.025376+00:00","what":"mongoose_graphql_sse_handler_failed","text":{"id":"28","data":["{\"data\":{\"stanza\":{\"subscribeForMessages\":{\"timestamp\":\"2024-01-11T12:09:41.025198Z\",\"stanza_id\":\"a99f7e0ee96f429bbeabb6e5b06fdcc9\",\"stanza\":\"<message id='a99f7e0ee96f429bbeabb6e5b06fdcc9' to='main_thread_user+436656@conf.xmpp-dev' from='alice@xmpp-dev/1704-974935-740798-752907a8916cb029' type='groupchat'><body>test 1600 tttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttttt</body></message>\",\"sender\":","\"main_thread_user+436656@conf.xmpp-dev/Alice\"}}}}"]},"reason":"badarg","meta":{"pid":"<0.1362.0>","mfa":"{mongoose_graphql_sse_handler,handle_error,3}","line":"94","file":"/home/circleci/project/src/graphql/mongoose_graphql_sse_handler.erl"},"level":"error"}
{"when":"2024-01-11T12:09:41.715586+00:00","report":[["{initial_call,{cowboy_stream_h,request_process,['Argument__1','Argument__2','Argument__3']}}","{pid,<0.1362.0>}","{registered_name,[]}","{error_info,{error,{badrecord,ok},[{lasse_handler,terminate,3,...},{cowboy_loop,call,...},{cowboy_stream_h,...},{...}|...]}}","{ancestors,[<0.1361.0>,<0.1110.0>,<0.1107.0>,<0.1106.0>,ranch_sup,<0.250.0>]}","{message_queue_len,0}","{messages,[]}","{links,[<0.1361.0>]}","{dictionary,[{rand_seed,{#{type => exsss,next => #Fun<rand.0.65977474>,bits => 58,...},[29127266007832741|184983864797656587]}}]}","{trap_exit,true}","{status,running}","{heap_size,4185}","{stack_size,28}","{reductions,64816}"],""],"meta":{"pid":"<0.1362.0>","mfa":"{proc_lib,crash_report,4}","logger_formatter":{"title":"CRASH REPORT"},"line":"584","file":"proc_lib.erl","error_logger":{"type":"crash_report","tag":"error_report"},"domain":["otp","sasl"]},"level":"error","label":"{proc_lib,crash}"}
{"when":"2024-01-11T12:09:41.716895+00:00","unstructured_log":"Ranch listener 'ejabberd_cowboy_0.0.0.0_8088', connection process <0.1361.0>, stream 1 had its request process <0.1362.0> exit with reason {badrecord,ok} and stacktrace [{lasse_handler,terminate,3,[{file,\"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl\"},{line,106}]},{cowboy_loop,call,5,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl\"},{line,81}]},{cowboy_stream_h,execute,3,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl\"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,\"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl\"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,\"proc_lib.erl\"},{line,241}]}]\n","meta":{"pid":"<0.1361.0>","error_logger":{"tag":"error"}},"level":"error"}

error.log

2024-01-11 12:09:41.715 [error] <0.1362.0>@lasse_handler:terminate:106 CRASH REPORT Process <0.1362.0> with 0 neighbours crashed with reason: bad record ok in lasse_handler:terminate/3 line 106
2024-01-11 12:09:41.717 [error] <0.1362.0>@lasse_handler:terminate:106 Cowboy stream 1 with ranch listener 'ejabberd_cowboy_0.0.0.0_8088' and connection process <0.1361.0> had its request process exit with reason: bad record ok in lasse_handler:terminate/3 line 106

crash.log

2024-01-11 12:09:41 =CRASH REPORT====
  crasher:
    initial call: cowboy_stream_h:request_process/3
    pid: <0.1362.0>
    registered_name: []
    exception error: {{badrecord,ok},[{lasse_handler,terminate,3,[{file,"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl"},{line,106}]},{cowboy_loop,call,5,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl"},{line,81}]},{cowboy_stream_h,execute,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]}
    ancestors: [<0.1361.0>,<0.1110.0>,<0.1107.0>,<0.1106.0>,ranch_sup,<0.250.0>]
    message_queue_len: 0
    messages: []
    links: [<0.1361.0>]
    dictionary: [{rand_seed,{#{type => exsss,next => #Fun<rand.0.65977474>,bits => 58,uniform => #Fun<rand.1.65977474>,uniform_n => #Fun<rand.2.65977474>,jump => #Fun<rand.3.65977474>},[29127266007832741|184983864797656587]}}]
    trap_exit: true
    status: running
    heap_size: 4185
    stack_size: 28
    reductions: 64816
  neighbours:
2024-01-11 12:09:41 =ERROR REPORT====
Ranch listener 'ejabberd_cowboy_0.0.0.0_8088', connection process <0.1361.0>, stream 1 had its request process <0.1362.0> exit with reason {badrecord,ok} and stacktrace [{lasse_handler,terminate,3,[{file,"/home/circleci/project/_build/default/lib/lasse/src/lasse_handler.erl"},{line,106}]},{cowboy_loop,call,5,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_loop.erl"},{line,81}]},{cowboy_stream_h,execute,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,306}]},{cowboy_stream_h,request_process,3,[{file,"/home/circleci/project/_build/default/lib/cowboy/src/cowboy_stream_h.erl"},{line,295}]},{proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,241}]}]

Questions

Might something be wrong on the configuration level in my case? Are there any suggestions to mitigate this issue?

Thank you for your attention to this matter. Let me know if further clarification or details are needed.

gustawlippa commented 4 days ago

Thank you for reporting the issue. It seems that this is a problem with the Cowboy library, but we'll investigate, as we've already encountered some problems with SSE configuration in the past.