emqx / emqx-web-hook

EMQX Webhook Plugin
https://www.emqx.com
Apache License 2.0
79 stars 36 forks source link

Responding to webhook causes emqx to drop connections #131

Open Schweig opened 5 years ago

Schweig commented 5 years ago

I have an emqx broker set up and have the webhook plugin pointed at another server running node .js with express. I have about 630 devices connected to the emqx broker with a throughput of ~60 messages a second ( each device sending a message every 10 seconds). When I have have my node server send back a status with res.sendStatus(200) but if i send the I see that my connections drop from 630 to around 550. Would this be a timeout issue where my server is taking too long to respond? I'm unsure why the webhook plugin is making my emqx broker unstable

gilbertwong96 commented 5 years ago

In your case, there are 80 connections lost, it is very weird, could you please give me more details? For example, the emqx version and the log file.

Schweig commented 5 years ago

EMQX version is v3.0-rc.5. Which log file do you need?

lcofre commented 5 years ago

I am experiencing a similar problem with the webhook plugin running.

Currently I cannot trigger the behaviour at will, but the client is connected and at some point the emqx server disconnects the client. The webhook body of the device disconnecting is:

{
    "action": "client_disconnected",
    "client_id": "the_client_id",
    "username": "undefined",
    "reason": "keepalive_timeout"
}

and /opt/emqx/log/emqx.log.1 shows

2019-06-25 21:50:03.150 [error] 10.244.2.16:60802 ** State machine <0.14975.39> terminating
** Last event = {timeout,15000}
** When server state  = {idle,
                            {state,esockd_transport,#Port<0.628528>,
                                {{10,244,2,16},60802},
                                undefined,running,100,
                                {pstate,external,
                                    #Fun<emqx_connection.0.73284863>,
                                    {{10,244,2,16},60802},
                                    nossl,4,<<"MQTT">>,<<>>,false,
                                    <0.14975.39>,undefined,undefined,
                                    undefined,undefined,false,#{},1048576,
                                    undefined,undefined,undefined,false,true,
                                    true,false,ignore,
                                    #{msg => 0,pkt => 0},
                                    #{msg => 0,pkt => 0},
                                    false,undefined,false,
                                    #{from_client => 0,to_client => 0},
                                    emqx_connection,#{},undefined},
                                {none,
                                    #{max_packet_size => 1048576,
                                      version => 4}},
                                {emqx_gc,
                                    #{cnt => {1000,1000},
                                      oct => {1048576,1048576}}},
                                undefined,true,undefined,undefined,undefined,
                                undefined,15000}}
** Reason for termination = exit:idle_timeout
** Callback mode = [state_functions,state_enter]
** Stacktrace =
**  [{gen_statem,loop_event_result,9,[{file,"gen_statem.erl"},{line,1158}]},
     {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,249}]}]
2019-06-25 21:50:03.154 [error] 10.244.2.16:60802 crasher:
    initial call: emqx_connection:init/1
    pid: <0.14975.39>
    registered_name: []
    exception exit: idle_timeout
      in function  gen_statem:loop_event_result/9 (gen_statem.erl, line 1158)
    ancestors: [<0.2116.0>,<0.2115.0>,esockd_sup,<0.1968.0>]
    message_queue_len: 0
    messages: []
    links: [<0.2116.0>]
    dictionary: [{force_shutdown_policy,
                      #{max_heap_size => 838860800,message_queue_len => 8000}},
                  {'$logger_metadata$',#{peername => "10.244.2.16:60802"}}]
    trap_exit: false
    status: running
    heap_size: 10958
    stack_size: 27
    reductions: 169299
  neighbours:
<here new error>

The client is written in python and uses the paho-mqtt==1.4.0 package.

After that I see this webhook every 3 to 4 seconds of the device connecting

{
    "action": "client_connected",
    "client_id": "the_client_id",
    "username": "undefined",
    "keepalive": 60,
    "ipaddress": "10.244.2.16",
    "proto_ver": 4,
    "connected_at": 1561499917,
    "conn_ack": 0
}

and emqx server showing the same two errors on the log emqx.log

I am using emqx/emqx:v3.1.2 on kubernetes, a few instances on a cluster created with the kubernetes apiserver.

Thanks very much in advance for your help and hard work.

lukasportal commented 5 years ago

We are also having this issue. It seems like this plugin is making the requests synchronously. Maybe the author (or someone that knows Erlang :)) could have this plugin to send the http request asynchronously instead, so it doens't block any processing. We are experiencing clients being disconnected and/or any action like connecting/subscribing taking a good while to complete. All of these problems go away when we disable the plugin.

notjulian commented 4 years ago

We are also having this issue. It seems like this plugin is making the requests synchronously. Maybe the author (or someone that knows Erlang :)) could have this plugin to send the http request asynchronously instead, so it doens't block any processing. We are experiencing clients being disconnected and/or any action like connecting/subscribing taking a good while to complete. All of these problems go away when we disable the plugin.

Hi @gilbertwong96 any news on this? we have clusters (docker 4.1.0) and a webhook rule there is a delay sending/receiving messages (QOS2) with rule enabled

thanks

lcofre commented 4 years ago

It never really worked for us @gilbertwong96,

In the end, we disabled the webhook altogether and created a specific topic where the device itself would report to be connected, and set an LW&T to change that value to connected: false when disconnected.

Another option could be to listen to the connected and disconnected topics that EMQX provides. Although it's not a broker-agnostic implementation, the webhook was not either.

Thanks for any feedback you can give us on this one!

HJianBo commented 4 years ago

Hi guys, sorry for the late reply.

In the current implementation, The way it does in synchronous sending HTTP requests, But no matter how it won't have so many stability problems.

@lcofre the keepalive_timeout or idle_timeout should be caused by the client, that is, the client does not send a packet to Broker in a timely manner. It is not caused by webhook.

But, the asynchronously feature is a more preferable way, We plan to implement it in the next version.

Thanks for your feedback there.

nguyenduybinh commented 3 years ago

Hi guys, sorry for the late reply.

In the current implementation, The way it does in synchronous sending HTTP requests, But no matter how it won't have so many stability problems.

@lcofre the keepalive_timeout or idle_timeout should be caused by the client, that is, the client does not send a packet to Broker in a timely manner. It is not caused by webhook.

But, the asynchronously feature is a more preferable way, We plan to implement it in the next version.

Thanks for your feedback there.

@HJianBo Hi HJianBo, if emqx using synchronous to sending HTTP request, the performance of HTTP auth plugin and webhook is so bad. Have you benchmarked the EMQX HTTP auth and webhook with high concurrent connections yet? When i used emqx with HTTP auth and webhook, the performance is so bad, too many timeout, High CPU usage..

HJianBo commented 3 years ago

Indeed. We may release it in version 4.3.0.