emqx / deploy-emqx-to-aws-with-terraform

Apache License 2.0

Cannot reach more than 10000 connections while benchmarking #7

Open satheesh3 opened 1 month ago

satheesh3 commented 1 month ago

With all the default options and the 5.0.24 community version, the broker could not reach more than 10000 connections. Is it an issue with the benchmarking tool?

zmstone commented 1 month ago

Hi @satheesh3, please check the tuning guide: https://docs.emqx.com/en/emqx/latest/performance/tune.html#performance-tuning-linux

Also, check if EMQX is logging anything.

satheesh3 commented 1 month ago

The performance tuning is already given in the repo, right? [image]

I checked the ulimit and other network settings; they were set as per the suggestions.

zmstone commented 1 month ago

Did you set these?

sysctl -w net.core.somaxconn=32768
sysctl -w net.ipv4.tcp_max_syn_backlog=16384

satheesh3 commented 1 month ago

Those configs are here https://github.com/emqx/deploy-emqx-to-aws-with-terraform/blob/720f962b6c2a9c91ffa859280ada5de41a5d08f6/modules/emqx5_cluster/scripts/init-replicant.sh and I see those values set.
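
Since an init script only shows what was requested at boot, it can help to confirm what the running kernel actually reports. The following is a minimal sketch, assuming a Linux host where sysctl values are exposed under /proc/sys; the names `EXPECTED`, `read_sysctl`, and `find_mismatches` are illustrative and not part of the repo or of EMQX.

```python
from pathlib import Path

# Values quoted earlier in this thread; adjust to match your own tuning.
EXPECTED = {
    "net.core.somaxconn": 32768,
    "net.ipv4.tcp_max_syn_backlog": 16384,
}


def read_sysctl(name):
    """Read an integer sysctl value from /proc/sys (Linux only)."""
    path = Path("/proc/sys") / name.replace(".", "/")
    return int(path.read_text().split()[0])


def find_mismatches(expected, read=read_sysctl):
    """Return {name: (wanted, actual)} for every key whose live value differs.

    `actual` is None when the value could not be read or parsed.
    """
    mismatches = {}
    for name, wanted in expected.items():
        try:
            actual = read(name)
        except (OSError, ValueError):
            actual = None
        if actual != wanted:
            mismatches[name] = (wanted, actual)
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    for name, (wanted, actual) in problems.items():
        print(f"{name}: expected {wanted}, found {actual}")
    if not problems:
        print("all checked sysctl values match")
```

Running this on each EMQX node would show whether a later step (for example, a reboot or another script) has overridden the values set at provisioning time.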

satheesh3 commented 1 month ago

client(249): EXIT for {shutdown,closed}
50s pub_fail total=23 rate=0.45/sec
51s pub_overrun total=105570 rate=4381.01/sec
51s connect_succ total=751 rate=1.81/sec
51s connect_fail total=4 rate=0.08/sec
=CRASH REPORT==== 13-Sep-2024::04:11:13.486267 ===
  crasher:
    pid: <0.480.0>
    registered_name: []
    exception exit: noproc
      in function  emqtt:publish_via/3 (/emqtt_bench/_build/default/lib/emqtt/src/emqtt.erl, line 541)
      in call from emqtt_bench:publish/2 (/emqtt_bench/src/emqtt_bench.erl, line 919)
      in call from emqtt_bench:loop/5 (/emqtt_bench/src/emqtt_bench.erl, line 742)
    ancestors: []
    message_queue_len: 1
    messages: [{'EXIT',<0.482.0>,{shutdown,tcp_closed}}]
    links: []
    dictionary: [{rand_seed,{#{max => 288230376151711743,type => exsplus,
                                next => #Fun<rand.5.65977474>,
                                jump => #Fun<rand.3.65977474>},
                              [148199267938681517|148234516685588094]}},
                  {success_publish_count,3202},
                  {publish_begin_ts,-576460748281}]
    trap_exit: true
    status: running
    heap_size: 2586
    stack_size: 26
    reductions: 1253374
  neighbours:

zmstone commented 1 month ago

The client-side log you shared shows that a client got disconnected.

If the EMQX side did not log anything, the ELB could be limiting connections. Or, if there is a NAT gateway between the client and the server, the NAT gateway might have a limited number of ports to assign.
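
A related port ceiling exists on the load-generator host itself: each outbound connection from one source IP to the same destination IP and port needs its own local port. The sketch below is only a rough upper-bound calculation under that assumption, reading the Linux ephemeral port range; the function names are illustrative, and it says nothing about limits inside an ELB or NAT gateway.

```python
def parse_port_range(text):
    """Parse the contents of ip_local_port_range, e.g. '32768\t60999\n'."""
    low, high = (int(part) for part in text.split())
    return low, high


def max_flows_to_one_destination(low, high):
    """Upper bound on concurrent connections from one source IP
    to a single destination IP:port, given the ephemeral port range."""
    return high - low + 1


if __name__ == "__main__":
    # Linux-only path; assumed to be readable on the bench host.
    with open("/proc/sys/net/ipv4/ip_local_port_range") as f:
        low, high = parse_port_range(f.read())
    print(f"ephemeral ports {low}-{high}: "
          f"at most {max_flows_to_one_destination(low, high)} connections "
          f"per source IP to one broker address")
```

With the common Linux default range of 32768 to 60999, this bound is 28232 connections per source IP, so a bench host with a narrower range or a single source address could run out of ports before the broker does.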

Maybe you can try this: run your bench again until it reaches the limit, then start another bench (e.g. from the EMQX node itself, connecting to 127.0.0.1). If the extra clients can connect, then the network is to blame; otherwise, try connecting with a static client ID and trace it to see why it gets disconnected.
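
If a second bench instance is not handy on the EMQX node, the local probe described above can be approximated with a small script. This is a minimal sketch, not a replacement for emqtt_bench: it hand-builds an MQTT 3.1.1 CONNECT packet, uses static client IDs of the form `probe-N` so they are easy to trace, and assumes the broker listens on 127.0.0.1:1883 and accepts clients without credentials. The function names and the probe count are illustrative.

```python
import socket
import struct


def encode_remaining_length(n):
    """Encode the MQTT 'remaining length' field (variable-length integer)."""
    out = bytearray()
    while True:
        n, digit = divmod(n, 128)
        if n > 0:
            digit |= 0x80  # continuation bit: more length bytes follow
        out.append(digit)
        if n == 0:
            return bytes(out)


def build_connect_packet(client_id, keepalive=60):
    """Build an MQTT 3.1.1 CONNECT packet with clean session and no auth."""
    cid = client_id.encode("utf-8")
    var_header = b"\x00\x04MQTT" + b"\x04" + b"\x02" + struct.pack("!H", keepalive)
    payload = struct.pack("!H", len(cid)) + cid
    body = var_header + payload
    return b"\x10" + encode_remaining_length(len(body)) + body


def probe(host, port, count, timeout=5.0):
    """Open `count` connections, keep them open, and return how many got
    a successful CONNACK (return code 0)."""
    socks, accepted = [], 0
    try:
        for i in range(count):
            try:
                s = socket.create_connection((host, port), timeout=timeout)
            except OSError:
                continue
            socks.append(s)
            try:
                s.sendall(build_connect_packet(f"probe-{i}"))
                reply = s.recv(4)  # CONNACK is 4 bytes; assumed to arrive whole
            except OSError:
                continue
            if len(reply) == 4 and reply[0] == 0x20 and reply[3] == 0x00:
                accepted += 1
    finally:
        for s in socks:
            s.close()
    return accepted


if __name__ == "__main__":
    # Run on the EMQX node while the main bench is stuck at its limit.
    print("extra clients accepted:", probe("127.0.0.1", 1883, 100))
```

If most of the extra clients are accepted while the remote bench is stuck, that points toward the network path rather than the broker; if they are refused, tracing one of the `probe-N` client IDs on the EMQX side is the next step.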