gearman / gearmand

http://gearman.org/

Gearmand performance worse than Python gear in our production, is some configuration missing / not correct ? #393

Open pythonerdog opened 5 months ago

pythonerdog commented 5 months ago

In our production environment, Zuul acts as the gear client and the Jenkins Gearman plugin as the gear worker (each Jenkins node executor is registered as a gear worker).

  1. gearmand version: 1.1.21, started with:
       gearmand -t 0 --job-retries 1 --keepalive --keepalive-idle 600 --keepalive-count 9 --keepalive-interval 75 --verbose DEBUG -p 4730
  2. Python gear: 0.16.0 (https://pypi.org/project/gear/), started with:
       GearServer(4730, host="0.0.0.0", statsd_prefix='zuul.geard', keepalive=True, tcp_keepidle=100, tcp_keepintvl=30, tcp_keepcnt=5)

Both run in the same production environment, on Kubernetes. With gearmand, 13000 tasks are consumed per hour. With Python gear, 24000 tasks are consumed per hour.

Any comments on this? Thanks very much.

PS: Next, we will try enabling gearmand multi-threading with the "-t" parameter for further verification.

esabol commented 5 months ago

Well, I imagine -t 0 is the problem. It's not the default, and I wouldn't recommend that for anyone unless they are encountering some weirdness. You've hamstrung gearmand with that setting alone.

esabol commented 5 months ago

You're also not comparing the same keepalive settings. I doubt it matters much, but you should compare the two implementations with the same settings.

esabol commented 5 months ago

--verbose DEBUG is also doing an excessive amount of logging for a production environment. Either get rid of that option entirely or at least change it to --verbose INFO.

Kubernetes? Are you using the Docker image from https://hub.docker.com/r/artefactual/gearmand/ ?

pythonerdog commented 5 months ago

Thanks, esabol. We will rerun the test with the same parameters. The keepalive settings may be a suspect, and also "-t 0". I also noticed two other parameters, "-b" and "-f":

    -b [ --backlog ] arg (=32)       Number of backlog connections for listen.
    -f [ --file-descriptors ] arg    Number of file descriptors to allow for the process (total connections will be slightly less). Default is max allowed for user.

What do you think about leaving these two parameters at their default values, and what is the potential impact?

We build the Docker image ourselves:

    RUN wget https://github.com/gearman/gearmand/releases/download/1.1.21/gearmand-1.1.21.tar.gz \
        && tar -zxvf gearmand-1.1.21.tar.gz \
        && cd gearmand-1.1.21 \
        && ./configure --with-boost-libdir=/usr/lib/x86_64-linux-gnu/ --enable-ssl \
        && make \
        && make install \
        && gearmand --help \
        && rm -rf gearmand-1.1.21 \
        && rm -rf gearmand-1.1.21.tar.gz

The gearmand process is managed by supervisor.

Thanks again for your quick support

SpamapS commented 5 months ago

-b will only matter if you have a lot of churn in workers/clients.

From man listen:

    The backlog argument defines the maximum length to which the queue of pending connections for sockfd may grow. If a connection request arrives when the queue is full, the client may receive an error with an indication of ECONNREFUSED or, if the underlying protocol supports retransmission, the request may be ignored so that a later reattempt at connection succeeds.

For -f, that's likely not important unless you're seeing socket/file errors. Open file limits are mostly meant to stop runaway processes from eating up kernel resources. The only open files gearmand is going to use are sockets, plus a handful for things like logs or local sqlite files if you're using a background queue plugin. The user-level ulimit will be the highest they can go, so this would only be to reduce it anyway.
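To make the two settings concrete, here is a minimal Python sketch (not gearmand code) of the underlying OS mechanisms: the listen backlog that -b corresponds to and the per-process file descriptor limit that -f relates to. The loopback address and the OS-chosen port are assumptions made only for the demonstration; the value 32 is the default shown in the help text quoted earlier.

```python
import resource
import socket

# Soft and hard limits on open file descriptors for this process.
# Per the comment above, an option like -f can only reduce what is usable;
# the user-level limit is the ceiling.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"RLIMIT_NOFILE soft={soft} hard={hard}")

# A listening socket with an explicit backlog, the kind of value -b sets.
BACKLOG = 32  # default shown in the gearmand help text quoted in this thread
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port (demo only)
server.listen(BACKLOG)         # pending connections beyond this may be refused
port = server.getsockname()[1]
print(f"listening on 127.0.0.1:{port} with backlog {BACKLOG}")
server.close()
```

The backlog only bounds connections that are waiting to be accepted, which is why it matters mainly when many workers or clients connect and disconnect in a short time.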

esabol commented 5 months ago

@SpamapS wrote:

-b will only matter if you have a lot of churn in workers/clients.

And it seems to me like that could be the case if one is doing performance testing with a trivial worker. So a higher value might be better in this arbitrary scenario?

pythonerdog commented 2 months ago

Hi @SpamapS and @esabol

We ran several trials on our production CI, which usually sees over 20k gear tasks during busy development periods. One abnormal case: the C gearmand runs very slowly even when there are only a few gear tasks.

For example, a client submitting a task takes over 1 s:

    DEBUG 2024-09-07 15:36:05.331334 [ 9 ] Received GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/thread.cc:311
    DEBUG 2024-09-07 15:36:06.809641 [ proc ] PACKET COMMAND: GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/server.cc:122

=> about 1.48 s between the two entries
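The delay can be computed directly from the two log entries above. This is a small sketch that assumes every entry starts with a level, a date, and a time with microseconds, as in the quoted lines; `log_time` is a helper name chosen here for illustration.

```python
from datetime import datetime

# The two entries quoted above, copied from the debug log.
received = ("DEBUG 2024-09-07 15:36:05.331334 [ 9 ] Received "
            "GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/thread.cc:311")
processed = ("DEBUG 2024-09-07 15:36:06.809641 [ proc ] PACKET COMMAND: "
             "GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/server.cc:122")


def log_time(line: str) -> datetime:
    """Parse the 'YYYY-MM-DD HH:MM:SS.ffffff' timestamp that follows the level."""
    _level, date, clock = line.split()[:3]
    return datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M:%S.%f")


# Time between the I/O thread receiving the packet and "proc" handling it.
delay = (log_time(processed) - log_time(received)).total_seconds()
print(f"{delay:.6f} s")  # 1.478307 s
```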

After checking the server debug log:

    DEBUG 2024-09-07 15:36:05.331334 [ 9 ] Received GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/thread.cc:311
    DEBUG 2024-09-07 15:36:05.331339 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331341 [ 6 ] 10.175.51.166:23220 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
    DEBUG 2024-09-07 15:36:05.331342 [ proc ] Registering function: build:production/24r2/test-thor-nr-pdsch-ctrl-gcc-release-03 -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331348 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331350 [ proc ] Registering function: build:production/master/sct-thor-clang64-cpri-fdd-wb-fr1-nr-dl-01:middleweight -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331346 [ 9 ] 10.254.7.244:43488 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
    DEBUG 2024-09-07 15:36:05.331355 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331357 [ proc ] Registering function: build:production/23r1/sct-loki-gcc64-rtm-lte-dl-fdd-03 -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331363 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331364 [ proc ] Registering function: build:production/master/sct-thor-clang64-ecpri-tdd-fr1-nr-cpri-fdd-fr1-nr-cpri-fdd-lte:1exec -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331368 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331369 [ proc ] Registering function: build:production/23r3/sct-thor-clang64-cpri-fdd-nb-fr1-nr-ul-01:middleweight -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331370 [ 2 ] 10.175.51.100:15019 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
    DEBUG 2024-09-07 15:36:05.331374 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331374 [ 2 ] read 12 bytes -> libgearman-server/io.cc:810
    DEBUG 2024-09-07 15:36:05.331376 [ proc ] Registering function: build:production/24r1/test-thor-nr-pucch-f1-fxp -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331376 [ 2 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
    DEBUG 2024-09-07 15:36:05.331378 [ 2 ] Received GEARMAN_GRAB_JOB_UNIQ -> libgearman-server/thread.cc:311
    DEBUG 2024-09-07 15:36:05.331380 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331380 [ 2 ] 10.175.51.100:15019 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
    DEBUG 2024-09-07 15:36:05.331382 [ proc ] Registering function: build:production/master/sct-thor-clang64-cpri-fdd-fr1-cpri-tdd-fr1-nr-new-agent:middleweight -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331385 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:05.331386 [ proc ] Registering function: build:production/23r3/test-thor-clang64-sctlite:middleweight -> libgearman-server/server.cc:526
    DEBUG 2024-09-07 15:36:05.331392 [ proc ] PACKET COMMAND: GEARMAN_CAN_DO -> libgearman-server/server.cc:122
    .......
    DEBUG 2024-09-07 15:36:06.807743 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.807940 [ proc ] PACKET COMMAND: GEARMAN_GRAB_JOB_UNIQ -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.808553 [ proc ] PACKET COMMAND: GEARMAN_GRAB_JOB_UNIQ -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.808560 [ 10 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:633
    DEBUG 2024-09-07 15:36:06.808577 [ 10 ] send() 12 bytes to peer -> libgearman-server/io.cc:407
    DEBUG 2024-09-07 15:36:06.808581 [ 10 ] Sent NO_JOB -> libgearman-server/thread.cc:356
    DEBUG 2024-09-07 15:36:06.808913 [ 10 ] 10.175.51.100:30472 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
    DEBUG 2024-09-07 15:36:06.808920 [ 10 ] read 12 bytes -> libgearman-server/io.cc:810
    DEBUG 2024-09-07 15:36:06.808923 [ 10 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
    DEBUG 2024-09-07 15:36:06.808926 [ 10 ] Received GEARMAN_PRE_SLEEP -> libgearman-server/thread.cc:311
    DEBUG 2024-09-07 15:36:06.808931 [ 10 ] 10.175.51.100:30472 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
    DEBUG 2024-09-07 15:36:06.808973 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.808980 [ 10 ] Received RUN wakeup event -> libgearman-server/gearmand_thread.cc:633
    DEBUG 2024-09-07 15:36:06.808995 [ 10 ] send() 12 bytes to peer -> libgearman-server/io.cc:407
    DEBUG 2024-09-07 15:36:06.808997 [ 10 ] Sent NO_JOB -> libgearman-server/thread.cc:356
    DEBUG 2024-09-07 15:36:06.809370 [ 10 ] 10.175.51.166:43854 Ready POLLIN -> libgearman-server/gearmand_con.cc:138
    DEBUG 2024-09-07 15:36:06.809375 [ 10 ] read 12 bytes -> libgearman-server/io.cc:810
    DEBUG 2024-09-07 15:36:06.809377 [ 10 ] Gear unpack -> libgearman-server/plugins/protocol/gear/protocol.cc:117
    DEBUG 2024-09-07 15:36:06.809380 [ 10 ] Received GEARMAN_PRE_SLEEP -> libgearman-server/thread.cc:311
    DEBUG 2024-09-07 15:36:06.809383 [ 10 ] 10.175.51.166:43854 Watching POLLIN -> libgearman-server/gearmand_thread.cc:151
    DEBUG 2024-09-07 15:36:06.809480 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.809628 [ proc ] PACKET COMMAND: GEARMAN_WORK_DATA -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.809636 [ proc ] PACKET COMMAND: GEARMAN_WORK_STATUS -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.809641 [ proc ] PACKET COMMAND: GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/server.cc:122
    DEBUG 2024-09-07 15:36:06.809644 [ proc ] Received submission, function:build:production/master/sct-thor-abip-cpri-fdd-lte-ul-capacity-13 unique:95b3b2c512fc4229bb8773b7744ec842 with 2 arguments -> libgearman-server/server.cc:252

-- the time in between is mostly spent processing CAN_DO and PRE_SLEEP packets
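One way to check which packet types keep the "proc" thread busy is to tally its PACKET COMMAND entries. The sketch below runs on a handful of entries copied from the excerpt above, so the counts describe only this sample and not the full log; the regular expression assumes the entry layout shown in the excerpt.

```python
import re
from collections import Counter

# A few entries copied from the log excerpt (the real log is far longer).
LOG = """\
DEBUG 2024-09-07 15:36:06.807743 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:122
DEBUG 2024-09-07 15:36:06.807940 [ proc ] PACKET COMMAND: GEARMAN_GRAB_JOB_UNIQ -> libgearman-server/server.cc:122
DEBUG 2024-09-07 15:36:06.808553 [ proc ] PACKET COMMAND: GEARMAN_GRAB_JOB_UNIQ -> libgearman-server/server.cc:122
DEBUG 2024-09-07 15:36:06.808926 [ 10 ] Received GEARMAN_PRE_SLEEP -> libgearman-server/thread.cc:311
DEBUG 2024-09-07 15:36:06.808973 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:122
DEBUG 2024-09-07 15:36:06.809480 [ proc ] PACKET COMMAND: GEARMAN_PRE_SLEEP -> libgearman-server/server.cc:122
DEBUG 2024-09-07 15:36:06.809628 [ proc ] PACKET COMMAND: GEARMAN_WORK_DATA -> libgearman-server/server.cc:122
DEBUG 2024-09-07 15:36:06.809641 [ proc ] PACKET COMMAND: GEARMAN_SUBMIT_JOB_HIGH -> libgearman-server/server.cc:122
"""

# Match only packets handled by the "proc" thread, not the numbered I/O threads.
PROC_PACKET = re.compile(r"\[ proc \] PACKET COMMAND: (GEARMAN_\w+)")

counts = Counter(match.group(1) for match in PROC_PACKET.finditer(LOG))
for command, n in counts.most_common():
    print(f"{n:3d}  {command}")
```

Running the same tally over the full interval between the two SUBMIT_JOB_HIGH entries would show how many CAN_DO and PRE_SLEEP packets were queued ahead of the submission.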

Workers register like this:

    build:production/24r2/xxxxx-lte-newagent 0 0 1671
    build:production/24r2/xxxxx-ul-capacity-15 0 0 1671
    build:production/master/xxxxx-sctlite-dl 0 0 1529
    build:production/24r3/xxxxx-ul-02:lightweight 0 0 1671
    build:production/24r1/xxxxx-dl-s02-04 0 0 1671
    build:production/24r2/xxxxx-release:middleweight 0 0 636
    build:production/24r2/xxxxx-15:lightweight 0 0 1671
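For anyone who wants to analyze a listing like this, here is a small parsing sketch. It assumes the three numeric columns are jobs queued, jobs running, and available workers, as in the output of the gearman admin `status` command; the post does not state this explicitly. `FunctionStatus` and `parse_status_line` are names chosen here for illustration.

```python
from typing import NamedTuple


class FunctionStatus(NamedTuple):
    name: str
    queued: int   # assumed meaning of column 2
    running: int  # assumed meaning of column 3
    workers: int  # assumed meaning of column 4


def parse_status_line(line: str) -> FunctionStatus:
    """Split off the last three whitespace-separated fields as integers."""
    name, queued, running, workers = line.rsplit(None, 3)
    return FunctionStatus(name, int(queued), int(running), int(workers))


# Three of the entries from the listing above.
sample = [
    "build:production/24r2/xxxxx-lte-newagent 0 0 1671",
    "build:production/master/xxxxx-sctlite-dl 0 0 1529",
    "build:production/24r2/xxxxx-release:middleweight 0 0 636",
]

statuses = [parse_status_line(line) for line in sample]
fewest = min(statuses, key=lambda s: s.workers)
print(fewest.name, fewest.workers)
```

If the assumption about the columns holds, each function has well over a thousand registered workers, which is consistent with the large number of CAN_DO packets in the log.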

It seems there is only one thread, "proc", processing the received packets, one at a time. As a result, many tasks (launching tests) cannot be submitted in time, because each submission has to wait for the gear server's response (the job handle).

Is there any special configuration or method to make it fast ?

One idea is to submit tasks asynchronously on the client side: still use "submit_job", but handle the gear server's response (the job handle) via a callback or a similar mechanism. I am not sure whether this is possible and will verify it in a test environment. I would greatly appreciate any comments.
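As a rough illustration of that idea, the sketch below wraps a blocking submit call in a thread pool and receives each job handle through a callback. It does not use any real Gearman client API: `blocking_submit` is a hypothetical stand-in that simulates a slow server response, and the function names and handle format are made up for the demonstration.

```python
import functools
import time
from concurrent.futures import Future, ThreadPoolExecutor


def blocking_submit(function_name: str, payload: bytes) -> str:
    """Hypothetical stand-in for a client call that blocks until the server
    answers with a job handle. A real client call would go here."""
    time.sleep(0.05)  # simulate a slow server response
    return f"handle-for-{function_name}"


handles = {}


def on_handle(function_name: str, future: Future) -> None:
    """Callback invoked once the (simulated) server response has arrived."""
    handles[function_name] = future.result()


# Hypothetical function names, used only for this demonstration.
names = ["build:example-a", "build:example-b", "build:example-c"]

with ThreadPoolExecutor(max_workers=8) as pool:
    for name in names:
        future = pool.submit(blocking_submit, name, b"{}")
        future.add_done_callback(functools.partial(on_handle, name))
    # The caller is free to do other work here while submissions are in flight.

# Leaving the with-block waits for all submissions to finish.
print(handles)
```

This only hides the latency from the submitting code; it does not make the server process packets any faster, and whether a given client library is safe to call from several threads would need to be checked.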

Many thanks for your continued support.

esabol commented 2 months ago

@pythonerdog asked:

Is there any special configuration or method to make it fast ?

Beyond what we've already told you? Probably not. What command line options are you currently using to start gearmand?

20K tasks seems kind of crazy. Is that in a single job submission? If so, it seems reasonable to me that that would take 1.5 seconds.