Closed sajankedia closed 4 years ago
This happened on some machines, not all. I'm not sure why, the 'threads' parameter printed in "--log-level=DEBUG' is default '1', but still got about twenty threads.
how do you get the number of threads?
"ps M
OK i was looking more if you looked at the process tree. Can you share what kind of libraries is using your application ?
On Mon, Jun 8, 2020 at 10:59 AM sajankedia notifications@github.com wrote:
"ps M
" this will give the list of threads, if you want to know count: "ps M " | wc -l — You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/benoitc/gunicorn/issues/2342#issuecomment-640468923, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAADRIVGN3CFKZL7HTAORH3RVSR5JANCNFSM4NL4QO4A .
Forget my problem, I made a mistake in the debugging process. There are two gunicorn processes, the one with twenty threads was created by a Django config file, not command line.
I was wrong again. The gunicorn threads are created automatically with the same number of cpu cores, even though the 'threads' parameter in config is '1'. This happened at two processes, one with django, the other with falcon.
It's not just those unexpected threads, but some threads produce a lot of "sched_yield" calls after a "futex_wait", which makes the process occupy high CPU (about 120%) every few seconds.
[2020-06-09 17:19:05 +0800] [31755] [DEBUG] Current configuration:
config: None
bind: ['0.0.0.0:8888']
backlog: 2048
workers: 2
worker_class: sync
threads: 1
worker_connections: 1000
max_requests: 0
max_requests_jitter: 0
timeout: 30
graceful_timeout: 30
keepalive: 2
limit_request_line: 4094
limit_request_fields: 100
limit_request_field_size: 8190
reload: False
reload_engine: auto
reload_extra_files: []
spew: False
check_config: False
preload_app: False
sendfile: None
reuse_port: False
chdir: /home/xxx/xxx
daemon: False
raw_env: []
pidfile: None
worker_tmp_dir: None
user: 1000
group: 1000
umask: 0
initgroups: False
tmp_upload_dir: None
secure_scheme_headers: {'X-FORWARDED-PROTOCOL': 'ssl', 'X-FORWARDED-PROTO': 'https', 'X-FORWARDED-SSL': 'on'}
forwarded_allow_ips: ['127.0.0.1']
accesslog: None
disable_redirect_access_to_syslog: False
access_log_format: %(h)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"
errorlog: -
loglevel: DEBUG
capture_output: False
logger_class: gunicorn.glogging.Logger
logconfig: None
logconfig_dict: {}
syslog_addr: udp://localhost:514
syslog: False
syslog_prefix: None
syslog_facility: user
enable_stdio_inheritance: False
statsd_host: None
dogstatsd_tags:
statsd_prefix:
proc_name: None
default_proc_name: test:app
pythonpath: None
paste: None
on_starting: <function OnStarting.on_starting at 0x7f26493fcea0>
on_reload: <function OnReload.on_reload at 0x7f2648f1e048>
when_ready: <function WhenReady.when_ready at 0x7f2648f1e158>
pre_fork: <function Prefork.pre_fork at 0x7f2648f1e268>
post_fork: <function Postfork.post_fork at 0x7f2648f1e378>
post_worker_init: <function PostWorkerInit.post_worker_init at 0x7f2648f1e488>
worker_int: <function WorkerInt.worker_int at 0x7f2648f1e598>
worker_abort: <function WorkerAbort.worker_abort at 0x7f2648f1e6a8>
pre_exec: <function PreExec.pre_exec at 0x7f2648f1e7b8>
pre_request: <function PreRequest.pre_request at 0x7f2648f1e8c8>
post_request: <function PostRequest.post_request at 0x7f2648f1e950>
child_exit: <function ChildExit.child_exit at 0x7f2648f1ea60>
worker_exit: <function WorkerExit.worker_exit at 0x7f2648f1eb70>
nworkers_changed: <function NumWorkersChanged.nworkers_changed at 0x7f2648f1ec80>
on_exit: <function OnExit.on_exit at 0x7f2648f1ed90>
proxy_protocol: False
proxy_allow_ips: ['127.0.0.1']
keyfile: None
certfile: None
ssl_version: 2
cert_reqs: 0
ca_certs: None
suppress_ragged_eofs: True
do_handshake_on_connect: False
ciphers: None
raw_paste_global_conf: []
strip_header_spaces: False
[2020-06-09 17:19:05 +0800] [31755] [INFO] Starting gunicorn 20.0.4
[2020-06-09 17:19:05 +0800] [31755] [DEBUG] Arbiter booted
[2020-06-09 17:19:05 +0800] [31755] [INFO] Listening at: http://0.0.0.0:8888 (31755)
[2020-06-09 17:19:05 +0800] [31755] [INFO] Using worker: sync
[2020-06-09 17:19:05 +0800] [31758] [INFO] Booting worker with pid: 31758
[2020-06-09 17:19:05 +0800] [31760] [INFO] Booting worker with pid: 31760
$ pstree -p 31758
gunicorn(31758)─┬─{gunicorn}(31762)
├─{gunicorn}(31763)
├─{gunicorn}(31764)
├─{gunicorn}(31765)
├─{gunicorn}(31766)
├─{gunicorn}(31767)
├─{gunicorn}(31768)
├─{gunicorn}(31769)
├─{gunicorn}(31771)
├─{gunicorn}(31772)
├─{gunicorn}(31773)
├─{gunicorn}(31774)
├─{gunicorn}(31775)
├─{gunicorn}(31776)
├─{gunicorn}(31777)
├─{gunicorn}(31778)
├─{gunicorn}(31779)
├─{gunicorn}(31780)
├─{gunicorn}(31781)
├─{gunicorn}(31782)
├─{gunicorn}(31783)
This process just has unexpected many threads, but the other process running for a while, produce a lot of sched_yield requests for no reason. However, the same process works fine in other machines, no threads, no high cpu usage.
[2020-05-27 09:06:05 +0000] [43] [INFO] Starting gunicorn 19.9.0
[2020-05-27 09:06:05 +0000] [43] [INFO] Listening at: http://0.0.0.0:5000 (43)
[2020-05-27 09:06:05 +0000] [43] [INFO] Using worker: sync
[2020-05-27 09:06:05 +0000] [46] [INFO] Booting worker with pid: 46
[2020-05-27 09:06:05 +0000] [48] [INFO] Booting worker with pid: 48
$ pstree -p 11304
gunicorn(11304)─┬─python(11356)
├─{gunicorn}(11372)
├─{gunicorn}(11373)
├─{gunicorn}(11374)
├─{gunicorn}(11375)
├─{gunicorn}(11376)
├─{gunicorn}(11377)
├─{gunicorn}(11378)
├─{gunicorn}(11380)
├─{gunicorn}(11381)
├─{gunicorn}(11382)
├─{gunicorn}(11383)
├─{gunicorn}(11384)
├─{gunicorn}(11385)
├─{gunicorn}(11386)
├─{gunicorn}(11387)
├─{gunicorn}(11388)
├─{gunicorn}(11389)
├─{gunicorn}(11390)
├─{gunicorn}(11391)
├─{gunicorn}(11392)
├─{gunicorn}(11393)
├─{gunicorn}(11394)
├─{gunicorn}(11395)
├─{gunicorn}(11396)
├─{gunicorn}(11397)
├─{gunicorn}(11398)
├─{gunicorn}(11399)
├─{gunicorn}(11400)
├─{gunicorn}(11401)
├─{gunicorn}(11402)
└─{gunicorn}(11403)
$ sudo perf trace -p 11304 -t 11403
LOST 133 events!
1191.397 ( 0.343 ms): gunicorn/11372 ... [continued]: sched_yield()) = 0
LOST 81 events!
1191.433 ( 0.379 ms): gunicorn/11372 ... [continued]: sched_yield()) = 0
LOST 49 events!
1191.456 ( 0.402 ms): gunicorn/11372 ... [continued]: sched_yield()) = 0
LOST 38 events!
LOST 100 events!
1624.002 (432.529 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.013 ( 0.002 ms): gunicorn/11372 futex(uaddr: 0x7f759ce77590, op: WAKE|PRIV, val: 1 ) = 0
1624.023 ( 0.004 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.039 ( 0.004 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.043 ( 0.002 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.055 ( 0.003 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.062 ( 0.005 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.067 ( 0.002 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.072 ( 0.002 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.087 ( 0.003 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.094 ( 0.005 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.099 ( 0.002 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.106 ( 0.005 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.117 ( 0.006 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.128 ( 0.006 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
1624.133 ( 0.002 ms): gunicorn/11372 sched_yield(arg0: 0, arg1: 0, arg2: 140143120315792, arg3: 94867378534512, arg4: 0, arg5: 94867378530512) = 0
I found out it's numpy who creates as many threads as the number of CPU cores automatically during a dot production. It's irrelevant with gunicorn.
The behavior of gunicorn threads and numpy threads are quite different, the former calls epoll, fchmod, etc, waiting for data, while the latter calls futex, sched_yield, do computation and synchronize.
@threefoldo I'm not using NumPy, still, 10-15 threads are created even though "--threads=1" parameter specified in Gunicorn command. "gunicorn -b :8005 -w 1 --threads=1 app_flask:app"
@benoitc sharing the logs by recreating the same issues:
[2020-06-11 12:52:13 +0000] [52] [INFO] Starting gunicorn 20.0.4
[2020-06-11 12:52:13 +0000] [52] [INFO] Listening at: http://0.0.0.0:8005 (52)
[2020-06-11 12:52:13 +0000] [52] [INFO] Using worker: sync
[2020-06-11 12:52:13 +0000] [55] [INFO] Booting worker with pid: 55
In FeatureStoreFactory : Initiating Cassandra Online Feature Store
2020-06-11 12:52:14,126 INFO cassandra_online_feature_store.py 19 __init__() - In init of CassandraOnlineFeatureStore
2020-06-11 12:52:14,126 INFO cassandra_online_feature_store.py 24 __init__() - Connecting to ['10.162.139.81', '10.162.139.106', '10.162.139.108']
^Z
[1]+ Stopped gunicorn -b :8005 -w 1 --threads=1 app_flask:app
root@ltr-deployment-7648f69999-8mfk9:/# bg
[1]+ gunicorn -b :8005 -w 1 --threads=1 app_flask:app &
root@ltr-deployment-7648f69999-8mfk9:/# ps M 55
PID TTY STAT TIME COMMAND
55 pts/0 - 0:02 /usr/local/bin/python /usr/local/bin/gunicorn -b :8005 -w 1 --threads=1 app_flask:app
- - Sl 0:01 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
root@ltr-deployment-7648f69999-8mfk9:/# ps M 55 | wc -l
11
root@ltr-deployment-7648f69999-8mfk9:/#
second example without --thread parameter:
4 S root 1 0 0 80 0 - 1073 wait 12:21 ? 00:00:00 sh -c gunicorn -b :8000 -w 1 app_flask:app
4 S root 6 1 0 80 0 - 25070 poll_s 12:21 ? 00:00:00 /usr/local/bin/python /usr/local/bin/gunicorn -b :8000 -w 1 app_flask:app
5 S root 9 6 0 80 0 - 374163 poll_s 12:21 ? 00:00:15 /usr/local/bin/python /usr/local/bin/gunicorn -b :8000 -w 1 app_flask:app
0 S root 48 26 0 80 0 - 3216 pipe_w 12:49 pts/0 00:00:00 grep unicorn
root@ltr-deployment-7648f69999-8mfk9:/# ps M 9
PID TTY STAT TIME COMMAND
9 ? - 0:16 /usr/local/bin/python /usr/local/bin/gunicorn -b :8000 -w 1 app_flask:app
- - Sl 0:11 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:03 -
- - Sl 0:00 -
- - Sl 0:00 -
- - Sl 0:00 -
root@ltr-deployment-7648f69999-8mfk9:/# ps M 9 | wc -l
15
root@ltr-deployment-7648f69999-8mfk9:/#
@sajankedia It's highly likely it's not gunicorn creating those threads, but some library that you're using. Find out what those threads are doing (a tool like py-spy
makes that trivial) and their traceback will probably give you a good idea where they come from.
(A quick search for "python cassandra" turns up https://github.com/datastax/python-driver which seems to create threads.)
I'll optimistically close this. If anyone does find a situation in which Gunicorn itself creates extra threads, please let us know with a new issue.
while running "gunicorn -b :8005 --workers=1 --threads=7 app:app", even though parameter for threads=7, still it'll create 15 threads for each worker.
If the number of workers is 2, then total 30 threads will be created.
If don't specify the parameter for threads still 15 threads are created for worker.
according to the official documentation the default value of thread is 1.["https://docs.gunicorn.org/en/stable/settings.html#threads"]