Open abhisheksharma2805 opened 2 years ago
@fffonion Do we have any timeline when this issue will be resolved?
I tried to diagnose this issue (attaching tracemalloc
and using wrk to pressure test) and found this line of log:
2023/11/02 17:48:55 [error] 1287187#0: *1513 [kong] mp_rpc.lua:350 [test] 'InstanceId', client: 127.0.0.1, server: kong, request: "GET / HTTP/1.1", host: "127.0.0.1:8000"
And a long roll of plugin server log:
...
INFO - [17:49:04] instance #6890 of test started
INFO - [17:49:04] instance #6891 of test started
INFO - [17:49:04] instance #6892 of test started
INFO - [17:49:04] instance #6893 of test started
INFO - [17:49:04] instance #6894 of test started
INFO - [17:49:04] instance #6895 of test started
INFO - [17:49:04] instance #6896 of test started
INFO - [17:49:04] instance #6897 of test started
INFO - [17:49:04] instance #6898 of test started
INFO - [17:49:04] instance #6899 of test started
And debugging into it, I confirmed that for every request the gateway starts a new instance of the plugin.
I've located the bug. Here we assume the __key__
is the plugins' uuid, but later updates changed the definition of __key__
so the plugin id never matches:
"plugins:test::aa509e2f-e716-4f3f-9fae-b4d9c6467cdc:::bbfb0079-a882-460e-a5b6-33867321bfeb", "b235ec81-7360-49f0-97bf-b4378dd18bfe"
The code I'm testing with is from another version. And, given this issue is very old (Kong 2.7.0 and a Python PDK version matching it), I suppose it's a bug already fixed by https://github.com/Kong/kong-python-pdk/commit/ba3d5fae2b040804b1463585641919f76390feda. I cannot reproduce it with the latest version.
Kong:
2.7.0
python-pdk:0.30
Plugin code:
I kept hitting the endpoint with this plugin enabled in a loop for almost an hour.
At the start memory usage was ~300MB. This kept increasing continuously and about an hour later memory usage reached around 800MB. This will not stop and eventually memory usage reaches the pod limit after which the plugin server restarts.
Recorded the memory usage of the plugin via
top
commandAt initial stages (~70-80MB)
After approx 1 hour (more than 500MB)
In production I have seen this memory even reach 4GB (pod memory limit) after which the plugin restarts (memory dropped back to 300MB) Production graph (plugin was different than the above one but that is also returning
kong.response.error
for invalid requests):Surely there is memory leak somewhere. Please resolve this ASAP, I am stuck on this for almost 2 months now. There was a fix that happened in
0.30
but that was only able to remove ghost threads but memory usage is still the issue.