Kong / kong-python-pdk

Write Kong plugins in Python (Experimental)
Apache License 2.0
45 stars 14 forks source link

Plugin memory usage increasing when calling `kong.response.error` #59

Open abhisheksharma2805 opened 2 years ago

abhisheksharma2805 commented 2 years ago

Kong: 2.7.0 python-pdk: 0.30

Plugin code:

import kong_pdk.pdk.kong as kong

Schema = (
    {"name": {"type": "string"}},
)

version = '0.1.0'
priority = 1000

class Plugin(object):
    def __init__(self, config):
        self.config = config

    def access(self, kong: kong.kong):
        return kong.response.error(401, "Auth failed")

I kept hitting the endpoint with this plugin enabled in a loop for almost an hour.

At the start memory usage was ~300MB. This kept increasing continuously and about an hour later memory usage reached around 800MB. This will not stop and eventually memory usage reaches the pod limit after which the plugin server restarts.

Screenshot 2022-04-28 at 1 34 14 PM

Recorded the memory usage of the plugin via top command

At initial stages (~70-80MB)

Screenshot 2022-04-28 at 12 51 24 PM

After approx 1 hour (more than 500MB)

Screenshot 2022-04-28 at 1 39 46 PM

In production I have seen this memory even reach 4GB (pod memory limit) after which the plugin restarts (memory dropped back to 300MB) Production graph (plugin was different than the above one but that is also returning kong.response.error for invalid requests):

Screenshot 2022-04-28 at 1 43 38 PM

Surely there is memory leak somewhere. Please resolve this ASAP, I am stuck on this for almost 2 months now. There was a fix that happened in 0.30 but that was only able to remove ghost threads but memory usage is still the issue.

abhisheksharma2805 commented 2 years ago

@fffonion Do we have any timeline when this issue will be resolved?

StarlightIbuki commented 1 year ago

It seems the channel is put into a message no matter if a return value is expected here, while here it skips the fetching of the message. A fair guess is that the message is accumulating when we repeatedly call kong.response.error. I'm writing a fix to this.

StarlightIbuki commented 11 months ago

I tried to diagnose this issue (attaching tracemalloc and using wrk to pressure test) and found this line of log:

2023/11/02 17:48:55 [error] 1287187#0: *1513 [kong] mp_rpc.lua:350 [test] 'InstanceId', client: 127.0.0.1, server: kong, request: "GET / HTTP/1.1", host: "127.0.0.1:8000"

And a long roll of plugin server log:

...
INFO - [17:49:04] instance #6890 of test started
INFO - [17:49:04] instance #6891 of test started
INFO - [17:49:04] instance #6892 of test started
INFO - [17:49:04] instance #6893 of test started
INFO - [17:49:04] instance #6894 of test started
INFO - [17:49:04] instance #6895 of test started
INFO - [17:49:04] instance #6896 of test started
INFO - [17:49:04] instance #6897 of test started
INFO - [17:49:04] instance #6898 of test started
INFO - [17:49:04] instance #6899 of test started

And debugging into it, I confirmed that for every request the gateway starts a new instance of the plugin.

StarlightIbuki commented 11 months ago

I've located the bug. Here we assume the __key__ is the plugins' uuid, but later updates changed the definition of __key__ so the plugin id never matches:

"plugins:test::aa509e2f-e716-4f3f-9fae-b4d9c6467cdc:::bbfb0079-a882-460e-a5b6-33867321bfeb", "b235ec81-7360-49f0-97bf-b4378dd18bfe"
StarlightIbuki commented 11 months ago

The code I'm testing with is from another version. And, given this issue is very old (Kong 2.7.0 and a Python PDK version matching it), I suppose it's a bug already fixed by https://github.com/Kong/kong-python-pdk/commit/ba3d5fae2b040804b1463585641919f76390feda. I cannot reproduce it with the latest version.