Kong / kong-python-pdk

Write Kong plugins in Python (Experimental)
Apache License 2.0
40 stars 14 forks source link

Delete message queues when they are not needed any more #122

Closed vtlkvl closed 1 year ago

vtlkvl commented 1 year ago

Memory leak is reported in https://github.com/Kong/kong-python-pdk/issues/120.

There are two scenarios when message queues are not deleted that leads to the memory leak:

  1. If plugin has no interaction with the PDK API, we immediately get ret message in PluginServer.handle_event and no further messaging is expected, but we still store the message queue in the dictionary which is never deleted afterwards.
  2. If any of plugins terminates request execution with kong.response.exit or kong.response.error, PluginServer._step will be called with the JSON payload and method name being called rather than with the ret message. After this no further messaging is expected, but we don't delete the message queue that leads to the memory leak

For testing I used steps described in https://github.com/vtlkvl/kong-python-pluginserver-memory-leak and ran siege with concurrency 50 and duration 30 seconds.

Results before the fix:

2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888] Partition of a set of 2938705 objects. Total size = 531759916 bytes., context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]  Index  Count   %     Size   % Cumulative  % Type, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      0 573226  20 357693024  67 357693024  67 collections.deque, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      1 573395  20 65391424  12 423084448  80 dict, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      2 859860  29 61909920  12 484994368  91 types.BuiltinMethodType, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      3 429930  15 20636640   4 505631008  95 threading.Condition, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      4 143337   5  9173568   2 514804576  97 types.MethodType, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      5 143381   5  8029336   2 522833912  98 _thread.lock, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      6 143296   5  6878208   1 529712120 100 queue.Queue, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      7  71637   2  2006100   0 531718220 100 int, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      8    212   0     8480   0 531726700 100 types.CellType, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888]      9     44   0     6336   0 531733036 100 function, context: ngx.timer
2023/06/17 15:38:16 [info] 1267#0: *153 [python:91888] <16 more rows. Type e.g. '_.more' to view.>, context: ngx.timer

Results after the fix:

2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132] Partition of a set of 648 objects. Total size = 74587 bytes., context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]  Index  Count   %     Size   % Cumulative  % Type, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      0     93  14    23048  31     23048  31 dict, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      1     31   5    19344  26     42392  57 collections.deque, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      2    117  18     4680   6     47072  63 types.CellType, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      3     58   9     4176   6     51248  69 types.BuiltinMethodType, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      4     26   4     3744   5     54992  74 function, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      5     33   5     2903   4     57895  78 str, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      6     49   8     2744   4     60639  81 _thread.lock, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      7      6   1     2600   3     63239  85 types.FrameType, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      8     31   5     2232   3     65471  88 weakref.ReferenceType, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132]      9     25   4     1960   3     67431  90 tuple, context: ngx.timer
2023/06/17 15:40:43 [info] 1267#0: *153 [python:174132] <13 more rows. Type e.g. '_.more' to view.>, context: ngx.timer

After the fix the count of collections.deque, queue.Queue and threading.Condition remains stable and low. No memory leak is observed. For what it is worth, the request throughout has increased by ~10% based on siege results.

CLAassistant commented 1 year ago

CLA assistant check
All committers have signed the CLA.

fffonion commented 1 year ago

@vtlkvl i think this PR lgtm, could you sign the CLA so I can proceed to merge? Thanks!

vtlkvl commented 1 year ago

I pushed additional minor change to check for the result type to be dict. It should be read to go.

fffonion commented 1 year ago

Merged! @vtlkvl Again thank you for the investigation and the fix!

fffonion commented 1 year ago

Released in 0.35

vtlkvl commented 1 year ago

Sweet!