[Open] wklken opened this issue 9 months ago
The /metrics response data after being online for about two weeks:
16098 api_request_duration_milliseconds_bucket
3674 api_request_duration_milliseconds_count
3674 api_request_duration_milliseconds_sum
4990 api_requests_total
7948 bapp_requests_total
12846 bandwidth
11 etcd_modify_indexes
1 etcd_reachable 1
1 http_requests_total 460517663
6 nginx_http_current_connections
1 nginx_metric_errors_total 0
1 node_info
24 shared_dict_capacity_bytes
24 shared_dict_free_space_bytes
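For reference, per-family series counts like the list above can be reproduced from a raw scrape with a short pipeline. This is a sketch: 127.0.0.1:9091/apisix/prometheus/metrics is the default APISIX exporter address (adjust for your deployment), and metrics.txt / the sample data are stand-ins for a real dump.

```shell
# In practice, dump the exporter output first:
#   curl -s http://127.0.0.1:9091/apisix/prometheus/metrics > metrics.txt
# Inline sample here so the snippet is self-contained:
cat > metrics.txt <<'EOF'
# HELP api_requests_total total requests
api_requests_total{route="a"} 1
api_requests_total{route="b"} 2
node_info{hostname="x"} 1
EOF

# Count time series per metric family: skip comment lines,
# strip the label set and value, then tally the metric names.
awk '!/^#/ && NF { sub(/[{ ].*/, ""); count[$0]++ }
     END { for (m in count) print count[m], m }' metrics.txt | sort -rn
# → 2 api_requests_total
#   1 node_info
```

A large and still-growing count for one family (like the 16098 histogram buckets above) usually points at unbounded label cardinality.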
Hey @wklken
I have noticed a similar issue and was able to remove it by forcing the consumer label to always be empty.
I use the following in my Dockerfile to patch the issue:
# Patch https://github.com/apache/apisix/blob/3.7.0/apisix/plugins/prometheus/exporter.lua#L228 to avoid metrics per consumer.
RUN sed -i \
-e 's/ctx.consumer_name or ""/""/g' \
/usr/local/apisix/apisix/plugins/prometheus/exporter.lua
Hope this helps.
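To see what that sed actually changes, here is a sketch run against a stand-in file; the sample line is illustrative only, not the exact source line from exporter.lua (in the image the real path is /usr/local/apisix/apisix/plugins/prometheus/exporter.lua).

```shell
# A stand-in for the exporter source line that feeds the consumer label:
printf 'gen_arr(vars.status, route_id, ctx.consumer_name or "", balancer_ip)\n' > sample.lua

# The same substitution the Dockerfile applies:
sed -i -e 's/ctx.consumer_name or ""/""/g' sample.lua

cat sample.lua
# → gen_arr(vars.status, route_id, "", balancer_ip)
```

With the consumer name forced to an empty string, each route no longer fans out into one series per consumer, which is what keeps the cardinality down.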
Thanks @boekkooi-lengoo
We have patched some settings to disable the official metrics, which cause 100% CPU when too many records are present.
Currently only the bandwidth metric is left.
I’m not certain whether the increasing memory usage is caused by the Prometheus plugin or not, nor do I understand why it is consuming so much memory.
@wklken have you solved your problem?
Not yet; we are waiting for the line (memory usage) to stabilize (about 4 weeks). If it does not show an increase, then perhaps the Prometheus plugin is the cause. Otherwise, we will need to investigate other plugins.
Any advice or tools for measuring the memory usage of each part of APISIX?
@monkeyDluffy6017
Please check if the memory leak happens in Lua or in C:
curl http://127.0.0.1:9180/apisix/admin/routes/test \
  -H 'X-API-KEY: edd1c9f034335f136f87ad84b625c8f1' -X PUT -d '
{
    "uri": "/lua_memory_stat",
    "plugins": {
        "serverless-pre-function": {
            "phase": "rewrite",
            "functions" : ["return function() local mem = collectgarbage(\"count\") ngx.say(\"the memory allocated by lua is \", mem, \" kb\"); end"]
        }
    }
}'
@monkeyDluffy6017
Result of /lua_memory_stat from the APISIX pod:
the memory allocated by lua is 291922.04101562 kb
You can assign the issue to me, I will follow it.
@wklken I think this will help you: https://github.com/apache/apisix/pull/9545, which pulls in the nginx-lua-prometheus memory leak fix (https://github.com/knyar/nginx-lua-prometheus/pull/151). You can solve this problem by upgrading the version.
Thanks @theweakgod, I will check that. (apisix 3.2.1 uses nginx-lua-prometheus = 0.20220527)
It does seem to be one of the causes. Is it possible to test this (upgrade nginx-lua-prometheus) and see whether memory continues to grow? How long will this take?
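Rather than waiting blind for weeks, one low-effort option is to sample worker memory periodically and watch the trend. A sketch, assuming a Linux host with procps `ps` and nginx worker processes; the function name is just for illustration.

```shell
# Sum resident memory (RSS, in KB) over all nginx processes.
# Run from cron or a loop and append to a log to get a trend line.
sample_rss() {
  ps -o rss= -C nginx | awk '{ sum += $1 } END { print sum + 0 }'
}

echo "$(date -u +%FT%TZ) nginx_rss_kb=$(sample_rss)"
```

If RSS flattens out after the lookup-table fix, that is decent evidence the Prometheus key space (rather than another plugin) was the driver.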
Has the problem been solved?
We can't verify this in production, and for now we don't have an equivalent environment to test in. We need to find a way to reproduce production-level traffic and run it for a while (I may not have time to work on this in the near future; I will update this issue after verification).
A clue: this is especially likely to happen on the image-upload and file-upload endpoints.
We did a rolling update to another release, and the memory didn't increase after about 1 week.
@theweakgod I still can't reproduce the memory increase on my own cluster yet; will try again later.
👌
@theweakgod I still can't reproduce the memory increase; it seems to require a huge number of metrics.
From the provided chart: if the pull request (bugfix: limit lookup table size) is effective, memory usage should not exceed 5.59 GB and should keep increasing for no more than 7 days.
Description
After being deployed online for 2 weeks, we rescheduled the pods and then got the chart below: memory went from 3.7 GB to 6 GB.
We have no ext-plugins.
About 45,000 routes.
I suspect it is caused by the prometheus plugin. Once all routes have appeared in the metrics, should the set of keys in prometheus be stable?
Is there any tool to analyze this? We don't have OpenResty XRay.
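One way to check whether the Prometheus key space has stabilized is to diff the set of series between two scrapes taken some time apart. A sketch: the exporter address is the APISIX default, and the two scrape files plus their contents are hypothetical stand-ins.

```shell
# In practice, take two scrapes some time (hours/days) apart:
#   curl -s http://127.0.0.1:9091/apisix/prometheus/metrics > scrape_t0.txt
#   curl -s http://127.0.0.1:9091/apisix/prometheus/metrics > scrape_t1.txt
# Inline samples here so the snippet is self-contained:
printf 'api_requests_total{route="a"} 1\nnode_info{host="x"} 1\n' > scrape_t0.txt
printf 'api_requests_total{route="a"} 5\napi_requests_total{route="b"} 1\nnode_info{host="x"} 1\n' > scrape_t1.txt

# A "series" is the metric name plus its label set (drop the trailing value).
series() { grep -v '^#' "$1" | sed 's/ [^ ]*$//' | sort -u; }
series scrape_t0.txt > keys_t0.txt
series scrape_t1.txt > keys_t1.txt

# Series present in the later scrape but not the earlier one:
comm -13 keys_t0.txt keys_t1.txt
# → api_requests_total{route="b"}
```

If this diff stays empty once all ~45,000 routes have received traffic, the key space has converged and steady memory growth would point elsewhere.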
Environment

APISIX version (run apisix version): 3.2.1
Operating system (run uname -a):
OpenResty / Nginx version (run openresty -V or nginx -V): openresty/1.21.4.1
etcd version, if relevant (run curl http://127.0.0.1:9090/v1/server_info):
LuaRocks version, if relevant (run luarocks --version):