apache / apisix-ingress-controller

APISIX Ingress Controller for Kubernetes
https://apisix.apache.org/
Apache License 2.0

bug: [memory leak] Every time apisix is restarted, the apisix-ingress-controller memory will grow #2226

Closed twotwo7 closed 3 weeks ago

twotwo7 commented 5 months ago

Current Behavior

apisix-ingress-controller version : 1.6.1

Restarting apisix for the first time: apisix-ingress-controller memory increased from 64Mi to 75Mi.

ME:XKE/master-01 ~ o kubectl top pod |grep apisix
apisix-84744d79db-cgpmq 4m 126Mi
apisix-ingress-controller-6497bc9c96-dttdz 2m 64Mi

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 16:31:43 +08 2024
apisix-84744d79db-cgpmq 3m 126Mi
apisix-ingress-controller-6497bc9c96-dttdz 2m 64Mi

ME:XKE/master-01 ~ o date && kubectl delete pod apisix-84744d79db-cgpmq
Mon Apr 29 16:32:00 +08 2024
pod "apisix-84744d79db-cgpmq" deleted

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 16:33:09 +08 2024
apisix-ingress-controller-6497bc9c96-dttdz 6m 69Mi

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 16:34:39 +08 2024
apisix-84744d79db-l2khb 3m 74Mi
apisix-ingress-controller-6497bc9c96-dttdz 2m 75Mi

Restarting apisix for the fifth time: apisix-ingress-controller memory increased from 104Mi to 124Mi.

ME:XKE/master-01 ~ o date && kubectl delete pod apisix-84744d79db-4xxbv
Mon Apr 29 16:43:01 +08 2024
pod "apisix-84744d79db-4xxbv" deleted

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 16:46:08 +08 2024
apisix-84744d79db-7l7tt 3m 75Mi
apisix-ingress-controller-6497bc9c96-dttdz 2m 124Mi

Restarting apisix for the 10th time: apisix-ingress-controller memory increased from 171Mi to 177Mi.

ME:XKE/master-01 ~ o date && kubectl delete pod apisix-84744d79db-bw7xh
Mon Apr 29 17:04:33 +08 2024
pod "apisix-84744d79db-bw7xh" deleted

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 17:06:50 +08 2024
apisix-84744d79db-zx5nw 4m 67Mi
apisix-ingress-controller-6497bc9c96-dttdz 2m 177Mi

Restarting apisix for the 12th time: apisix-ingress-controller memory increased from 183Mi to 192Mi.

date && kubectl delete pod apisix-84744d79db-wbsts
Mon Apr 29 17:10:36 +08 2024
pod "apisix-84744d79db-wbsts" deleted

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 17:12:30 +08 2024
apisix-84744d79db-mh57g 23m 78Mi
apisix-ingress-controller-6497bc9c96-dttdz 7m 192Mi

Restarting apisix for the 13th time: apisix-ingress-controller memory barely changed, from 192Mi to 194Mi.

date && kubectl delete pod apisix-84744d79db-mh57g
Mon Apr 29 17:13:02 +08 2024
pod "apisix-84744d79db-mh57g" deleted

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 17:15:02 +08 2024
apisix-84744d79db-7zxtt 5m 76Mi
apisix-ingress-controller-6497bc9c96-dttdz 101m 194Mi

At this point, restarting apisix no longer increased the memory of apisix-ingress-controller by much, but the log kept reporting errors when accessing the apiserver. Presumably the list calls returned no data, so the memory did not grow.

After a while, apisix-ingress-controller restarted (restart count 1, memory back down to 32Mi):

ME:XKE/master-01 ~ o date && kubectl top pod |grep apisix
Mon Apr 29 17:30:47 +08 2024
apisix-84744d79db-p5thm 3m 84Mi
apisix-ingress-controller-6497bc9c96-dttdz 2m 32Mi

ME:XKE/master-01 ~ o kubectl get pod |grep apisix
apisix-84744d79db-p5thm 1/1 Running 0 12m
apisix-ingress-controller-6497bc9c96-dttdz 1/1 Running 1 11d

Expected Behavior

apisix-ingress-controller memory should not grow when apisix restarts.

Error Logs

pprof file

apisix-ingress-controller-pprof.zip

Steps to Reproduce

apisix-ingress-controller version : 1.6.1

  1. Restart apisix multiple times.
  2. Observe the memory of apisix-ingress-controller after each restart.

Environment

/ingress-apisix # ./apisix-ingress-controller version --long

Version: 1.6.1
Git SHA: no-git-module
Go Version: go1.19.8
Building OS/Arch: linux/amd64
Running OS/Arch: linux/amd64

kubectl version

WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short.  Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"clean", BuildDate:"2022-08-23T17:44:59Z", GoVersion:"go1.19", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v4.5.7
Server Version: version.Info{Major:"1", Minor:"25", GitVersion:"v1.25.0", GitCommit:"a866cbe2e5bbaa01cfd5e969aa3e033f3282a8a2", GitTreeState:"archive", BuildDate:"2024-03-19T23:04:37Z", GoVersion:"go1.19.3", Compiler:"gc", Platform:"linux/amd64"}

uname -a

Linux master-01 4.18.0-372.19.1.495.po1.x86_64 #1 SMP Fri Mar 1 03:15:24 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux

twotwo7 commented 5 months ago

The informers are started, but is shutdown never called? See:

func (c *ListerInformer) StartAndWaitForCacheSync(ctx context.Context) bool {

Should c.KubeFactory.Shutdown() be called directly?

twotwo7 commented 5 months ago

Please help me! Thank you~! @tao12345666333 @AlinsRan @Revolyssup

Revolyssup commented 4 months ago

@twotwo7 Thanks for reporting. I will take a look at it today.

twotwo7 commented 4 months ago

> @twotwo7 Thanks for reporting. I will take a look at it today.

Thank you very much for your response and assistance. May I inquire about the current progress? It appears to be a stable and reproducible memory leak issue. The leak seems to occur each time APISIX is restarted, possibly due to the reinitialization of the informer within the APISIX-ingress-controller, and the subsequent list & watch operations.

@Revolyssup

github-actions[bot] commented 1 month ago

This issue has been marked as stale due to 90 days of inactivity. It will be closed in 30 days if no further activity occurs. If this issue is still relevant, please simply write any comment. Even if closed, you can still revive the issue at any time or discuss it on the dev@apisix.apache.org list. Thank you for your contributions.

github-actions[bot] commented 3 weeks ago

This issue has been closed due to lack of activity. If you think that is incorrect, or the issue requires additional review, you can revive the issue at any time.