Closed mrwillis closed 3 years ago
Hello, @mrwillis. Thank you for reporting this issue. I briefly investigated the problem in the code of the latest develop branch (https://github.com/0xPolygon/polygon-sdk/commit/2f50a194f21d972b542ccd1e1571ccba40c5c810) and found a memory leak around gRPC.
This issue prevented the GC from freeing buffers in gRPC, so the buffers accumulated. (In my check, the heap size was increasing by about 50 MB every hour.) I think this is currently the main cause of the memory leak. I've opened a new PR (https://github.com/0xPolygon/polygon-sdk/pull/100) to fix it, so the issue should be fixed shortly.
I also saw a continuous increase of the heap in the IBFT package, but I'm not yet sure whether it causes a memory leak. I will run a longer test to check the long-term impact of my code change and to look for other memory leaks.
Thank you again for taking the time to raise the issue.
Hi, I've opened another PR. I've merged this change and have been running IBFT nodes on AWS EC2 (t2.small). As of now, all 4 nodes have been running without any crashes for over 3 days. I've been collecting profiles from the nodes and have summarized them.
The first chart is the result of running the nodes with the change from PR #100; the second is the result of running them with the changes from both PR #100 and #105. The x-axis shows time and the y-axis shows the heap size of each function in kB. Please note that the y-axis ranges are the same but the x-axis ranges differ because the measurement periods were different. Each line shows the change in heap size for one function (please check the legends). I picked some of the functions that allocate the most heap.
The result shows that PR #100 alone is not enough to resolve the memory leak, and that there is no memory leak after merging both #100 and #105. I'm a little worried that leveldb uses a lot of heap and that its usage grows from the start. However, its usage moves up and down periodically, so as of now I can't say it's a memory leak.
Excellent job @Kourin1996 for solving the issue in multiple places and building a handy tool along the way: https://github.com/0xPolygon/go-profile-chart-generator/
I am closing the issue.
Memory leak
Description
It seems like there is a memory leak somewhere in the code. It happens after the node has been producing blocks for a while, leading us to believe it is related to block production. CPU usage remains constant.
Your environment
6906ed9
develop
Steps to reproduce
polygon-sdk server run
with IBFT consensus of 4 nodes running for about 12 hours.
Expected behaviour
No memory leak
Actual behaviour
Memory leak causes an OOM error, which leads to a crash
Logs
Last 100: