dtelaroli opened this issue 11 months ago (Open)
I think this is possibly fixed in 1.6; could you try 1.6.4?
Hi @zachaller, I have another issue that blocks me from upgrading argo-rollouts: https://github.com/argoproj/argo-rollouts/issues/3223
Anyway, PR https://github.com/argoproj/argo-rollouts/pull/2887 fixes problem 1, but it doesn't solve problem 2.
@dtelaroli did you have any findings on the memory footprint issue? We observed a potential memory leak in our environment too (usually the memory usage is 200Mi, but after 15 days it grows to 600Mi, although we only have about 5 rollouts in our cluster).
@andyliuliming I've discovered that the issue happens when the application syncs a big manifest.
There is a size limit, and when a manifest exceeds it, argo-rollouts raises an error on every sync cycle, which produces the memory leak: Request entity too large: limit is 3145728
After fixing the big manifest, the issue disappears.
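For illustration, the flooded VirtualService looked roughly like the sketch below. This is a hand-written approximation, not a dump from the cluster; the host, route, and service names are invented:

```yaml
# Sketch only: names and header are placeholders.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: my-app
spec:
  hosts:
    - my-app.example.com
  http:
    # The header route that argo-rollouts manages for setHeaderRoute.
    - name: canary-header-route
      match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app-canary
          weight: 100
    # In the buggy state, the same block is appended again on every sync
    # cycle, until the object exceeds the 3145728-byte request limit.
    - name: canary-header-route
      match:
        - headers:
            x-canary:
              exact: "true"
      route:
        - destination:
            host: my-app-canary
          weight: 100
    # ...repeated thousands of times...
```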
Another issue I had is that the rollout adds an empty step during setHeaderRoute: - {}
This also breaks the rollout and generates a memory leak.
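For context, a setHeaderRoute step is normally declared as in the fragment below (route and header names are invented, and the rest of the Rollout is omitted). The empty item mentioned above would appear as a bare - {} entry in a list like this:

```yaml
# Fragment of a Rollout spec; only the canary strategy is shown.
strategy:
  canary:
    trafficRouting:
      istio:
        virtualService:
          name: my-app
      managedRoutes:
        - name: canary-header-route
    steps:
      - setHeaderRoute:
          name: canary-header-route
          match:
            - headerName: x-canary
              headerValue:
                exact: "true"
      # - {}   <- an empty entry like this was reportedly being injected
      - setWeight: 20
      - pause: {}
```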
Checklist:
Describe the bug
Problem 1: argo-rollouts adds duplicated header routes, flooding the VirtualService with content bigger than etcd supports.
Problem 2: After problem 1, the argo-rollouts pod leaks memory until it consumes all the node memory, restarts, and the cycle begins again. This happens whenever any problem produces a big manifest; I saw the same behavior when using an AnalysisRun that collects metrics for 24h.
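As a sketch of the 24h analysis case: an AnalysisTemplate like the one below (metric name, interval, and query are assumptions, not taken from the affected cluster) keeps appending measurements to the AnalysisRun status, so a long-running analysis can also produce a very large object:

```yaml
# Illustrative sketch only; values are assumptions.
apiVersion: argoproj.io/v1alpha1
kind: AnalysisTemplate
metadata:
  name: long-running-check
spec:
  metrics:
    - name: error-rate
      # 288 measurements at a 5-minute interval ~= 24 hours of collection.
      interval: 5m
      count: 288
      successCondition: result[0] < 0.05
      failureLimit: 3
      provider:
        prometheus:
          address: http://prometheus.monitoring:9090
          query: |
            sum(rate(http_requests_total{status=~"5.."}[5m]))
            / sum(rate(http_requests_total[5m]))
```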
To Reproduce
I don't know how to reproduce Problem 1. Problem 2 can be reproduced by creating a VirtualService with the route duplicated.
More than 6k lines are needed for the error to happen. After that, make a change to the Rollout to start a new rollout version.
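For the last step, any change to the Rollout's pod template is enough to start a new rollout version once the oversized VirtualService is in place. A minimal sketch (all names and the image tag are placeholders, and must match the VirtualService created above):

```yaml
# Reproduction sketch: bump the image tag (or any other pod-template field)
# to start a new rollout version against the already-flooded VirtualService.
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: my-app
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-app
  template:
    metadata:
      labels:
        app: my-app
    spec:
      containers:
        - name: my-app
          image: my-registry/my-app:v2   # was v1; changing the tag triggers a new revision
  strategy:
    canary:
      canaryService: my-app-canary
      stableService: my-app-stable
      trafficRouting:
        istio:
          virtualService:
            name: my-app
      steps:
        - setWeight: 20
        - pause: {}
```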
Expected behavior
Screenshots
Version
v1.5.0
Logs
Message from the maintainers:
Impacted by this bug? Give it a 👍. We prioritize the issues with the most 👍.