madhavpersistent closed this issue 1 month ago.
@madhavpersistent Can you elaborate on “container fails with the following error: ‘check error: failed to get caller’”? What is failing? Is the CNI pod not moving to the Running state? The log message above shouldn't cause any functional issue; it's more of a false flag that we need to address.
We get these error logs as well.
@achevuru You are correct that these log messages do not contribute to any functionality issues. However, it is an issue for us when we ship container logs to our observability platform. These error logs are printed so frequently that it causes observability costs to explode.
We opted to keep this option turned off (not ideal) because of that.
@xamroc You should ideally see these logs just once during bootup. If you're seeing them frequently, please check whether the aws-eks-nodeagent container is constantly restarting for some reason in your cluster.
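A quick way to check this is to look at the restart counts on the aws-node pods. This is a sketch, not an official procedure: it assumes the standard kube-system namespace and the usual `k8s-app=aws-node` label on the DaemonSet pods, and the pod name shown is illustrative.

```shell
# List the aws-node pods with their RESTARTS column
# (kube-system is the default namespace for the VPC CNI).
kubectl -n kube-system get pods -l k8s-app=aws-node

# Per-container restart counts for one pod
# ("aws-node-abcde" is a placeholder; use a real pod name from the list above).
kubectl -n kube-system get pod aws-node-abcde \
  -o jsonpath='{range .status.containerStatuses[*]}{.name}{"\t"}{.restartCount}{"\n"}{end}'
```

If the restart counts stay at 0 while the log lines keep appearing, the messages are coming from a running container rather than from repeated boot-ups.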
@achevuru It isn't. The aws-node pod is running without restarts, and both containers inside it are running fine as well. They just constantly log `Logger.check error: failed to get caller`.
I've already worked with AWS Support on this and they can provide those details. They've captured the logs from our nodes as well if that helps.
We are also experiencing the same issue, steps below. Do we have a fix on the way or a workaround available?
Steps to Reproduce:
1. Upgrade the EKS cluster from version 1.27 to 1.28.
2. Upgrade the VPC CNI plugin from v1.15.0 to v1.18.0.
3. Observe the initiation of the containers.

Expected Behavior: The init containers should start without any errors post-upgrade.
Actual Behavior: One of the init containers fails to start, logging the following error: “check error: failed to get caller.”
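For reference, the add-on upgrade described in the steps above can be performed with the AWS CLI. The cluster name and the exact `-eksbuild` suffix below are illustrative; list the builds actually available for your cluster before upgrading.

```shell
# Upgrade the managed VPC CNI add-on to v1.18.0.
# "my-cluster" is a placeholder cluster name; confirm the exact addon version
# string with: aws eks describe-addon-versions --addon-name vpc-cni
aws eks update-addon \
  --cluster-name my-cluster \
  --addon-name vpc-cni \
  --addon-version v1.18.0-eksbuild.1
```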
This issue is fixed by this PR - https://github.com/aws/aws-network-policy-agent/pull/254. We are working on the release and should have the released image out this week.
@connorharkness95 - Are you seeing the error with the init container and not aws-eks-nodeagent?
@jayanthvn is the fix released?
Yes, the fix is released with the latest network policy agent, v1.1.2 - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2
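One way to confirm which network policy agent image your nodes are actually running after the upgrade is to inspect the aws-node DaemonSet. This sketch assumes the standard DaemonSet name and namespace; the tag on the aws-eks-nodeagent container image should be v1.1.2 or later once the fix is in place.

```shell
# Print each container name and image in the aws-node DaemonSet spec
# (the aws-eks-nodeagent entry carries the network policy agent version).
kubectl -n kube-system get ds aws-node \
  -o jsonpath='{range .spec.template.spec.containers[*]}{.name}{"\t"}{.image}{"\n"}{end}'
```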
Issue Description: After upgrading the VPC CNI plugin from v1.15.0 to v1.18.0 in our EKS cluster (upgraded from version 1.26 to 1.27), we are encountering an issue with one of the init containers. The container fails with the following error: “check error: failed to get caller”. This issue persists despite claims that it was addressed in a recent GitHub pull request.
Previous Interaction: This issue is similar to one we experienced previously during an update from CNI version v1.15.0 to v1.16.0 while upgrading the EKS cluster from version 1.25 to 1.27. The problem was supposedly resolved, according to AWS Support, by GitHub pull request #168.
Current Problem: Despite the resolution mentioned in the GitHub pull request, the error is recurring in the latest upgrade scenario.
Steps to Reproduce
Expected Behavior: The init containers should start without any errors post upgrade.
Actual Behavior: One of the init containers fails to start, logging the following error: “check error: failed to get caller.”
Additional Information
• EKS Cluster Version: 1.28
• VPC CNI Plugin Version: v1.18.0
• Error Logs:
{"level":"info","ts":"2024-04-09T14:08:52.66Z","caller":"metrics/metrics.go:23","msg":"Serving metrics on ","port":6160}
2024-04-09 14:08:52.6406 +0000 UTC Logger.check error: failed to get caller
Questions