aws / aws-network-policy-agent

UTC Logger.check error: failed to get caller #103

Closed tl-alex-nicot closed 3 months ago

tl-alex-nicot commented 11 months ago

It looks like with v1.15.1 of the AWS VPC CNI, the output for the aws-eks-nodeagent container just says:

k logs aws-node-jpg2r -c aws-eks-nodeagent
{"level":"info","ts":"2023-10-19T09:55:41.395Z","caller":"runtime/asm_amd64.s:1598","msg":"version","GitVersion":"","GitCommit":"","BuildDate":""}
2023-10-19 09:55:41.481551956 +0000 UTC Logger.check error: failed to get caller
2023-10-19 09:55:41.481656868 +0000 UTC Logger.check error: failed to get caller
jayanthvn commented 10 months ago

It shouldn't cause any functional impact, but we will clean it up. The error comes from the Uber zap logger, which prints this to its error output when it can't resolve the caller's stack frame.

Jufik commented 7 months ago

Hey!

Using VPC CNI v1.16.0, I'm still facing this issue. It makes enable-policy-event-logs and the logging feature pretty much useless. The cluster uses EKS (1.27) with managed node groups (1.27.9-20240117): any chance to get a pointer on how to fix this and get my hands on the policy event logs?

jayanthvn commented 7 months ago

@Jufik - Are you checking the pod logs for policy logs when you enable enable-policy-event-logs? The decision/access logs are redirected to /var/log/aws-routed-eni/network-policy-agent.log when the knob is enabled. The same log file also contains the node agent logs.
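
A minimal way to confirm, assuming shell access to the worker node (SSM, SSH, or similar):

    # Node agent logs plus policy decision logs land here once
    # enable-policy-event-logs is turned on
    sudo tail -f /var/log/aws-routed-eni/network-policy-agent.log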

snarfmonkey commented 6 months ago

Am I doing something wrong? I am using 1.16.3 and still see Logger.check error: failed to get caller logged constantly when policy event logs are enabled. I'd rather not log redundant, unimportant lines for every single pod on each node. Is it possible to change the log level so I don't have to log these to disk at all?

atilsensalduz commented 6 months ago

The problem persists; I'm encountering it with aws-network-policy-agent:v1.0.8-eksbuild.1 and EKS 1.27. Can we consider reopening the issue? @jayanthvn @jdn5126

Additionally, when I enable both the enable-cloudwatch-logs and enable-policy-event-logs parameters, the pods get stuck in a CrashLoopBackOff state with exit code 1, and no logs are generated.

giedriuskilcauskas commented 5 months ago

I can confirm that this issue still persists with EKS 1.28 / aws-network-policy-agent:v1.1.0-eksbuild.1. Can this issue be /reopen'ed?

parvez99 commented 5 months ago

I'm seeing a similar issue with EKS 1.29 and CNI v1.16.2-eksbuild.1. Looks like the issue needs to be reopened, unless we're missing something?

{"level":"info","ts":"2024-03-26T18:30:07.320Z","caller":"runtime/asm_amd64.s:1650","msg":"version","GitVersion":"","GitCommit":"","BuildDate":""} 2024-03-26 18:30:07.336116596 +0000 UTC Logger.check error: failed to get caller

weiwuprojects commented 5 months ago

I was getting this error with EKS 1.29 and VPC CNI v1.17.1, but it went away after commenting out the serviceAccountRoleArn property in my CDK resource:

    new eks.CfnAddon(this, "VpcCniAddon", {
      clusterName: cluster.clusterName,
      addonName: "vpc-cni",
      addonVersion: "v1.17.1-eksbuild.1",
      resolveConflicts: "PRESERVE",
      // serviceAccountRoleArn: cluster.role.roleArn,
      configurationValues: JSON.stringify({
        env: {
          ENABLE_PREFIX_DELEGATION: "true",
          WARM_PREFIX_TARGET: "1",
        },
      }),
    });

giedriuskilcauskas commented 5 months ago

> I was getting this error with EKS 1.29 and VPC CNI v1.17.1, but it went away after commenting out the serviceAccountRoleArn property in my CDK resource

That potentially indicates the role is missing some permissions that are present on the node group's role.

jcmcken commented 5 months ago

Using VPC CNI addon 1.17.1, based on observing log data, this error seems to occur only when a policy verdict is being made. So the more verdicts you have, the more spammy this log is.

If it helps anything, our settings are:

{
  "enableNetworkPolicy": "true",
  "env": {
    "AWS_VPC_ENI_MTU": "1480",
    "AWS_VPC_K8S_CNI_LOG_FILE": "stdout",
    "AWS_VPC_K8S_PLUGIN_LOG_FILE": "stderr",
    "AWS_VPC_K8S_CNI_CUSTOM_NETWORK_CFG": "true"
  },
  "nodeAgent": {
    "enablePolicyEventLogs": "true",
    "enableCloudWatchLogs": "true",
  }
}

We also have an IRSA policy attached to deliver the CloudWatch logs (which are working, by the way -- we see policy verdicts in CW).
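
In case it helps others, a rough sketch of that attachment with the AWS CLI, assuming a hypothetical IRSA role name my-aws-node-irsa-role (CloudWatchLogsFullAccess is broader than strictly needed; a custom policy scoped to logs:CreateLogGroup, logs:CreateLogStream, and logs:PutLogEvents also works):

    # Hypothetical role name: the IRSA role used by the node agent's
    # service account; this grants it permission to ship logs to CloudWatch
    aws iam attach-role-policy \
      --role-name my-aws-node-irsa-role \
      --policy-arn arn:aws:iam::aws:policy/CloudWatchLogsFullAccess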

jayanthvn commented 3 months ago

The fix for the original issue (logs containing failed to get caller...) is released with network policy agent v1.1.2, which ships with VPC CNI v1.18.2 - https://github.com/aws/amazon-vpc-cni-k8s/releases/tag/v1.18.2. Please test and let us know if there are any issues.
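
For anyone upgrading, a minimal sketch with the AWS CLI (the cluster name and the exact eksbuild suffix below are assumptions; list the builds available for your Kubernetes version first):

    # List vpc-cni addon builds available for your Kubernetes version
    aws eks describe-addon-versions --addon-name vpc-cni --kubernetes-version 1.29 \
      --query 'addons[].addonVersions[].addonVersion'

    # Upgrade to a build bundling network policy agent >= v1.1.2
    # (v1.18.2-eksbuild.1 is an assumed example; pick from the list above)
    aws eks update-addon --cluster-name my-cluster --addon-name vpc-cni \
      --addon-version v1.18.2-eksbuild.1 --resolve-conflicts PRESERVE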