aws / aws-network-policy-agent

segfault when enableNetworkPolicy = false and NETWORK_POLICY_ENFORCING_MODE = strict #286

Open simonlewandowski opened 1 week ago

What happened: We are preparing a migration from Calico to vpc-cni for network policy enforcement. We hit a problem after setting NETWORK_POLICY_ENFORCING_MODE = strict while keeping enableNetworkPolicy = false, meaning network policy management through the vpc-cni policy agent is not enabled yet. We left it disabled because Calico still handles network policies in this particular cluster.

Attach logs: segfault from aws-eks-nodeagent

`{"level":"info","ts":"2024-07-03T09:59:36.795Z","caller":"metrics/metrics.go:23","msg":"Serving metrics on ","port":61680}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x30 pc=0x55cd067057e2]

goroutine 146 [running]:
github.com/aws/aws-network-policy-agent/pkg/rpc.(*server).EnforceNpToPod(0xc0005e9cc0, {0xc0007b2730?, 0x55cd04fa96c6?}, 0xc0007b2730)
	/workspace/pkg/rpc/rpc_handler.go:51 +0x182
github.com/aws/amazon-vpc-cni-k8s/rpc._NPBackend_EnforceNpToPod_Handler({0x55cd074a2680?, 0xc0005e9cc0}, {0x55cd078b2998, 0xc0007d4e40}, 0xc0007a2f80, 0x0)
	/go/pkg/mod/github.com/aws/amazon-vpc-cni-k8s@v1.18.1/rpc/rpc.pb.go:957 +0x169
google.golang.org/grpc.(*Server).processUnaryRPC(0xc0001c5000, {0x55cd078b2998, 0xc0007d4db0}, {0x55cd078b9fe0, 0xc000876180}, 0xc0007acea0, 0xc000580b10, 0x55cd08ab0ab0, 0x0)
	/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1369 +0xe23
google.golang.org/grpc.(*Server).handleStream(0xc0001c5000, {0x55cd078b9fe0, 0xc000876180}, 0xc0007acea0)
	/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1780 +0x1016
google.golang.org/grpc.(*Server).serveStreams.func2.1()
	/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1019 +0x8b
created by google.golang.org/grpc.(*Server).serveStreams.func2 in goroutine 123
	/go/pkg/mod/google.golang.org/grpc@v1.63.2/server.go:1030 +0x135`

What you expected to happen: aws-eks-nodeagent should start and, with these settings, should neither enable network policies nor enforce strict mode. It should not segfault.

How to reproduce it (as minimally and precisely as possible): Run vpc-cni with enableNetworkPolicy = false and NETWORK_POLICY_ENFORCING_MODE = strict.

Anything else we need to know?: Another cluster, which runs vpc-cni with enableNetworkPolicy = false and NETWORK_POLICY_ENFORCING_MODE = standard, did not encounter this crash.

Environment: