envoyproxy / gateway

Manages Envoy Proxy as a Standalone or Kubernetes-based Application Gateway
https://gateway.envoyproxy.io
Apache License 2.0
1.64k stars 354 forks source link

BackendTrafficPolicy openapi schemas do not validate correctly #4746

Open nicks opened 5 days ago

nicks commented 5 days ago

Description: I'm trying to use kubeconform to validate BackendTrafficPolicy

I get this error:

stdin - BackendTrafficPolicy my-policy is invalid: problem validating schema. Check JSON formatting: jsonschema: '/spec/healthCheck/active/interval' does not validate with https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/gateway.envoyproxy.io/backendtrafficpolicy_v1alpha1.json#/properties/spec/properties/healthCheck/properties/active/properties/interval/format: '3s' is not valid 'duration'
stdin - BackendTrafficPolicy my-policy is invalid: problem validating schema. Check JSON formatting: jsonschema: '/spec/healthCheck/active/timeout' does not validate with https://raw.githubusercontent.com/datreeio/CRDs-catalog/main/gateway.envoyproxy.io/backendtrafficpolicy_v1alpha1.json#/properties/spec/properties/healthCheck/properties/active/properties/timeout/format: '20ms' is not valid 'duration'

Additional info: I think the openapi schemas that we're generating are wrong. Particularly this line:

https://github.com/envoyproxy/gateway/blob/6c6633c2a8bfc772415ea3bf68e14ef20064cb22/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_backendtrafficpolicies.yaml#L314

They say that timeout and interval are openapi durations. But they're not, they're a Kubernetes-specific duration format.

arkodg commented 5 days ago

thanks for raising this, do we need to edit these to string instead ? The CEL validation on the metav1.Duration should still kick in and allow acceptable values https://github.com/envoyproxy/gateway/blob/6c6633c2a8bfc772415ea3bf68e14ef20064cb22/api/v1alpha1/healthcheck_types.go#L81

nicks commented 5 days ago

ya, i poked around at how Cluster API is doing it, and i think they just leave the kubebuilder validation off entirely? https://github.com/kubernetes-sigs/cluster-api/blob/781d1e4c39286d3d1e93c71557b5d02c2e78961b/api/v1beta1/machinehealthcheck_types.go#L101

and it will do the right thing with the existing metav1.Duration type?

arkodg commented 5 days ago

ok I see an issue

we ended up using metav1.Duration in this API but we really should have used https://github.com/kubernetes-sigs/gateway-api/blob/962b35ea3c71ae604e0db696780c76a86b58ee17/apis/v1/shared_types.go#L765 which is used everywhere else in the project https://github.com/envoyproxy/gateway/blob/6c6633c2a8bfc772415ea3bf68e14ef20064cb22/api/v1alpha1/keepalive_types.go#L26

this has a CEL validation e.g. https://github.com/envoyproxy/gateway/blob/6c6633c2a8bfc772415ea3bf68e14ef20064cb22/charts/gateway-helm/crds/generated/gateway.envoyproxy.io_backendtrafficpolicies.yaml#L1305 , and doesnt require a kubebuilder validation

this is a breaking change, because we are tightening the validation, not loosening it

https://github.com/kubernetes/apimachinery/blob/96b97de8d6ba49bc192968551f2120ef3881f42d/pkg/apis/meta/v1/duration.go#L27

cc @envoyproxy/gateway-maintainers

arkodg commented 5 days ago

thanks for helping us find this @nicks

we'll also need to add an entry in https://github.com/envoyproxy/gateway/blob/6c6633c2a8bfc772415ea3bf68e14ef20064cb22/tools/linter/golangci-lint/.golangci.yml#L34 to make sure this doesnt happen again