Open scirner22 opened 2 years ago
I was able to resolve this by removing K8s Secrets that were updated. I need to investigate further whether they were valid kubernetes.io/tls
secrets, but based on the resolution I feel like they were not and that was causing kong-ingress-controller to panic.
I guess optimal behavior would have been for kong-ingress-controller to report the error and fail the sync iteration so that the kong-proxy pod can remain online and able to serve traffic.
The resource not yet configured in the data-plane
logs indicate that there's some resource that the controller has seen and included in Kong configuration, but has failed to successfully send to Kong. You'll see a bunch of those in general if your recent config syncs have failed. Is there some other error above that indicates why the last configuration sync failed?
For the panic, we'd probably need some set of Kubernetes resources and changes to them that duplicate the issue. The traceback indicates that there's some resource that triggers a bug way down in the bowels of a third-party library we use to generate pretty-print JSON diffs, and it's probably quite specific to the particulars of the problem JSON objects.
That said it's quite possible this is the same as https://github.com/sergi/go-diff/issues/127
It may not be necessary given that the known upstream bug is probably the culprit, but providing config dumps from https://docs.konghq.com/kubernetes-ingress-controller/2.4.x/troubleshooting/#dumping-generated-kong-configuration and a deck dump from the Kong admin API should let us reproduce this and possibly suggest a config-level mitigation.
Based on conversation elsewhere this doesn't happen consistently with the same set of Kubernetes resources, which makes sense as the panic occurs in the diff logic and should depend on the current state of configuration in Kong as well as the desired state.
blocked on https://github.com/Kong/deck/issues/722
The cause ended up being an existing K8s Secret that was leveraged in Kong as a tls cert was updated with an incorrect crt
value.
any news? I have de same problem
Hi @diegoot-dev 👋
Are you able to provide us your reproduction steps? Are they similar to @scirner22's?
Is there an existing issue for this?
Current Behavior
Seemingly out of nowhere we had a kong deployment running for ~6 days and for the last hour it's been in a crashloop with the following logs.
Above is shortened, but there's actually hundreds of lines in the form of
resource not yet configured in the data-plane
that are logged before the crash.I searched the code base and it didn't return anything for
2588
. I thought that might be an array allocation somewhere.On 2.4.1 those log lines don't appear, but the end result is the same crash.
Expected Behavior
KIC completes sync loop with no errors and kong remains operational.
Steps To Reproduce
No response
Kong Ingress Controller version
Kubernetes version