Kong / kubernetes-ingress-controller

:gorilla: Kong for Kubernetes: The official Ingress Controller for Kubernetes.
https://docs.konghq.com/kubernetes-ingress-controller/
Apache License 2.0

panic: runtime error: slice bounds out of range #2621

Open scirner22 opened 2 years ago

scirner22 commented 2 years ago

Is there an existing issue for this?

Current Behavior

Seemingly out of nowhere, a Kong deployment that had been running for ~6 days has spent the last hour in a crash loop with the following logs.

{"NetV1Ingress":"{\"Namespace\":\"ns1\",\"Name\":\"service-1\"}","error":"resource not yet configured in the data-plane","level":"error","logger":"controllers.Ingress.netv1","msg":"namespace","time":"2022-06-29T05:21:35Z"}
{"NetV1Ingress":"{\"Namespace\":\"ns2\",\"Name\":\"processor-1\"}","error":"resource not yet configured in the data-plane","level":"error","logger":"controllers.Ingress.netv1","msg":"namespace","time":"2022-06-29T05:21:35Z"}
{"NetV1Ingress":"{\"Namespace\":\"ns3\",\"Name\":\"service-2\"}","error":"resource not yet configured in the data-plane","level":"error","logger":"controllers.Ingress.netv1","msg":"namespace","time":"2022-06-29T05:21:35Z"}
{"NetV1Ingress":"{\"Namespace\":\"ns4\",\"Name\":\"service-3\"}","error":"resource not yet configured in the data-plane","level":"error","logger":"controllers.Ingress.netv1","msg":"namespace","time":"2022-06-29T05:21:35Z"}
{"NetV1Ingress":"{\"Namespace\":\"n1\",\"Name\":\"service-4\"}","error":"resource not yet configured in the data-plane","level":"error","logger":"controllers.Ingress.netv1","msg":"namespace","time":"2022-06-29T05:21:35Z"}
{"NetV1Ingress":"{\"Namespace\":\"n1\",\"Name\":\"service-5\"}","error":"resource not yet configured in the data-plane","level":"error","logger":"controllers.Ingress.netv1","msg":"namespace","time":"2022-06-29T05:21:35Z"}
panic: runtime error: slice bounds out of range [2612:2588]

goroutine 1006 [running]:
github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).patchMake2(0xc006df7548, {0xc0042b8a00?, 0x0?}, {0xc001d58000?, 0x20, 0x1?})
        /go/pkg/mod/github.com/sergi/go-diff@v1.2.0/diffmatchpatch/patch.go:171 +0xa69
github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).PatchMake(0xc006df7548?, {0xc006df7490?, 0x1c7?, 0x200?})
        /go/pkg/mod/github.com/sergi/go-diff@v1.2.0/diffmatchpatch/patch.go:131 +0x1bc
github.com/sergi/go-diff/diffmatchpatch.(*DiffMatchPatch).PatchMake(0x15c02c0?, {0xc006df7648?, 0x15c02c0?, 0xc004b4b970?})
        /go/pkg/mod/github.com/sergi/go-diff@v1.2.0/diffmatchpatch/patch.go:129 +0x33a
github.com/yudai/gojsondiff.(*Differ).compareValues(0xc00041e140, {0x1a862e0, 0xc004b565e0}, {0x15c02c0?, 0xc004b4b810?}, {0x15c02c0?, 0xc004b4b970?})
        /go/pkg/mod/github.com/yudai/gojsondiff@v1.0.0/gojsondiff.go:269 +0x74c
github.com/yudai/gojsondiff.(*Differ).compareMaps(0xc00325d900?, 0x23c6?, 0x2500?)
        /go/pkg/mod/github.com/yudai/gojsondiff@v1.0.0/gojsondiff.go:95 +0x437
github.com/yudai/gojsondiff.(*Differ).CompareObjects(...)
        /go/pkg/mod/github.com/yudai/gojsondiff@v1.0.0/gojsondiff.go:72
github.com/yudai/gojsondiff.(*Differ).Compare(0x173c740?, {0xc00325b400, 0x233a, 0x2500}, {0xc00325d900, 0x23c6, 0x2500})
        /go/pkg/mod/github.com/yudai/gojsondiff@v1.0.0/gojsondiff.go:63 +0xdf
github.com/kong/deck/diff.getDiff({0x173c740, 0xc003338620}, {0x173c740, 0xc0033385b0})
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff_helpers.go:71 +0xdd
github.com/kong/deck/diff.(*Syncer).Solve.func2({{{0x182a014, 0x6}}, {0x182e9c7, 0xb}, {0x173c740, 0xc0033385b0}, {0x173c740, 0xc003338620}})
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff.go:354 +0x1b5
github.com/kong/deck/diff.(*Syncer).handleEvent.func1()
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff.go:287 +0x9b
github.com/cenkalti/backoff/v4.RetryNotifyWithTimer(0xc006df7e18, {0x1a80d90, 0xc0065f6240}, 0x0, {0x0?, 0x0?})
        /go/pkg/mod/github.com/cenkalti/backoff/v4@v4.1.2/retry.go:55 +0x12a
github.com/cenkalti/backoff/v4.RetryNotify(...)
        /go/pkg/mod/github.com/cenkalti/backoff/v4@v4.1.2/retry.go:34
github.com/cenkalti/backoff/v4.Retry(...)
        /go/pkg/mod/github.com/cenkalti/backoff/v4@v4.1.2/retry.go:28
github.com/kong/deck/diff.(*Syncer).handleEvent(0xc0044dc000?, {0x1a93a98?, 0xc0040f6c00?}, 0xc006cee630?, {{{0x182a014, 0x6}}, {0x182e9c7, 0xb}, {0x173c740, 0xc0033385b0}, ...})
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff.go:286 +0x108
github.com/kong/deck/diff.(*Syncer).eventLoop(0xc0044dc000, {0x1a93a98, 0xc0040f6c00}, 0xc0042e8480?)
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff.go:276 +0x15b
github.com/kong/deck/diff.(*Syncer).Run.func1()
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff.go:214 +0x37
created by github.com/kong/deck/diff.(*Syncer).Run
        /go/pkg/mod/github.com/kong/deck@v1.11.0/diff/diff.go:213 +0x1cd

The output above is shortened; there are actually hundreds of resource not yet configured in the data-plane lines logged before the crash.

I searched the codebase for 2588 and it didn't turn up anything; I thought that number might be an array allocation somewhere.

On 2.4.1 those log lines don't appear, but the end result is the same crash.

Expected Behavior

KIC completes the sync loop with no errors and Kong remains operational.

Steps To Reproduce

No response

Kong Ingress Controller version

The problem exists on at least 2.3.0, 2.4.0, and 2.4.1.

Kubernetes version

Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.9-gke.1300", GitCommit:"4b8c7c146733b9eca0f0813a2d9b5ff557e9506b", GitTreeState:"clean", BuildDate:"2022-05-11T09:26:54Z", GoVersion:"go1.16.15b7", Compiler:"gc", Platform:"linux/amd64"}


Anything else?

No response
scirner22 commented 2 years ago

I was able to resolve this by removing the K8s Secrets that had been updated. I need to investigate further whether they were valid kubernetes.io/tls Secrets, but based on the resolution I suspect they were not, and that the invalid data was causing kong-ingress-controller to panic.

The optimal behavior would have been for kong-ingress-controller to report the error and fail that sync iteration, so that the kong-proxy pod could remain online and able to serve traffic.
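This isn't the controller's actual code, but a minimal Go sketch of that idea: guard the sync step with recover so a panic from the diff library surfaces as an error for that iteration instead of crashing the pod (syncConfig and safeSync are made-up names):

```go
package main

import "fmt"

// syncConfig stands in for the controller's config-sync step; here it simply
// panics the way the diff library does in the traceback above.
func syncConfig() error {
	panic("runtime error: slice bounds out of range [2612:2588]")
}

// safeSync converts a panic inside the sync step into an ordinary error, so a
// single failed iteration can be logged and retried instead of taking the
// process down.
func safeSync() (err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("config sync panicked: %v", r)
		}
	}()
	return syncConfig()
}

func main() {
	if err := safeSync(); err != nil {
		// The previously applied Kong configuration keeps serving traffic;
		// the controller would simply retry on the next sync interval.
		fmt.Println("sync failed, keeping last good config:", err)
	}
}
```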

rainest commented 2 years ago

The resource not yet configured in the data-plane logs indicate that there is some resource the controller has seen and included in the Kong configuration but has not yet successfully sent to Kong. You'll see a bunch of those in general if your recent config syncs have failed. Is there some other error above that indicates why the last configuration sync failed?

For the panic, we'd probably need some set of Kubernetes resources and changes to them that duplicate the issue. The traceback indicates that there's some resource that triggers a bug way down in the bowels of a third-party library we use to generate pretty-print JSON diffs, and it's probably quite specific to the particulars of the problem JSON objects.

That said, it's quite possible this is the same as https://github.com/sergi/go-diff/issues/127
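For reference, the traceback shows decK handing the current and target entity JSON to yudai/gojsondiff, which drops into sergi/go-diff for string values; that is where the slice bounds panic occurs. A rough, self-contained sketch of that call path (the JSON blobs are made-up stand-ins for the real Kong entities):

```go
package main

import (
	"fmt"

	"github.com/yudai/gojsondiff"
)

func main() {
	// Stand-ins for the "current" and "target" Kong entity JSON that decK
	// serializes before diffing; the real documents are far larger.
	current := []byte(`{"name":"svc","host":"old.example.com","tags":["a"]}`)
	target := []byte(`{"name":"svc","host":"new.example.com","tags":["a","b"]}`)

	// Compare unmarshals both documents and walks them; string values are
	// diffed via sergi/go-diff (see PatchMake in the traceback above).
	d, err := gojsondiff.New().Compare(current, target)
	if err != nil {
		fmt.Println("compare failed:", err)
		return
	}

	fmt.Println("modified:", d.Modified())
	for _, delta := range d.Deltas() {
		fmt.Printf("delta: %#v\n", delta)
	}
}
```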

rainest commented 2 years ago

It may not be necessary given that the known upstream bug is probably the culprit, but providing config dumps from https://docs.konghq.com/kubernetes-ingress-controller/2.4.x/troubleshooting/#dumping-generated-kong-configuration and a deck dump from the Kong admin API should let us reproduce this and possibly suggest a config-level mitigation.

Based on conversation elsewhere this doesn't happen consistently with the same set of Kubernetes resources, which makes sense as the panic occurs in the diff logic and should depend on the current state of configuration in Kong as well as the desired state.

mflendrich commented 2 years ago

blocked on https://github.com/Kong/deck/issues/722

scirner22 commented 2 years ago

The cause ended up being that an existing K8s Secret, used by Kong as a TLS certificate, was updated with an incorrect crt value.
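For anyone else hitting this, a quick way to confirm whether a Secret's tls.crt and tls.key contents actually form a valid pair is to run them through Go's crypto/tls. A minimal sketch, assuming the base64-decoded PEM data from the Secret has been written to local files named tls.crt and tls.key:

```go
package main

import (
	"crypto/tls"
	"fmt"
	"os"
)

func main() {
	// PEM data extracted from the Secret's tls.crt / tls.key fields.
	certPEM, err := os.ReadFile("tls.crt")
	if err != nil {
		panic(err)
	}
	keyPEM, err := os.ReadFile("tls.key")
	if err != nil {
		panic(err)
	}

	// X509KeyPair fails if the certificate is malformed or does not match
	// the private key, i.e. the kind of "incorrect crt value" described above.
	if _, err := tls.X509KeyPair(certPEM, keyPEM); err != nil {
		fmt.Println("invalid kubernetes.io/tls secret contents:", err)
		return
	}
	fmt.Println("certificate and key are a valid pair")
}
```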

diegoot-dev commented 1 year ago

Any news? I have the same problem.

pmalek commented 1 year ago

Hi @diegoot-dev 👋

Are you able to provide us with your reproduction steps? Are they similar to @scirner22's?