devtron-labs / devtron

Tool integration platform for Kubernetes
https://devtron.ai
Apache License 2.0

Bug: orchestrator restarts with error: concurrent map read and map write #5425

Closed jatin-jangir-0220 closed 3 days ago

jatin-jangir-0220 commented 2 months ago

📜 Description

logs

```
08_58_16.log-9359-{"date":"2024-06-27T09:00:05.082350Z","time":"2024-06-27T09:00:05.082350428Z","stream":"stderr","_p":"F","log":""}
08_58_16.log-9360-{"date":"2024-06-27T09:00:05.082353Z","time":"2024-06-27T09:00:05.082353648Z","stream":"stderr","_p":"F","log":"goroutine 825521 [chan receive, 15 minutes]:"}
08_58_16.log-9361-{"date":"2024-06-27T09:00:05.082356Z","time":"2024-06-27T09:00:05.082356968Z","stream":"stderr","_p":"F","log":"github.com/moby/spdystream.(*Connection).Serve.func1(0xc004ef001c?)"}
08_58_16.log-9362-{"date":"2024-06-27T09:00:05.082360Z","time":"2024-06-27T09:00:05.082360308Z","stream":"stderr","_p":"F","log":"\t/go/src/github.com/devtron-labs/devtron/vendor/github.com/moby/spdystream/connection.go:322 +0x25"}
08_58_16.log-9363-{"date":"2024-06-27T09:00:05.082363Z","time":"2024-06-27T09:00:05.082363518Z","stream":"stderr","_p":"F","log":"created by github.com/moby/spdystream.(*Connection).Serve in goroutine 825622"}
08_58_16.log-9364-{"date":"2024-06-27T09:00:05.082366Z","time":"2024-06-27T09:00:05.082366748Z","stream":"stderr","_p":"F","log":"\t/go/src/github.com/devtron-labs/devtron/vendor/github.com/moby/spdystream/connection.go:321 +0x151"}
08_58_16.log-9365-{"date":"2024-06-27T09:00:05.082369Z","time":"2024-06-27T09:00:05.082369849Z","stream":"stderr","_p":"F","log":""}
08_58_16.log-9366-{"date":"2024-06-27T09:00:05.082373Z","time":"2024-06-27T09:00:05.082373009Z","stream":"stderr","_p":"F","log":"goroutine 845023 [IO wait, 8 minutes]:"}
08_58_16.log-9367-{"date":"2024-06-27T09:00:05.082376Z","time":"2024-06-27T09:00:05.082376179Z","stream":"stderr","_p":"F","log":"internal/poll.runtime_pollWait(0x7fd05829b7e8, 0x72)"}
08_58_16.log-9368-{"date":"2024-06-27T09:00:05.082379Z","time":"2024-06-27T09:00:05.082379329Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/runtime/netpoll.go:343 +0x85"}
08_58_16.log-9369-{"date":"2024-06-27T09:00:05.082382Z","time":"2024-06-27T09:00:05.082382689Z","stream":"stderr","_p":"F","log":"internal/poll.(*pollDesc).wait(0xc004b52f00?, 0xc0053317b1?, 0x0)"}
08_58_16.log-9370-{"date":"2024-06-27T09:00:05.082385Z","time":"2024-06-27T09:00:05.082385869Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/internal/poll/fd_poll_runtime.go:84 +0x27"}
08_58_16.log-9371-{"date":"2024-06-27T09:00:05.082388Z","time":"2024-06-27T09:00:05.082388959Z","stream":"stderr","_p":"F","log":"internal/poll.(*pollDesc).waitRead(...)"}
08_58_16.log-9372-{"date":"2024-06-27T09:00:05.082392Z","time":"2024-06-27T09:00:05.082392139Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/internal/poll/fd_poll_runtime.go:89"}
08_58_16.log-9373-{"date":"2024-06-27T09:00:05.082395Z","time":"2024-06-27T09:00:05.082395289Z","stream":"stderr","_p":"F","log":"internal/poll.(*FD).Read(0xc004b52f00, {0xc0053317b1, 0x1, 0x1})"}
08_58_16.log-9374-{"date":"2024-06-27T09:00:05.082398Z","time":"2024-06-27T09:00:05.082398439Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/internal/poll/fd_unix.go:164 +0x27a"}
08_58_16.log-9375-{"date":"2024-06-27T09:00:05.082401Z","time":"2024-06-27T09:00:05.082401559Z","stream":"stderr","_p":"F","log":"net.(*netFD).Read(0xc004b52f00, {0xc0053317b1?, 0xc00a125f40?, 0x1eac3b0?})"}
08_58_16.log-9376-{"date":"2024-06-27T09:00:05.082404Z","time":"2024-06-27T09:00:05.082404779Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/net/fd_posix.go:55 +0x25"}
08_58_16.log-9377-{"date":"2024-06-27T09:00:05.082408Z","time":"2024-06-27T09:00:05.082408049Z","stream":"stderr","_p":"F","log":"net.(*conn).Read(0xc000d0a060, {0xc0053317b1?, 0x1?, 0xc003e09a90?})"}
08_58_16.log-9378-{"date":"2024-06-27T09:00:05.082411Z","time":"2024-06-27T09:00:05.082411299Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/net/net.go:179 +0x45"}
08_58_16.log-9379-{"date":"2024-06-27T09:00:05.082414Z","time":"2024-06-27T09:00:05.082414449Z","stream":"stderr","_p":"F","log":"net/http.(*connReader).backgroundRead(0xc0053317a0)"}
08_58_16.log-9380-{"date":"2024-06-27T09:00:05.082417Z","time":"2024-06-27T09:00:05.082417609Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/net/http/server.go:683 +0x37"}
08_58_16.log-9381-{"date":"2024-06-27T09:00:05.082420Z","time":"2024-06-27T09:00:05.082420839Z","stream":"stderr","_p":"F","log":"created by net/http.(*connReader).startBackgroundRead in goroutine 844342"}
08_58_16.log-9382-{"date":"2024-06-27T09:00:05.082424Z","time":"2024-06-27T09:00:05.082424Z","stream":"stderr","_p":"F","log":"\t/usr/local/go/src/net/http/server.go:679 +0xba"}
08_58_16.log-9383-{"date":"2024-06-27T09:00:05.082426Z","time":"2024-06-27T09:00:05.08242699Z","stream":"stderr","_p":"F","log":""}
08_58_16.log-9384-{"date":"2024-06-27T09:00:05.082430Z","time":"2024-06-27T09:00:05.08243023Z","stream":"stderr","_p":"F","log":"goroutine 855258 [select, 4 minutes]:"}
08_58_16.log-9385-{"date":"2024-06-27T09:00:05.082435Z","time":"2024-06-27T09:00:05.08243562Z","stream":"stderr","_p":"F","log":"github.com/devtron-labs/devtron/api/k8s/application.(*K8sApplicationRestHandlerImpl).GetPodLogs.func1(0x225aee5?, 0xc002afc630?)"}
08_58_16.log-9386-{"date":"2024-06-27T09:00:05.082439Z","time":"2024-06-27T09:00:05.08243904Z","stream":"stderr","_p":"F","log":"\t/go/src/github.com/devtron-labs/devtron/api/k8s/application/k8sApplicationRestHandler.go:689 +0x4b"}
08_58_16.log-9387-{"date":"2024-06-27T09:00:05.082442Z","time":"2024-06-27T09:00:05.08244223Z","stream":"stderr","_p":"F","log":"created by github.com/devtron-labs/devtron/api/k8s/application.(*K8sApplicationRestHandlerImpl).GetPodLogs in goroutine 852604"}
```

### Affected areas

None

### Additional affected areas

None

### Prod/Non-prod environments?

None

### Is User unblocked?

None

### How was the user un-blocked?

None

### Impact on Enterprise

1 one-mg

### 👟 Steps to replicate the Issue

none

### 👍 Expected behavior

orchestrator should not restart

### 👎 Actual Behavior

restarts in orchestrator

### ☸ Kubernetes version

any

### Cloud provider

any

### 🌍 Browser

Chrome

### ✅ Proposed Solution

_No response_

### 👀 Have you spent some time to check if this issue has been raised before?

- [X] I checked and didn't find any similar issue

### 🏢 Have you read the Code of Conduct?

- [X] I have read the [Code of Conduct](https://github.com/devtron-labs/devtron/blob/main/CODE_OF_CONDUCT.md)

AB#10113

vikramdevtron commented 1 week ago

Investigation and findings: we use the moby/spdystream library, which internally starts multiple goroutines. A concurrency issue occurred among those goroutines, and the Go runtime crashes on a concurrent map read and map write. We won't be able to recover from this panic in the orchestrator, because recover only works when it is called in the same goroutine that panicked.