TheThingsNetwork / lorawan-stack

The Things Stack, an Open Source LoRaWAN Network Server
https://www.thethingsindustries.com/stack/
Apache License 2.0

redis: connection pool timeout #3799

Closed kurtmc closed 3 years ago

kurtmc commented 3 years ago

Summary

After running lorawan for some time, network traffic to our redis instance drops off and the lorawan application reports a large number of errors. Restarting the lorawan application resolves the issue.

Steps to Reproduce

We can only reproduce this issue in our production environment, so I suspect you need to have a lot of traffic on lorawan to run into this issue.

lorawan-config.txt

What do you see now?

INFO Finished unary call                      duration=4.0002s error=error:pkg/networkserver:device_not_found (device not found) error_cause=error:pkg/redis:store (store error) error_cause_cause=redis: connection pool timeout error_correlation_id=24679b61b592424c878b8431762cbd3d error_name=device_not_found error_namespace=pkg/networkserver grpc.method=HandleUplink grpc.service=ttn.lorawan.v3.GsNs grpc_code=NotFound namespace=grpc peer.address=pipe request_id=01EYGW2NE7WAR03JD917NGNKRG

What do you want to see instead?

...

Environment

/ $ ttn-lw-cli version
The Things Network Command-line Interface: ttn-lw-cli
Version:             3.10.7
Build date:          2021-01-14T12:34:23Z
Git commit:          ecf52d6
Go version:          go1.15.6
OS/Arch:             linux/amd64
/ $ ttn-lw-stack version
The Things Stack for LoRaWAN: ttn-lw-stack
Version:             3.10.7
Build date:          2021-01-14T12:34:23Z
Git commit:          ecf52d6
Go version:          go1.15.6
OS/Arch:             linux/amd64

How do you propose to implement this?

I think there is a leak related to the redis connection pool. I am unsure how to fix it.

How do you propose to test this?

...

Can you do this yourself and submit a Pull Request?

...

johanstokking commented 3 years ago

I believe this is fixed with https://github.com/TheThingsNetwork/lorawan-stack/pull/3704 but it isn't part of a 3.10 release yet. We're releasing 3.10.10 today.

@rvolosatovs I triaged the issue but if you feel confident that this is resolved, please close.

virtualguy commented 3 years ago

We are still seeing this in 3.10.10

rvolosatovs commented 3 years ago

@adriansmares do you have any input here based on your research of go-redis library bugs? Can this be related?

adriansmares commented 3 years ago

> @adriansmares do you have any input here based on your research of go-redis library bugs? Can this be related?

I've posted my findings here.

TLDR: Could be, but I cannot tell. The connection reaper bug could be old, in which case it may be causing this issue, or it could be new, introduced in the v8.4.1 release, in which case it's probably unrelated.

rvolosatovs commented 3 years ago

Possibly ref https://github.com/go-redis/redis/issues/1657

kurtmc commented 3 years ago

Looks like this is resolved in v3.11.2. We have been running for about 6 days and have not run into this redis issue.