StackExchange / StackExchange.Redis

General purpose redis client
https://stackexchange.github.io/StackExchange.Redis/

IndexOutOfRangeException while trying to consume a message from Redis #2688

Open ohadab29 opened 3 months ago

ohadab29 commented 3 months ago

Hi all,

We are using Redis 6.2.6 with 1 shard and 2 nodes, and StackExchange.Redis version 2.7.20. In the last few days we cannot consume messages; the reads fail with:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Pipelines.Sockets.Unofficial.Internal.Throw.IndexOutOfRange() in /_/src/Pipelines.Sockets.Unofficial/Internal/Throw.cs:line 65
   at Pipelines.Sockets.Unofficial.Arenas.Sequence`1.GetReference(Int64 index) in /_/src/Pipelines.Sockets.Unofficial/Arenas/Sequence.cs:line 327
   at StackExchange.Redis.ResultProcessor.SingleStreamProcessor.SetResultCore(PhysicalConnection connection, Message message, RawResult& result) in /_/src/StackExchange.Redis/ResultProcessor.cs:line 1854
   at StackExchange.Redis.ResultProcessor.SetResult(PhysicalConnection connection, Message message, RawResult& result) in /_/src/StackExchange.Redis/ResultProcessor.cs:line 219
   at StackExchange.Redis.Message.ComputeResult(PhysicalConnection connection, RawResult& result) in /_/src/StackExchange.Redis/Message.cs:line 533

After a service restart it looks OK again... any idea why?
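
The report does not include the consumer code itself, but for context, a minimal sketch of the kind of single-key stream read whose reply appears to be parsed by the SingleStreamProcessor shown in the stack trace could look like this (the host, stream key and read position below are assumptions, not the reporter's actual values):

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    class ConsumerSketch
    {
        static async Task Main()
        {
            // Placeholder endpoint; the real deployment is 1 shard / 2 nodes.
            var muxer = await ConnectionMultiplexer.ConnectAsync("my-redis-host:6379");
            IDatabase db = muxer.GetDatabase();

            // Single-key XREAD; single-stream reads like this go through the
            // ResultProcessor.SingleStreamProcessor path named in the stack trace.
            StreamEntry[] entries = await db.StreamReadAsync("my-stream", StreamPosition.Beginning, count: 10);

            foreach (StreamEntry entry in entries)
            {
                Console.WriteLine($"{entry.Id}: {entry.Values.Length} field(s)");
            }
        }
    }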

ohadab29 commented 3 months ago

someone?

mgravell commented 3 months ago

Hi @ohadab29; a: it's the Easter weekend, so a lot of folks are "out" (a 4-day weekend where I am), and b: we don't have an SLA here. I'd love to be able to respond promptly to every question, but there simply isn't enough time in the day, and this isn't my day job (or at least, not a major part of it; I can scrape together some time).

As for the specific question: "I don't know". It definitely looks like something got confused somewhere in the result processor. That's bad, and a bug, totally agree; however, I'm in the process of completely overhauling those internals (literally removing everything shown in that stack trace), so unless this is something we can reliably reproduce on demand, and thus readily investigate, my default action here would be "it'll probably be fixed indirectly when that lands". We've had a number of problems with the current internals, and the work I'm doing is in part a direct response to those instabilities (it also massively increases throughput, which is nice).

So: do you have a reliable repro?
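
For anyone trying to put together such a repro, a rough sketch of a self-contained harness might look like the following (the endpoint, stream, group and consumer names are placeholders, and there is no guarantee this particular loop triggers the bug):

    using System;
    using System.Threading.Tasks;
    using StackExchange.Redis;

    class ReproSketch
    {
        static async Task Main()
        {
            var muxer = await ConnectionMultiplexer.ConnectAsync("localhost:6379");
            IDatabase db = muxer.GetDatabase();

            const string key = "repro-stream";
            await db.KeyDeleteAsync(key);
            await db.StreamCreateConsumerGroupAsync(key, "repro-group", StreamPosition.Beginning, createStream: true);

            for (int i = 0; i < 1_000_000; i++)
            {
                // Produce one message, then consume it through the group,
                // exercising the single-stream reply path on every iteration.
                await db.StreamAddAsync(key, "payload", $"message-{i}");

                try
                {
                    StreamEntry[] entries = await db.StreamReadGroupAsync(
                        key, "repro-group", "repro-consumer", StreamPosition.NewMessages);

                    foreach (StreamEntry entry in entries)
                    {
                        await db.StreamAcknowledgeAsync(key, "repro-group", entry.Id);
                    }
                }
                catch (IndexOutOfRangeException ex)
                {
                    // A hit here would be the on-demand repro being asked for.
                    Console.WriteLine($"Reproduced after {i} iterations: {ex}");
                    return;
                }
            }

            Console.WriteLine("No repro after 1,000,000 iterations.");
        }
    }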