niloc132 closed this issue 1 year ago.
I was able to reproduce this even without the partition_by. It always involved a viewport at the end of a table followed by a sort on that table. After a lot of observation, I noticed that every range that was invalid with respect to the previous row set was valid with respect to the current one. This led me to question how usePrev is being set, and ultimately to io.deephaven.server.barrage.BarrageMessageProducer.SnapshotControl#usePreviousValues, where we find the bug.

A recent refactoring of this logic in #3983 introduced the bug: it renamed a variable so that it no longer shadowed another one, but didn't update the other references to use the new name. As a result, usePrevious was always true on this specific code path. Most of the time this misuse of usePrevious was harmless, because the previous ring was up to date with the current ring. At exceptional times, though (when the previous ring differs from the current ring), it could produce errors like the stack trace above. In particular, viewports at the very end of the ring table are more likely to be invalid with respect to previous; when the user is at the beginning or middle of a table, it is much more likely that they will simply receive stale data without an error.
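To make the shadowing mistake concrete, here is a minimal hypothetical sketch. It is not the Deephaven source; it assumes the outer variable behaves like a field that happens to be true, and the method names beforeRefactor and afterRefactor are invented for illustration. After the rename, the stale reference still compiles, but it now resolves to the outer variable instead of the value computed in the method.

```java
/**
 * Hypothetical illustration of a rename that removes shadowing.
 * Names echo the issue text, but this is NOT the Deephaven implementation.
 */
class ShadowRenameDemo {
    // Assumed outer variable (modeled as a field) that happens to be true.
    private final boolean usePrevious = true;

    // Before: the local declaration shadows the field, so "usePrevious"
    // in the body refers to the value computed for this call.
    boolean beforeRefactor(boolean previousRequired) {
        final boolean usePrevious = previousRequired;
        return usePrevious;
    }

    // After: the local was renamed, but the return statement still names
    // the old identifier. It now binds to the field, so the method returns
    // true regardless of what was computed.
    boolean afterRefactor(boolean previousRequired) {
        final boolean usePreviousValues = previousRequired;
        return usePrevious; // bug: should return usePreviousValues
    }
}
```

Because Java permits a local to shadow a field, neither version produces a compile error; only the behavior changes, which is why this kind of regression can slip through review.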
Ultimately, the web UI sort was just the catalyst that caused the barrage producer to take a new snapshot, and it had no direct bearing on the cause of this issue. Neither did the specific table operations in the query, apart from the strictness check for ring tables that originally brought this to our attention.
I don't have an exact reproducer for this, but here's the rough idea, which captures the structure of the ring table being fed from the blink table:
The upstream table is better described as a table publisher (i.e., a blink table) whose contents come from two sorted tables that are merged into one table and published.
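The following Java model is a hypothetical sketch, not the missing reproducer and not Deephaven's API. It assumes a ring table of fixed capacity that appends each cycle's blink rows and evicts the oldest ones, keeps a single "previous" size that is refreshed only when the cycle ends, and treats a viewport as a positional range [first, last] (real Barrage viewports are row sets). Under those assumptions it shows how a viewport at the end of the table can be in range for the current state but out of range for the previous state while the ring is still filling.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

/**
 * Hypothetical model of a ring table fed by a blink table.
 * Viewports are positional ranges here, which only illustrates the
 * previous/current size mismatch described in the issue.
 */
class RingViewportDemo {
    private final int capacity;
    private final Deque<Long> rows = new ArrayDeque<>();
    // Size as of the end of the last completed cycle ("previous" state).
    private int previousSize = 0;

    RingViewportDemo(int capacity) {
        this.capacity = capacity;
    }

    // Append one cycle's blink rows, evicting the oldest rows once full.
    // The previous size is deliberately left untouched until endCycle().
    void applyCycle(List<Long> blinkRows) {
        for (long row : blinkRows) {
            if (rows.size() == capacity) {
                rows.removeFirst();
            }
            rows.addLast(row);
        }
    }

    // When the cycle completes, previous catches up with current.
    void endCycle() {
        previousSize = rows.size();
    }

    int currentSize() {
        return rows.size();
    }

    int previousSize() {
        return previousSize;
    }

    // A positional viewport [first, last] is valid if it lies inside the table.
    static boolean viewportValid(long first, long last, int size) {
        return first >= 0 && first <= last && last < size;
    }
}
```

For example, with a capacity of 4, one completed cycle of rows 1 and 2 followed by an in-progress cycle that adds row 3 leaves previousSize() at 2 and currentSize() at 3, so viewportValid(2, 2, ...) passes against current but fails against previous, while a viewport at positions 0 to 1 passes against both. Once the ring is full, the two sizes match, so a positional check passes even though the contents have shifted; this is consistent with the observation that requests away from the end are more likely to receive stale data than an error.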