Open slfritchie opened 4 years ago
Is this a bug, feature request, or feedback?
Bug
What is the current behavior?
Intermittent race (?) in forwarding state to new workers during autoscale grow & shrink events. This regression was likely introduced by recent changes to the resilience-related barrier protocols.
What is the expected behavior?
No race.
What OS and version of Wallaroo are you using?
Ubuntu Bionic/18.04 LTS + Wallaroo @ commit 35d203881dfdb575bd0b036cb8c7157f140aea6f
Steps to reproduce?
See README.md in tarball at http://wallaroolabs-dev.s3.amazonaws.com/scott/count.tar.gz. Instructions include options for building & running a demonstration test via a VM or Docker.
The relevant test involves using `tail` to watch the output of the app's sinks. `tail` shows the count of previous times that input keys have been seen.
Occasionally, the `count` displays anomalies, e.g., the count drops to 0 or jumps up by more than 1. For example, the output at https://gist.github.com/slfritchie/00af23a28fbe427610f00f097cd46fd5 shows anomalies at lines 64, 85, 127, 170, and 191.
At line 205, all of the counters are reset, then the cluster is slowly shrunk down to 1 worker. Anomalies continue; see lines 233, 277, and 321.
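To make anomalies like these easier to spot than by eyeballing the `tail` output, a checker along the following lines might help. This is only a sketch and not part of the test in the tarball: the function names `find_count_anomalies` and `parse_line` are hypothetical, and the `key,count` line format is an assumption that would need adjusting to the sink's real output. It flags any record whose count is not exactly one more than the previous count for the same key, so it also reports an intentional counter reset (such as the one at line 205 of the gist).

```python
from typing import Dict, Iterable, List, Tuple


def parse_line(line: str) -> Tuple[str, int]:
    """Parse one sink output line.

    ASSUMPTION: lines look like "key,count". The real sink format may
    differ, so adapt this function before use.
    """
    key, count = line.rsplit(",", 1)
    return key.strip(), int(count)


def find_count_anomalies(
    records: Iterable[Tuple[str, int]],
) -> List[Tuple[int, str, int, int]]:
    """Return (line_number, key, previous_count, count) for each record
    whose count is not exactly previous_count + 1 for the same key.

    The first record seen for a key is treated as its baseline and is
    never flagged, because the starting value is unknown when the
    output is joined midstream.
    """
    last_seen: Dict[str, int] = {}
    anomalies: List[Tuple[int, str, int, int]] = []
    for line_number, (key, count) in enumerate(records, start=1):
        previous = last_seen.get(key)
        if previous is not None and count != previous + 1:
            # Covers both symptoms: a drop (e.g. to 0) and a jump > 1.
            anomalies.append((line_number, key, previous, count))
        last_seen[key] = count
    return anomalies
```

For example, `find_count_anomalies(parse_line(l) for l in open("sink.out"))` would scan a saved copy of the sink output, where `sink.out` is a placeholder file name.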