WallarooLabs / wally

Distributed Stream Processing
https://www.wallaroolabs.com
Apache License 2.0
1.48k stars 69 forks source link

Intermittent race in forwarding state during autoscale #3117

Open slfritchie opened 4 years ago

slfritchie commented 4 years ago

Is this a bug, feature request, or feedback?

Bug

What is the current behavior?

Intermittent race (?) in forwarding state to new workers during autoscale grow & shrink events. Odds are likely that this regression was introduced during recent changes to resilience-related barrier protocols.

What is the expected behavior?

No race.

What OS and version of Wallaroo are you using?

Ubuntu Bionic/18.04 LTS + Wallaroo @ commit 35d203881dfdb575bd0b036cb8c7157f140aea6f

Steps to reproduce?

See README.md in tarball at http://wallaroolabs-dev.s3.amazonaws.com/scott/count.tar.gz. Instructions include options for building & running a demonstration test via a VM or Docker.

The relevant test involves:

Occasionally, the count displays anomalies, e.g., the count drops to 0 or jumps up by more than 1. For example, in the output at https://gist.github.com/slfritchie/00af23a28fbe427610f00f097cd46fd5 shows anomalies at lines 64, 85, 127, 170, and 191.

At line 205, all of the counters are reset, then the cluster is slowly shrunk down to 1 worker. Anomalies continue, see lines 233, 277, and 321.