WallarooLabs / wally

Distributed Stream Processing
https://www.wallaroolabs.com
Apache License 2.0

ConnectorSink 2PC overhaul #3103

Closed by slfritchie 4 years ago

slfritchie commented 4 years ago

Re-implements the ConnectorSink + 2-Phase Commit protocol via two new FSMs, the "external connection operations" FSM and the "checkpoint/rollback operations" FSM.
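As a rough illustration of how two cooperating FSMs might split these concerns, here is a minimal Python sketch. It is not the PR's Pony code: every state name, event name, and the `SinkSketch` class are hypothetical, chosen only to show one FSM tracking the external connection and another tracking a two-phase-commit round, with a lost connection forcing a rollback.

```python
from enum import Enum, auto


class ConnState(Enum):
    # Hypothetical states for the "external connection operations" FSM.
    DISCONNECTED = auto()
    CONNECTED = auto()


class TxnState(Enum):
    # Hypothetical states for the "checkpoint/rollback operations" FSM.
    IDLE = auto()
    AWAITING_VOTE = auto()    # phase 1 sent, waiting for the sink's reply
    AWAITING_COMMIT = auto()  # sink voted yes, waiting for the final decision


class InvalidTransition(Exception):
    pass


# Each table maps (current state, event) to the next state.
CONN_TABLE = {
    (ConnState.DISCONNECTED, "connect"): ConnState.CONNECTED,
    (ConnState.CONNECTED, "disconnect"): ConnState.DISCONNECTED,
}

TXN_TABLE = {
    (TxnState.IDLE, "barrier"): TxnState.AWAITING_VOTE,
    (TxnState.AWAITING_VOTE, "vote_yes"): TxnState.AWAITING_COMMIT,
    (TxnState.AWAITING_VOTE, "vote_no"): TxnState.IDLE,
    (TxnState.AWAITING_COMMIT, "commit"): TxnState.IDLE,
    (TxnState.IDLE, "rollback"): TxnState.IDLE,
    (TxnState.AWAITING_VOTE, "rollback"): TxnState.IDLE,
    (TxnState.AWAITING_COMMIT, "rollback"): TxnState.IDLE,
}


class SinkSketch:
    """Holds both FSMs; the only coupling is in the two guarded spots below."""

    def __init__(self):
        self.conn = ConnState.DISCONNECTED
        self.txn = TxnState.IDLE

    def on_conn_event(self, event):
        key = (self.conn, event)
        if key not in CONN_TABLE:
            raise InvalidTransition(key)
        self.conn = CONN_TABLE[key]
        # Assumption: losing the connection aborts any open transaction.
        if self.conn is ConnState.DISCONNECTED and self.txn is not TxnState.IDLE:
            self.on_txn_event("rollback")

    def on_txn_event(self, event):
        # Assumption: phase 1 may only start while the sink is reachable.
        if event == "barrier" and self.conn is not ConnState.CONNECTED:
            raise InvalidTransition((self.conn, event))
        key = (self.txn, event)
        if key not in TXN_TABLE:
            raise InvalidTransition(key)
        self.txn = TXN_TABLE[key]
```

Keeping the transitions in explicit tables means an unexpected event fails loudly instead of silently leaving the sink in a half-committed state, which is the kind of property a rewrite around FSMs is usually after.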

Partial fix for bug #3097
Fixes #3086
Fixes #3031
Perhaps addresses #2878
Fixes #2814

slfritchie commented 4 years ago

The reconstruct_input_producer_barrier_events() method in BarrierSinkPhase is bogus: it can reorder app messages with respect to barrier tokens ... and I've now witnessed that exact reordering happening, derp. I definitely need to rip it out, first thing tomorrow morning.
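To make the ordering hazard concrete, here is a small Python sketch. It does not reproduce reconstruct_input_producer_barrier_events(); the `AppMsg`, `BarrierToken`, `OrderedBuffer`, and `split_and_rebuild` names are all hypothetical. It only shows the general class of bug: once app messages and barrier tokens are buffered separately and stitched back together, their relative arrival order is lost, whereas a single FIFO preserves it.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class AppMsg:
    payload: str


@dataclass(frozen=True)
class BarrierToken:
    checkpoint_id: int


class OrderedBuffer:
    """One FIFO for both kinds of item, so replay keeps arrival order."""

    def __init__(self):
        self._queue = deque()

    def push(self, item):
        self._queue.append(item)

    def drain(self):
        # Return everything in arrival order and empty the buffer.
        items = list(self._queue)
        self._queue.clear()
        return items


def split_and_rebuild(items):
    """Hypothetical lossy approach: sort items into two piles by type,
    then re-emit barriers ahead of messages. Arrival order between the
    two types is not recoverable from the piles alone."""
    barriers = [i for i in items if isinstance(i, BarrierToken)]
    msgs = [i for i in items if isinstance(i, AppMsg)]
    return barriers + msgs
```

With the arrival sequence `AppMsg("a")`, `BarrierToken(1)`, `AppMsg("b")`, the single FIFO replays the items unchanged, while the split approach moves the barrier ahead of `"a"`, so `"a"` would land on the wrong side of checkpoint 1.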

slfritchie commented 4 years ago

@jtfmumm This branch is nearly ready: I'm looking at an intermittent failure right now, but the branch has had a big overhaul of how the phase buffering is done. This PR replaces a lot of bad code that (alas for me) worked most of the time, so even if it isn't 100% perfect, it's far better than today's master branch. Unless there's something terrible lurking in here, I think I can work on the intermittent failures in separate bugs & PRs.

slfritchie commented 4 years ago

The unit test failure for commit ccbe96e is apparently due to the timer that I'd added to Step, ouch. I'll put an ifdef around the timer's setup so that normal unit test compilation won't have to see it.