"flush" wal_pos reported in keepalive unless we have pending transactions

The server may generate WAL that doesn't correspond to a transaction commit, e.g.:

automatic checkpoints driven by checkpoint_timeout
transactions that roll back

Currently we only update our "flush lsn" when we see a COMMIT, so if enough of this non-committed-transaction activity occurs before we see a COMMIT, the server may run out of WAL space and crash. I believe this is the root cause of #68.

The server's regular keepalive messages are already informing us of the updated wal_pos resulting from this activity. So this commit makes us acknowledge it as "flushed" (even though we didn't actually do anything about it), provided that we aren't currently waiting for Kafka to acknowledge a transaction in flight (in which case acknowledging the subsequent activity would could result in us losing the transaction in flight, in case of a restart).

I've also tested the case where Bottled Water restarts before the transaction is committed, and it does pick up from the right point and capture the whole transaction.

In the process of debugging this I ran into a bug in the client's logic for keeping track of transactions in flight, where the client would acknowledge each commit twice (each time it processed a new COMMIT, it would first reprocess the COMMIT of the previous transaction). I think the bug was harmless, but it made the logic hard to reason about, since effectively it was processing records since "removed" from the circular buffer abstraction. Since this change allows keepalives during that interval to update the fsync_lsn, it would also trigger the "Commits not in WAL order!" warning each time a transaction came in.

To work around that, I also refactored the circular buffer of transactions to remove some special cases in the handling of the initial snapshot transaction and distinguish more clearly between the buffer being empty and full.

I initially planned to write automated tests for this behaviour by querying pg_replication_slots.restart_lsn; however, the very async nature of the communication made it difficult to make the tests pass without large (20 second) sleeps - Bottled Water only checkpoints every 10 seconds, Postgres only issues keepalives once per checkpoint_timeout, etc. So right now there is no additional test coverage. It passes the existing tests, though :)

confluentinc / bottledwater-pg

"flush" wal_pos reported in keepalive unless we have pending transactions #73