Open jcapricebasho opened 10 years ago
The most likely cause of negative acks is skips getting out of sync with a consumer's cseq (last sent sequence number). Under normal circumstances this can't happen, but there is one place where the cseq and aseq (last acked sequence number) are changed but skips is not: https://github.com/basho/riak_repl/blob/develop/src/riak_repl2_rtq.erl#L665
That is in the trim queue function. If the queue is trimmed beyond the cseq of a consumer, that consumer has its cseq and aseq pulled to the queue's smallest sequence number. This means the CSeq - ASeq - Skips calculation effectively becomes 0 - 0 - skips, since cseq and aseq are now equal. Given how negative the reported value is, it looks like there were a ton of skips.
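To make the arithmetic concrete, here is a minimal Python sketch of the situation described above. It is a toy model, not the riak_repl code: the `Consumer` class, the `unacked` helper, and the `trim` function are hypothetical names that only mirror the cseq, aseq, and skips counters mentioned in this issue.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    # Hypothetical stand-in for a realtime queue consumer record.
    cseq: int = 0   # last sent sequence number
    aseq: int = 0   # last acked sequence number
    skips: int = 0  # skipped items, counted separately from acks


def unacked(c: Consumer) -> int:
    # The CSeq - ASeq - Skips calculation referred to in this issue.
    return c.cseq - c.aseq - c.skips


def trim(c: Consumer, min_seq: int) -> None:
    # Assumed model of the trim path: a consumer that has fallen behind
    # the queue's smallest sequence number has cseq and aseq pulled up
    # to it, while skips is left untouched.
    if c.cseq < min_seq:
        c.cseq = min_seq
        c.aseq = min_seq


consumer = Consumer(cseq=100, aseq=90, skips=7)
before = unacked(consumer)   # 100 - 90 - 7

trim(consumer, min_seq=500)  # queue trimmed past the consumer's cseq
after = unacked(consumer)    # 500 - 500 - 7

print(before, after)
```

In this model the stale skips count is the only term left after the trim, so the reported unacked value is exactly the negated skip count.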
Resolution for this issue should probably have 2 things:
Moving to 2.0.1.
Moving to 2.1.
In the riak-repl status below, the number of unacked messages from CA to XV is negative. This was observed in Riak Enterprise 1.4.2 with the patched beams from https://github.com/basho/internal_wiki/wiki/Riak-1.4.2-release---beams-for-openx, which include the {CSeq - ASeq - Skips} change that addressed another cause of unacked going negative.