I am doing some tests to see if Antidote would be a suitable solution for us. I am running the tests with a setup based on this docker-compose example: https://github.com/AntidoteDB/docker-antidote/blob/master/compose-files/dc3n1/docker-compose.yml

I am testing the scenario where a single data centre becomes unavailable, and I want to see what the recovery behaviour is. However, I have found that upon restarting the failed DC it does not catch up with the changes it has missed.
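For reference, this is roughly how I bring the three-DC cluster up; the compose file is the one linked above saved locally, so the file name is just how I have it on my machine:

docker-compose -f docker-compose.yml up -d    # start dc1n1, dc2n1 and dc3n1
docker ps --filter name=dc                    # check that all three containers are running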
Here is how I'm doing my test:
docker stop dc2n1
Increment a counter in dc1n1
docker start dc2n1
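In full, the sequence looks something like this. The increment step is done through my own client; increment_counter.sh below is just a hypothetical stand-in for that call, not an Antidote command:

docker stop dc2n1                 # take one DC offline
./increment_counter.sh dc1n1      # hypothetical helper: apply a counter update on the surviving DC
docker start dc2n1                # bring the failed DC back
docker logs -f dc2n1              # watch the recovery output shown below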
After restarting dc2n1 the following is output in the logs, and the line

Buffering txn in {{'antidote@dc1n1',

is repeated forever:
2020-04-24T13:05:00.728870+00:00 notice:
disk_log: repairing "/antidote-data/data/data_antidote/0--0.LOG" ...
2020-04-24T13:05:00.756855+00:00 notice:
disk_log: repairing "/antidote-data/data/data_antidote/730750818665451459101842416358141509827966271488--730750818665451459101842416358141509827966271488.LOG" ...
2020-04-24T13:05:01.790063+00:00 info:
Starting heartbeat sender timers
2020-04-24T13:05:01.791126+00:00 notice:
Forgetting DC {'antidote@dc3n1',
{1587,733399,89049}}
2020-04-24T13:05:01.791356+00:00 notice:
Forgetting DC {'antidote@dc1n1',
{1587,733359,875651}}
2020-04-24T13:05:01.791527+00:00 info:
Observing DC {'antidote@dc3n1',
{1587,733399,89049}}
2020-04-24T13:05:07.827192+00:00 info:
Observing DC {'antidote@dc1n1',
{1587,733359,875651}}
2020-04-24T13:05:10.434336+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434412+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434554+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434580+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434671+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:10.434691+00:00 info:
Waiting for application antidote to start (0 seconds).
2020-04-24T13:05:13.522205+00:00 info:
Connected to DC {'antidote@dc3n1',
{1587,733399,89049}}
2020-04-24T13:05:13.553140+00:00 info:
Waiting for DC {'antidote@dc1n1',
{1587,733359,875651}}
2020-04-24T13:05:14.473092+00:00 info:
Whoops, lost message. New is 42, last was 27. Asking the remote DC {{'antidote@dc1n1',
{1587,
733359,
875651}},
730750818665451459101842416358141509827966271488}
2020-04-24T13:05:14.474678+00:00 critical:
Received unexpected log_reader_resp messages in state normal Message []. State {inter_dc_sub_buf,
normal,
{{'antidote@dc2n1',
{1587,
733385,
215012}},
730750818665451459101842416358141509827966271488},
init,
{[],
[]},
true}
2020-04-24T13:05:14.553899+00:00 info:
Waiting for DC {'antidote@dc1n1',
{1587,733359,875651}}
2020-04-24T13:05:15.474089+00:00 info:
Buffering txn in {{'antidote@dc1n1',
{1587,733359,875651}},
730750818665451459101842416358141509827966271488}
2020-04-24T13:05:15.554915+00:00 info:
Connected to DC {'antidote@dc1n1',
{1587,733359,875651}}
2020-04-24T13:05:15.615894+00:00 info:
Starting heartbeat sender timers
2020-04-24T13:05:15.616401+00:00 info:
application: antidote
started_at: 'antidote@dc2n1'
2020-04-24T13:05:15.686080+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686282+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686460+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686635+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686875+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:15.686993+00:00 info:
Wait complete for application antidote (5 seconds)
2020-04-24T13:05:16.474943+00:00 info:
Buffering txn in {{'antidote@dc1n1',
{1587,733359,875651}},
730750818665451459101842416358141509827966271488}
2020-04-24T13:05:17.476105+00:00 info:
Buffering txn in {{'antidote@dc1n1',
{1587,733359,875651}},
730750818665451459101842416358141509827966271488}
2020-04-24T13:05:18.476976+00:00 info:
Buffering txn in {{'antidote@dc1n1',
{1587,733359,875651}},
730750818665451459101842416358141509827966271488}
2020-04-24T13:05:19.477951+00:00 info:
Buffering txn in {{'antidote@dc1n1',
{1587,733359,875651}},
730750818665451459101842416358141509827966271488}
2020-04-24T13:05:20.479036+00:00 info:
Buffering txn in {{'antidote@dc1n1',
{1587,733359,875651}},
730750818665451459101842416358141509827966271488}
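For completeness, this is how I check whether dc2n1 has caught up; read_counter.sh, like the increment helper above, is just a stand-in for the read my client performs, not an Antidote command:

docker logs dc2n1 | grep 'Buffering txn'    # the buffering line keeps repeating
./read_counter.sh dc2n1                     # hypothetical helper: the counter still shows the pre-failure value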