We have multiple servers with our application (which using playfab nuget) + with consul.io agent on the same server and additional cluster of consul.io servers for high availability
Anyways, once in a few days we getting exception on Non leader application:
Consul.ConsulRequestException: Unexpected response, status code BadRequest: Invalid wait time
Also,
I see in consul agent logs:
2019/03/12 04:54:08 [DEBUG] http: Request GET /v1/kv/service/scm/leader?index=934475&wait=1.999s (1.9967478s) from=127.0.0.1:56777
2019/03/12 04:54:08 [DEBUG] http: Request GET /v1/kv/service/scm/leader?index=934475&wait=ms (0s) from=127.0.0.1:56777
Usually we try to acquire lock every 2 seconds, so we see every to seconds GET log with wait=...s..
But when problem happens we see 2 GETs when second has wait=ms, which causes that exception.
I checked playfab code, public async Task Acquire(CancellationToken ct) method. but I didn't find good explanation why it happens? Some rare race condition?
Actually the best fix is in code:
class Lock
public async Task Acquire(CancellationToken ct)
Instead of if (elapsed > qOpts.WaitTime)
it should be if (elapsed >= qOpts.WaitTime)
actually it looks like the only place where qOpts.WaitTime is updated and could be 0.
Hi, We are using this package (version 0.7.2.6) for Leader election using distributed lock. It is based on blog : https://codeblab.herokuapp.com/high-availability-and-leader-election-with-consul-c-part-1/ It works pretty fine (thanks for great work).
We have multiple servers with our application (which using playfab nuget) + with consul.io agent on the same server and additional cluster of consul.io servers for high availability Anyways, once in a few days we getting exception on Non leader application: Consul.ConsulRequestException: Unexpected response, status code BadRequest: Invalid wait time
Also, I see in consul agent logs: 2019/03/12 04:54:08 [DEBUG] http: Request GET /v1/kv/service/scm/leader?index=934475&wait=1.999s (1.9967478s) from=127.0.0.1:56777 2019/03/12 04:54:08 [DEBUG] http: Request GET /v1/kv/service/scm/leader?index=934475&wait=ms (0s) from=127.0.0.1:56777
Usually we try to acquire lock every 2 seconds, so we see every to seconds GET log with wait=...s.. But when problem happens we see 2 GETs when second has wait=ms, which causes that exception. I checked playfab code, public async Task Acquire(CancellationToken ct) method. but I didn't find good explanation why it happens? Some rare race condition?
10x in advance