PlayFab / consuldotnet

.NET API for Consul (http://www.consul.io/)
Apache License 2.0
692 stars 193 forks source link

Bad request : Invalid wait time and multiple Get request while trying to acquire leader #140

Open yevgenyfst opened 5 years ago

yevgenyfst commented 5 years ago

Hi, We are using this package (version 0.7.2.6) for Leader election using distributed lock. It is based on blog : https://codeblab.herokuapp.com/high-availability-and-leader-election-with-consul-c-part-1/ It works pretty fine (thanks for great work).

We have multiple servers with our application (which using playfab nuget) + with consul.io agent on the same server and additional cluster of consul.io servers for high availability Anyways, once in a few days we getting exception on Non leader application: Consul.ConsulRequestException: Unexpected response, status code BadRequest: Invalid wait time

Also, I see in consul agent logs: 2019/03/12 04:54:08 [DEBUG] http: Request GET /v1/kv/service/scm/leader?index=934475&wait=1.999s (1.9967478s) from=127.0.0.1:56777 2019/03/12 04:54:08 [DEBUG] http: Request GET /v1/kv/service/scm/leader?index=934475&wait=ms (0s) from=127.0.0.1:56777

Usually we try to acquire lock every 2 seconds, so we see every to seconds GET log with wait=...s.. But when problem happens we see 2 GETs when second has wait=ms, which causes that exception. I checked playfab code, public async Task Acquire(CancellationToken ct) method. but I didn't find good explanation why it happens? Some rare race condition?

10x in advance

highlyunavailable commented 5 years ago

It looks like the timeout there is getting formatted as ms, probably a bug in the code that converts a timespan to a go duration string. Check out https://github.com/PlayFab/consuldotnet/blob/master/Consul/Utilities/Extensions.cs if you want to take a crack at fixing it, but otherwise I'll look at it when I can.

yevgenyfst commented 5 years ago

Actually the best fix is in code: class Lock public async Task Acquire(CancellationToken ct) Instead of if (elapsed > qOpts.WaitTime) it should be if (elapsed >= qOpts.WaitTime) actually it looks like the only place where qOpts.WaitTime is updated and could be 0.

So i would like to do pull request with that fix