ava-innersource / Liquid-Application-Framework-1.0-deprecated

Liquid is a framework to speed up the development of microservices
MIT License
25 stars 14 forks source link

AzureBlob.HealthCheck is broken #165

Closed bruno-brant closed 4 years ago

bruno-brant commented 4 years ago

The goal of AzureBlob's method HealthCheck() is to obtain a status code indicating whether the dependant blob service is working correctly.

The current approach is to get a lease on the whole container for one second and then break the lease. I think the mindset was, if those methods don't throw, then it's working:

https://github.com/Avanade/Liquid-Application-Framework/blob/d3c19c13217053a7fa2ac0072fe9330b626e3a76/src/Liquid.OnAzure/Storages/AzureBlob.cs#L122-L136

Very unfortunately, everything is wrong with this code.

The approach (lease)

The approach is to acquire a lease and then break the lease. A lease is a lock: once acquired, no other connection can write to the container, meaning, other people using the container may fail because of the lease. One second is a large time in a concurrent scenarios - how many times per second do we want people uploading files on the peak?

But let alone other people using the container, there's the big issue of concurrent health checks. On highly available architectures such as the ones where Liquid is supposed to follow, the code should have more than one instance. And every single instance of the code is looking at the same container, attempting the same lease.

This means that a instance of the application passes, but other fails.

Lease of 1 second aren't possible

Accordingly to MSFT docs:

The Lease Blob operation creates and manages a lock on a blob for write and delete operations. The lock duration can be 15 to 60 seconds, or can be infinite. In versions prior to 2012-02-12, the lock duration is 60 seconds.

So acquire lease is always failing:

image

Methods aren't awaited

Both AcquireLeaseAsync and BreakLeaseAsync are async methods, so they need to be awaited, otherwise they won't throw on call and therefore, even if the underlying service is out, no exception would be thrown.

This also means that the code is calling break right after acquired was called. Depending on the implementation, this would even result on acquire running after break.

Also, 'IWorkBenchHealthCheck.HealthCheck' isn't async, so we will need to block the calling thread by calling ´.Wait()´.

Exceptions aren't being logged

The catch method isn't logging the exception. This means that, if the health checker decides to kill the application, we get no info on what happened, and so won't be able to fix the mistake.

bruno-brant commented 4 years ago

Of course, there are plenty of better ways to perform what was intended above, like calling ExistsAsync. The fact that IWorkBenchHealthCheck.HealthCheck isn't async would still mean that the method would be blocking.