cloudfoundry / cloud_controller_ng

Cloud Foundry Cloud Controller
Apache License 2.0
193 stars 359 forks source link

Ability to do optimistic locking for resource metadata #3970

Open schulzh opened 1 month ago

schulzh commented 1 month ago

Issue

When updating metadata on CF API resources, its not possible to do optimistic locking. This results in race conditions when updating metadata from multiple clients concurrently.

Context

We would like to use metadata for storing some information on CF API resources, such as routes. The information will be updated by multiple microservice instances which do not have the ability to do pessimistic locking by themselves. When reading and updating the metadata, we run into classic race conditions where one service overwrites the data of another. The principle is outlined nicely in this article: https://sookocheff.com/post/api/optimistic-locking-in-a-rest-api/

Our current goal is to implement usage tracking on resources using the metadata, such that we can ensure a resource is only deleted when all usages are removed. In this scenario its crucial to avoid race conditions to ensure that no usage is added once the decision to delete a resource is made.

It should be possible for a client to ensure that the metadata has not changed since it has fetched it from the API when submitting an update. Ideally, this would extend to more operations than just updating the metadata, such as deleting (e.g. only delete a resource when the metadata is in the expected state).

Possible Fix

One possible solution is to add a mechanism similar to etags to the metadata, that changes on every metadata update. A client can then supply an If-Match header containing that value and the CloudController would only update the metadata when the etag value matches, otherwise it would return an error. This would optimistically lock the whole metadata at once.

A more fine-grained but also more complex solution would be to pass an expected key-value pair in this If-Match header (possibly even multiple pairs in multiple headers) that need to match for the request to succeed. This way its possible to lock only a part of the metadata, allowing updates to unrelated metadata unaffected.

philippthun commented 1 month ago

I see that you have an issue with parallel updates of metadata. In case we would start supporting If-Match and If-None-Match headers, this would mean that we need to define an ETag for all resources. I would not implement some non-standard use of these headers as you outlined in your last paragraph.

The problem with providing an ETag for resources would be to define what to include in the ETag generation (i.e. what is a "version"). As metadata cannot be addressed separately, the ETag for e.g. a route must comprise the route, its mappings (destinations), labels and annotations.

Without having more details about your use case, it sounds to me that the problem could be worked around by having multiple labels/annotations and each microservice instance updates a dedicated key. This would avoid race conditions. And to decide if a resource is still needed, one would evaluate all the labels/annotations.

schulzh commented 1 month ago

I would not implement some non-standard use of these headers

The headers don't need to be the If-Match and If-None-Match headers; this could also be a custom mechanism with different headers. However, adhering to the standard mechanism is a good thing in my opinion, too.

As metadata cannot be addressed separately, the ETag for e.g. a route must comprise the route, its mappings (destinations), labels and annotations.

I think that is fine. In our usecase we won't be changing the route so it will not make a difference for us anyway. In general, this would also allow other usecases where the processing can also take the rest of the resource state into account, so I think this can also be a good thing.

Without having more details about your use case, it sounds to me that the problem could be worked around by having multiple labels/annotations and each microservice instance updates a dedicated key. This would avoid race conditions. And to decide if a resource is still needed, one would evaluate all the labels/annotations.

Its this evaluation that poses the problem. When one microservice reads the resource to evalute its metadata, another one can then write to the metadata, which would affect the evaluation. There is no method to atomically check and write/execute this evaluation, so one can never be sure that the evaluation is up to date and correct.