elastic / ml-cpp

Machine learning C++ code

[ML] Call readvalue even if cache write fails #2576

Closed davidkyle closed 11 months ago

davidkyle commented 11 months ago

Cache writes can time out while acquiring the write lock. In this situation the readValue function should still be called, even though the computed value has not been added to the cache.

The failure to call readValue under a contended lock causes the pytorch_inference process to lose the request and never respond to it. This in turn leaves Elasticsearch waiting for a response, causing processing to hang.

CCompressedLfuCache::lookup returns true if the value was read from the cache. That is not of interest in the way pytorch_inference uses it, so there is no need to check the return value. The general contract for use by pytorch_inference is that if computeValue returns a null or nullopt then the caller must handle the response to the request; otherwise responding can be left to the readValue function.
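
The contract described above can be illustrated with a minimal sketch. This is not the real CCompressedLfuCache code: CSketchCache, simulateWriteTimeout, tryWrite and respondsUnderContention are hypothetical names made up for illustration, the values are plain strings, and the write-lock timeout is simulated with a flag rather than a real lock. It only shows the intended control flow, where readValue is called whether or not the cache write succeeded.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <optional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the cache; NOT the real CCompressedLfuCache API.
class CSketchCache {
public:
    using TOptionalStr = std::optional<std::string>;
    using TComputeFunc = std::function<TOptionalStr(const std::string&)>;
    // Second argument tells the reader whether the value came from the cache.
    using TReadFunc = std::function<void(const std::string&, bool)>;

    // Assumption: a flag simulates "timed out acquiring the write lock".
    void simulateWriteTimeout(bool timesOut) { m_WriteTimesOut = timesOut; }

    // Returns true only if the value was read from the cache.
    bool lookup(const std::string& key,
                const TComputeFunc& computeValue,
                const TReadFunc& readValue) {
        assert(computeValue && readValue);

        auto hit = m_Store.find(key);
        if (hit != m_Store.end()) {
            readValue(hit->second, true);
            return true;
        }

        TOptionalStr value{computeValue(key)};
        if (!value) {
            // computeValue produced nothing: the caller must respond itself.
            return false;
        }

        // The write may fail under contention. Its result deliberately does
        // not gate the call to readValue, so the request is never dropped.
        this->tryWrite(key, *value);
        readValue(*value, false);
        return false;
    }

    std::size_t size() const { return m_Store.size(); }

private:
    bool tryWrite(const std::string& key, const std::string& value) {
        if (m_WriteTimesOut) {
            return false;
        }
        m_Store.emplace(key, value);
        return true;
    }

    std::unordered_map<std::string, std::string> m_Store;
    bool m_WriteTimesOut{false};
};

// Runs one request while every cache write "times out" and reports
// whether readValue (the response path) was still invoked.
bool respondsUnderContention() {
    CSketchCache cache;
    cache.simulateWriteTimeout(true);
    bool responded{false};
    cache.lookup(
        "request-1",
        [](const std::string& request) {
            return CSketchCache::TOptionalStr{"result for " + request};
        },
        [&responded](const std::string&, bool) { responded = true; });
    return responded;
}
```

In this sketch the caller ignores the return value of lookup, matching the description above: a false return covers both "computed but not cached" and "nothing computed", and only the second case requires the caller to send the response itself.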