Graylog2 / graylog2-server

Free and open log management
https://www.graylog.org

Ability to have lookup table cache NOT CACHE null/empty values. #15200

drewmiranda-gl closed this issue 10 months ago

drewmiranda-gl commented 1 year ago

What?

This is a follow-up to https://github.com/Graylog2/graylog2-server/issues/13579 .

I am asking for the ability to have a Graylog lookup table NOT CACHE empty/null values. Currently, if a cache is used for a lookup table, the result is always cached, regardless of whether it is valid or has any contents. This creates a problem: unexplained, transient empty responses from the data adapter get stuck in the cache, and the lookup table keeps serving an empty result when it could return a working one if it ignored the cached empty value.

Why?

Using a cache with a lookup table has a lot of practical performance and stability benefits. However, there are scenarios where, without any clear explanation, Graylog caches an empty value for a given lookup even though a valid entry is available from the data adapter. Once this empty value is cached, it continues to be returned until the cache's "expire after" threshold is reached.

When using "Expire after access", the empty value can potentially live on for a long time. For example, if the lookup key is accessed once per minute and the "Expire after access" value is 5 minutes (which is the default, I believe), that empty value will continue being served until no log messages reference that lookup key for 5 minutes.
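To make the timing concrete, here is a minimal Python sketch of the scenario described above. It is a toy model, not Graylog's cache implementation: `ExpireAfterAccessCache`, `flaky_adapter`, and the key `host-a` are hypothetical names, and the clock is passed in explicitly so the example runs instantly. The adapter returns an empty result once and a valid value on every later call.

```python
class ExpireAfterAccessCache:
    """Toy cache: an entry expires only after `ttl` seconds without access."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # key -> (value, last_access_time)

    def get(self, key, loader, now):
        entry = self.entries.get(key)
        if entry is not None and now - entry[1] < self.ttl:
            value = entry[0]       # hit, even if the cached value is None
        else:
            value = loader(key)    # miss or expired: ask the data adapter
        self.entries[key] = (value, now)  # every access resets the timer
        return value


calls = []

def flaky_adapter(key):
    """Hypothetical adapter: empty on the first call, valid afterwards."""
    calls.append(key)
    return None if len(calls) == 1 else "valid-value"


cache = ExpireAfterAccessCache(ttl_seconds=300)  # 5 minutes

# One lookup per minute for an hour (t = 0, 60, ..., 3540 seconds).
results = [cache.get("host-a", flaky_adapter, now=t) for t in range(0, 3600, 60)]
adapter_calls_during_hour = len(calls)

# Only after 5 minutes with no access is the adapter asked again.
after_quiet = cache.get("host-a", flaky_adapter, now=3540 + 300)
```

In this model every one of the 60 lookups returns the cached empty value and the adapter is consulted only once, because each access resets the expiry timer. The valid value appears only after the key goes unreferenced for the full 5 minutes.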

Currently the only workarounds for this are:

Your Environment

Please let me know if there are any questions and if there is any testing I can do or help with.

patrickmann commented 1 year ago

@drewmiranda-gl Trying to understand if you are asking for a configuration setting, or an improvement in the behavior of caches. Is there a case where we legitimately return empty (whatever that means)? Null should never be a cached value, as this response indicates that there is no value for a given key.

This behavior might also be the result of a race condition in the async CaffeineCache implementation. Are you able to reproduce the issue with any consistency?

drewmiranda-gl commented 1 year ago

Good question. I'm not sure what is considered expected behavior here. If it were a configuration setting, my thinking is that it could be per cache. I read through that linked issue and it doesn't exactly match what I'm observing. The empty/null value is cached basically forever (until the expire after threshold is satisfied).

I believe this can be reliably reproduced, although I have not attempted it recently. The steps to reproduce are documented in #13579 .

Here is a quick video demonstration. Let me know if you have any questions! http://videos.graylog.com/watch/CaieRcVWuzTWL8oRgvM7kX

patrickmann commented 10 months ago

https://github.com/Graylog2/graylog2-server/issues/16199 introduced a new setting to block caching of null (off by default). Users should be able to set this from the UI when defining an in-memory cache.
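For readers who want to see what such a setting changes, here is a rough Python sketch of the idea. It assumes nothing about Graylog's actual code: `LookupCache`, `ignore_null`, and `make_flaky_adapter` are hypothetical names chosen for illustration, and only `None` is treated as "null" (empty strings are not covered).

```python
class LookupCache:
    """Toy in-memory cache with an opt-in switch to skip caching None."""

    def __init__(self, ignore_null=False):  # off by default, as in the setting
        self.ignore_null = ignore_null
        self.store = {}

    def get(self, key, loader):
        if key in self.store:
            return self.store[key]
        value = loader(key)
        if self.ignore_null and value is None:
            return value  # report the miss, but do not remember it
        self.store[key] = value
        return value


def make_flaky_adapter():
    """Hypothetical adapter: None on the first call, valid afterwards."""
    state = {"count": 0}

    def adapter(key):
        state["count"] += 1
        return None if state["count"] == 1 else "valid-value"

    return adapter


# Default behavior: the first empty answer is cached and keeps being served.
default_cache = LookupCache()
adapter_a = make_flaky_adapter()
default_results = [default_cache.get("k", adapter_a) for _ in range(3)]

# With the switch on: the empty answer is not stored, so the next lookup
# goes back to the adapter and picks up the valid value.
blocking_cache = LookupCache(ignore_null=True)
adapter_b = make_flaky_adapter()
blocking_results = [blocking_cache.get("k", adapter_b) for _ in range(3)]
```

With the switch off, all three lookups return `None`. With it on, only the first does. The trade-off is that keys with no value at all are looked up in the data adapter on every request.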