gchq / stroom

Stroom is a highly scalable data storage, processing and analysis platform.
https://gchq.github.io/stroom-docs/
Apache License 2.0

Optimise ref data lookups for entry type (all keys, all ranges, mixed) #2488

Open at055612 opened 2 years ago

at055612 commented 2 years ago

If the ref data load recorded whether the data was made up entirely of key/value entries, entirely of range/value entries, or a mix, and stored this in the processing info table, we could then optimise the lookups. Assuming that we have to query the proc info table anyway, this would remove the pointless hit on the kv store for range lookups and the pointless lookup on the range store for kv entries that are not found.
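As a rough illustration of the idea, here is a minimal sketch of a lookup that dispatches on a recorded entry type. Everything in it is hypothetical and not taken from the Stroom code base: the `EntryType` enum, the `EntryTypeAwareLookup` class, the in-memory maps standing in for the kv and range stores, and the assumption that ranges are keyed by numeric values with an inclusive start and exclusive end. The read counters exist only to show which store gets touched.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableMap;
import java.util.Optional;
import java.util.TreeMap;

// Hypothetical: what kind of entries a loaded ref data set contains.
enum EntryType { ALL_KEYS, ALL_RANGES, MIXED }

// Hypothetical stand-in for the kv store and range store, held in memory.
final class EntryTypeAwareLookup {
    private final EntryType entryType;
    private final Map<String, String> kvStore = new HashMap<>();
    // range start (inclusive) -> (range end (exclusive), value)
    private final NavigableMap<Long, Map.Entry<Long, String>> rangeStore = new TreeMap<>();

    // Counters showing how often each store is actually read.
    int kvStoreReads;
    int rangeStoreReads;

    EntryTypeAwareLookup(EntryType entryType) {
        this.entryType = entryType;
    }

    void putKey(String key, String value) {
        kvStore.put(key, value);
    }

    void putRange(long startInclusive, long endExclusive, String value) {
        rangeStore.put(startInclusive, Map.entry(endExclusive, value));
    }

    Optional<String> lookup(String key) {
        switch (entryType) {
            case ALL_KEYS:
                // No ranges were loaded, so a kv miss is final.
                return lookupKey(key);
            case ALL_RANGES:
                // No keys were loaded, so skip the kv store entirely.
                return lookupRange(key);
            default:
                // MIXED: try the kv store first, then fall back to ranges.
                Optional<String> value = lookupKey(key);
                return value.isPresent() ? value : lookupRange(key);
        }
    }

    private Optional<String> lookupKey(String key) {
        kvStoreReads++;
        return Optional.ofNullable(kvStore.get(key));
    }

    private Optional<String> lookupRange(String key) {
        long point;
        try {
            point = Long.parseLong(key);
        } catch (NumberFormatException e) {
            return Optional.empty(); // non-numeric keys cannot fall in a range
        }
        rangeStoreReads++;
        Map.Entry<Long, Map.Entry<Long, String>> floor = rangeStore.floorEntry(point);
        if (floor != null && point < floor.getValue().getKey()) {
            return Optional.of(floor.getValue().getValue());
        }
        return Optional.empty();
    }
}
```

With `ALL_RANGES` the kv store is never read, and with `ALL_KEYS` a miss never falls through to the range store; only `MIXED` pays for both.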

Ideally we need some kind of stateful (for the life of the pipeline process) version of the RefDataStore that can establish and hold info about the loaded data on heap for faster checks on each lookup.
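A minimal sketch of what holding this info on heap could look like, assuming each loaded set of ref data can be identified by a numeric id and that the proc info table can be queried for its entry type. The names `LoadedEntryType`, `PipelineScopedRefDataInfo` and the `procInfoQuery` function are all hypothetical. The class is deliberately not thread-safe, on the assumption that one instance lives with a single pipeline thread.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongFunction;

// Hypothetical: the entry type recorded for a loaded ref data set.
enum LoadedEntryType { ALL_KEYS, ALL_RANGES, MIXED }

// Hypothetical pipeline-scoped holder: one instance per pipeline process,
// used by a single thread, discarded when the pipeline finishes.
final class PipelineScopedRefDataInfo {
    // Stand-in for the (assumed) query against the processing info table.
    private final LongFunction<LoadedEntryType> procInfoQuery;
    // On-heap cache so repeated lookups do not re-query the table.
    private final Map<Long, LoadedEntryType> cache = new HashMap<>();

    PipelineScopedRefDataInfo(LongFunction<LoadedEntryType> procInfoQuery) {
        this.procInfoQuery = procInfoQuery;
    }

    LoadedEntryType entryTypeOf(long refStreamId) {
        // Query the proc info table only on the first lookup for this id.
        return cache.computeIfAbsent(refStreamId, id -> procInfoQuery.apply(id));
    }
}
```

The first lookup for a given id pays for the proc info query; every later lookup in the same pipeline process is a plain map read.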

at055612 commented 2 years ago

If we create a stateful (pipeline-scoped) RefDataStore that is held by one thread, then we could hold a read txn object on it and just reset/renew it for each lookup rather than ending txns all the time.
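A sketch of the reset/renew pattern, assuming the underlying store offers read transactions that can be reset and renewed (LMDB's C API does, via `mdb_txn_reset` and `mdb_txn_renew`). The `ReadTxn` interface and the `ReusableReadTxn` wrapper are hypothetical stand-ins, not real Stroom or library types, and the wrapper assumes it is handed a freshly begun transaction and is only ever used by one thread.

```java
import java.util.function.Function;

// Hypothetical minimal view of a resettable read transaction.
interface ReadTxn {
    void reset();  // release the read snapshot but keep the handle
    void renew();  // acquire a fresh snapshot on the same handle
    void close();  // end the transaction for good
}

// Hypothetical single-threaded wrapper that reuses one read txn handle.
final class ReusableReadTxn implements AutoCloseable {
    private final ReadTxn txn;
    // Assumption: the txn passed in has just been begun, so it is active.
    private boolean active = true;

    ReusableReadTxn(ReadTxn freshlyBegunTxn) {
        this.txn = freshlyBegunTxn;
    }

    <T> T read(Function<ReadTxn, T> work) {
        if (!active) {
            txn.renew();
            active = true;
        }
        try {
            return work.apply(txn);
        } finally {
            // Reset rather than close, so the next read can renew the handle.
            txn.reset();
            active = false;
        }
    }

    @Override
    public void close() {
        txn.close();
    }
}
```

Resetting after each read keeps the handle but lets go of the snapshot, so a long-lived pipeline would not pin an old view of the data between lookups; whether that trade-off is right for the ref data store is an open design question.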