StarRocks / starrocks

The world's fastest open query engine for sub-second analytics both on and off the data lakehouse. With the flexibility to support nearly any scenario, StarRocks provides best-in-class performance for multi-dimensional analytics, real-time analytics, and ad-hoc queries. A Linux Foundation project.
https://starrocks.io
Apache License 2.0

Starrocks caching strategies seem flawed for regular duplicate queries #52764

Open fuzing opened 1 week ago

fuzing commented 1 week ago

Enhancement

We're using Iceberg on S3 as a backing store and evaluating StarRocks (SR) for compute: ingestion into Iceberg happens outside of SR, and an SR external Iceberg catalog is used for read-only compute/query.

It appears that SR caches at multiple levels: at the Iceberg level, with metadata caching, and then again with query caching. This is fine, except for one thing.

It also appears that SR has opted for LRU caching at one or both of these levels, with no absolute time-based eviction mechanism: if you keep hitting SR with the same query, the same data is returned, the eviction timer is reset, and the query result is NEVER evicted.

Our use case is click-stream data, and our users are looking for near-real-time insights into their website traffic, meaning they'll keep refreshing their view of the data at short intervals - think page-refreshing in Google Analytics.

To simulate this, if we query the catalog at relatively short intervals (say every 15-30 seconds), with something simple like:

SELECT * FROM <table-name> WHERE <some-conditions>;

then the cached version of the result is NEVER evicted. Pure LRU caching is generally not a great idea for DB applications unless there's also some way to evict entries after some (configurable) absolute time from when the cached slot is initially populated.
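The behaviour described above can be reproduced with a toy model. The sketch below is not StarRocks code: `IdleExpiryCache`, the 60-second idle timeout, and the one-row-per-second workload are all assumptions made up for illustration. It only shows that a cache whose timer resets on every read never reloads when it is polled faster than its timeout.

```python
class IdleExpiryCache:
    """Toy cache (hypothetical): entries expire only after `idle_ttl` seconds without a read."""

    def __init__(self, idle_ttl):
        self.idle_ttl = idle_ttl
        self._entries = {}  # key -> (value, last_access_time)

    def get(self, key, loader, now):
        entry = self._entries.get(key)
        if entry is not None and now - entry[1] < self.idle_ttl:
            value = entry[0]   # hit: serve the cached value
        else:
            value = loader()   # miss or idle-expired: reload from the source
        # Every read, hit or miss, resets the idle timer.
        self._entries[key] = (value, now)
        return value


def simulate(poll_interval, idle_ttl=60, duration=600):
    """Poll the same query every `poll_interval` seconds and record what is returned."""
    source = {"rows": 0}
    cache = IdleExpiryCache(idle_ttl)
    seen = []
    for now in range(0, duration + 1, poll_interval):
        source["rows"] = now  # made-up workload: the source gains one row per second
        seen.append(cache.get("q", lambda: source["rows"], now))
    return seen


fast = simulate(poll_interval=20)  # polls faster than the idle timeout
slow = simulate(poll_interval=90)  # polls slower than the idle timeout

print(set(fast))  # only the value loaded by the very first poll
print(slow)       # a fresh value on every poll
```

In this model, the 20-second poller sees the first result for the whole ten minutes, while the 90-second poller always sees current data.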

Unfortunately, unless there's a mechanism to override this (without simply turning caching off, which would kill performance), this makes SR a non-starter for our use case and for many similar ones.

It would be great to see an enhancement to SR's caching that adds absolute-time-based eviction "from FIRST query".

eshishki commented 1 week ago

can be fixed by https://github.com/StarRocks/starrocks/pull/47948

fuzing commented 1 week ago

@eshishki - I'm not sure that this is correct, as that PR seems to only address iceberg level caching (not the SR query cache level which appears to use the same caching technique).

On a separate note - your PR was submitted 4 months ago - is there an ETA for when it will be merged? Fixing this would seem to me to be high priority, as there are numerous use cases that SR/CelerData is targeting that simply won't work unless this is rectified.

eshishki commented 1 week ago

If your problem is fixed by REFRESH EXTERNAL TABLE, then you are hitting the same problem as me: SR caches Iceberg metadata, no longer refreshes it, and you don't see new commits to the data.

fuzing commented 1 week ago

@eshishki - The query cache seems to work the same way, and I'm presuming it is another independent cache level that sits on top of Iceberg metadata caching, meaning that addressing the Iceberg metadata cache alone won't fix the problem (i.e. the query cache will need the same treatment). Also, looking at the Iceberg codebase (the Java code at least), the same problem exists there as well. Their underlying metadata caching mechanism is identical in nature: LRU with eviction timed from the last query, and no absolute-time-based eviction. On the Iceberg side I'm talking about io.manifest.cache-enabled and related parameters, so that might need to be addressed as well.

In terms of StarRocks, that's why I opened this enhancement request in addition to your original PR, which I was already aware of.

fuzing commented 1 week ago

@eshishki - To clarify my comment above regarding the codebase over at Iceberg - I'm talking about the REST catalog implementation (which I'm using both for ingest and for the SR Iceberg catalog). It implements caching (defaults to a 30000 ms timeout), but with the same logic as used here within SR (i.e. LRU with an additional timeout from the last query). When using SR for compute/query, the solution is to configure Iceberg REST metadata/manifest caching to "off", but I'm thinking they should also fix the underlying problem over there. Hope this makes sense.

eshishki commented 1 week ago

It doesn't matter that you use the REST catalog and I use the Glue catalog: the Iceberg caching mechanism gets the latest snapshot from the catalog, caches it, and later has trouble refreshing it.

Do you see fresh data after REFRESH EXTERNAL TABLE?

fuzing commented 1 week ago

@eshishki - I think you may be missing the point. There are multiple layers of caching on the SR side, including external catalog metadata caching, plus query caching. The results of querying an external catalog are ultimately passed up to the query cache, so that identical queries can be serviced/read from this cache. This is unrelated to caching over at the catalog itself (i.e. outside SR), but that outside caching will play into the equation (we have metadata/manifest caching on the iceberg REST side turned "off").

To be clear, for the purposes of this discussion we're only talking about what's happening on the SR side of the equation, and my understanding is that there are at least 2 levels of cache: the metadata cache AND the query cache.

If both the external catalog metadata cache AND the query cache are using the same caching logic then fixing one without the other won't necessarily help the situation.
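A toy two-layer model illustrates the concern. This is a sketch under stated assumptions, not a description of SR internals: the `Cache` class, the `poll` helper, the 60-second idle timeout, and the 120-second `max_age` are all hypothetical. The query cache sits in front of the metadata cache, so the metadata layer is only consulted when the query layer misses.

```python
class Cache:
    """Toy cache (hypothetical): idle timeout, plus an optional absolute max_age."""

    def __init__(self, idle_ttl, max_age=None):
        self.idle_ttl = idle_ttl
        self.max_age = max_age
        self._entries = {}  # key -> [value, loaded_at, last_access]

    def get(self, key, loader, now):
        entry = self._entries.get(key)
        if entry is not None:
            value, loaded_at, last_access = entry
            fresh = now - last_access < self.idle_ttl
            if self.max_age is not None:
                fresh = fresh and now - loaded_at < self.max_age
            if fresh:
                entry[2] = now  # a hit resets only the idle timer
                return value
        value = loader()
        self._entries[key] = [value, now, now]
        return value


def poll(metadata_cache, query_cache, interval=20, duration=600):
    """Issue the same query every `interval` seconds through both layers."""
    snapshot = {"id": 0}
    results = []
    for now in range(0, duration + 1, interval):
        snapshot["id"] = now  # made-up: the external table gets a new snapshot every second

        def read_metadata():
            # Reached only when the query layer misses.
            return metadata_cache.get("table", lambda: snapshot["id"], now)

        results.append(query_cache.get("SELECT ...", read_metadata, now))
    return results


only_metadata_fixed = poll(Cache(60, max_age=120), Cache(60))
both_fixed = poll(Cache(60, max_age=120), Cache(60, max_age=120))

print(set(only_metadata_fixed))  # the query layer keeps serving its first result
print(sorted(set(both_fixed)))   # results refresh every max_age seconds
```

In this model, adding an absolute limit to the metadata layer alone changes nothing for the poller, because the query layer never misses and so never asks the metadata layer again.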

The behavior also likely differs depending on how new data enters the external table. I suspect (and would hope) that SR will selectively invalidate cache slots (or update them) when the external table is being updated by/through SR itself. Conversely, if the external catalog/table is being updated outside of SR, then the updates are only picked up by SR when the query AND metadata cache entries are evicted on the SR side.

To answer your question (do I see fresh data after refreshing?): yes and no. This issue was opened relating to "caching in general", not specifically to "refresh external table". If one reads data repeatedly at intervals shorter than SR's metadata cache and query cache timeouts (the longer of the two), then the data is NEVER updated. If one does not query the external catalog for an interval greater than the cache timeout from the last query, then new data is surfaced.

That said, my issue is not necessarily related to refresh external table, despite you continually insisting it is. I'm also not convinced that "REFRESH EXTERNAL TABLE" is the correct solution here anyway, as it requires "ALTER TABLE" privileges. Requiring that level of privilege to ensure that fresh data is surfaced in a predictable way seems like the wrong solution.

BTW, per the official SR docs, "refresh external table" relates to Hudi and Hive catalogs/tables. There is no indication that this also applies to REST/Iceberg (although it may). Your PR will not fix my issue, despite you repeatedly insisting that it will.

In our use case, the iceberg data is being updated outside of SR. My contention is that one or more of the caches within SR is using LRU caching plus an eviction policy that relies on data expiring "after the LAST identical query". The caches should also implement a configurable timeout that evicts entries based on "absolute time from the FIRST identical query", otherwise data can remain perpetually stale.
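The requested behaviour can be sketched as a cache that tracks two timestamps per entry. This is a minimal illustration, not a proposed SR implementation: `DualExpiryCache`, `idle_ttl`, `max_age`, and the one-row-per-second workload are hypothetical names and values chosen for the example.

```python
class DualExpiryCache:
    """Toy cache (hypothetical) with two independent expiry rules:

    idle_ttl: evict if the entry has not been read for this many seconds.
    max_age:  evict this many seconds after the entry was first populated,
              no matter how often it has been read since.
    """

    def __init__(self, idle_ttl, max_age):
        self.idle_ttl = idle_ttl
        self.max_age = max_age
        self._entries = {}  # key -> [value, loaded_at, last_access]

    def get(self, key, loader, now):
        entry = self._entries.get(key)
        if entry is not None:
            value, loaded_at, last_access = entry
            idle_ok = now - last_access < self.idle_ttl
            age_ok = now - loaded_at < self.max_age
            if idle_ok and age_ok:
                entry[2] = now  # a hit resets the idle timer but not loaded_at
                return value
        value = loader()  # miss, idle-expired, or too old: reload
        self._entries[key] = [value, now, now]
        return value


source = {"rows": 0}
cache = DualExpiryCache(idle_ttl=60, max_age=120)
seen = []
for now in range(0, 601, 20):  # identical query every 20 seconds
    source["rows"] = now       # made-up workload: one new row per second
    seen.append((now, cache.get("q", lambda: source["rows"], now)))

refreshes = sorted({value for _, value in seen})
worst_staleness = max(now - value for now, value in seen)
print(refreshes)
print(worst_staleness)
```

With both rules in place, staleness in this model is bounded by `max_age` even though the poller never lets the idle timer run out.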

My point about iceberg caching (i.e. outside of SR) is that they are using the same strategy (for REST at least) - i.e. an LRU cache that does not refresh unless identical queries are timed out (they also rely on expunging data based on time since last identical query, without taking time from first query into account). This will impact SR's view of the world, and may be a source of conflict, but again, that's a separate issue over at iceberg. For our application we have iceberg's own metadata/manifest caching turned "off" to prevent this. I haven't looked into glue catalog or any of the others to see their behavior, so I cannot comment on those.

Finally, without testing this (i.e. I'm speculating), the situation may be far worse than described above, because the metadata cache on the SR side is likely never refreshed as long as ANY query, not just an identical one, keeps resetting its eviction timer - after all, SR's metadata cache only cares about metadata, which all queries rely on. I also haven't complicated this discussion by talking about cache eviction based on cache memory exhaustion, but clearly this will create yet another layer of non-determinism.

fuzing commented 1 week ago

I'm also wondering if someone like @ss892714028 might weigh in on this?