apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Improper result-level cache ETag handling for union datasources #8713

Open gianm opened 4 years ago

gianm commented 4 years ago

Result-level caching and union datasources do not play well together: it is possible for one of the underlying datasources to be improperly ignored.

Here is how result-level caching was designed to work. It is trying to piggyback off the system that implements the standard ETag / If-None-Match protocol. Note that there's a QueryRunner stack on the broker that, among other runners, goes ResultLevelCachingQueryRunner → UnionQueryRunner → CachingClusteredClient.

  1. ResultLevelCachingQueryRunner computes a cache key for the query using the toolChest's cache strategy. These cache keys do not generally include the datasource or interval. This sounds bad, but is actually fine, as we'll see.
  2. ResultLevelCachingQueryRunner fetches results from the cache for that key. Part of the cached value is an ETag based on the totality of segments involved in the query (it is a SHA1 of all of the segment identifiers). The idea is that the ETag includes all the datasource and interval information, so the cache key doesn't have to. It was computed by CachingClusteredClient the last time the query ran (see step 4).
  3. That ETag is set as If-None-Match in the query request context, and the query is passed down the QueryRunner stack.
  4. CachingClusteredClient, in its run method, computes the ETag for the query and saves it in the "ETag" field in the response context.
  5. CachingClusteredClient also checks if the ETag matches the If-None-Match query request context parameter. If so, it returns an empty sequence, and will not actually execute the query.
  6. ResultLevelCachingQueryRunner, after getting the sequence from downstream runners (but before evaluating it), inspects the query response context. If an ETag is set there, and it's the same as the ETag from the cache, then it knows the CachingClusteredClient noticed a match, and it returns the old cached results. If the ETag is set to something else, it instead returns the sequence from the downstream runners.
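
The six steps above can be sketched roughly as follows. This is a simplified stand-in, not Druid's actual code: the class and method names are hypothetical, plain lists and maps replace Sequence and ResponseContext, and the Base64 encoding of the SHA-1 digest is an assumption made only so the ETag is printable.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the handshake; none of these names exist in Druid.
class EtagHandshakeSketch {
  // Step 4: derive an ETag from every segment identifier involved in the query.
  static String computeEtag(List<String> segmentIds) {
    try {
      MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
      for (String id : segmentIds) {
        sha1.update(id.getBytes(StandardCharsets.UTF_8));
      }
      // Assumption: Base64 text form, chosen only for readability.
      return Base64.getEncoder().encodeToString(sha1.digest());
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }

  // Steps 4 and 5: record the ETag, and return nothing if it matches If-None-Match.
  static List<String> runDownstream(
      List<String> segmentIds, String ifNoneMatch, Map<String, Object> responseContext) {
    String etag = computeEtag(segmentIds);
    responseContext.put("ETag", etag);
    if (etag.equals(ifNoneMatch)) {
      return Collections.emptyList(); // the caller is expected to use its cached copy
    }
    return Collections.singletonList("fresh result over " + segmentIds.size() + " segments");
  }

  // Steps 2, 3 and 6: pass the cached ETag down, then inspect the response context.
  static List<String> runWithResultCache(
      List<String> segmentIds, String cachedEtag, List<String> cachedResults) {
    Map<String, Object> responseContext = new HashMap<>();
    List<String> downstream = runDownstream(segmentIds, cachedEtag, responseContext);
    Object newEtag = responseContext.get("ETag");
    if (newEtag != null && newEtag.equals(cachedEtag)) {
      return cachedResults; // match: serve the old cached results
    }
    return downstream; // no match: serve what the downstream runner produced
  }
}
```

Note that in this sketch the downstream call is eager, so the ETag is always present by the time step 6 inspects the response context. The design relies on that ordering.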

There is a problem here when union datasources come into play. UnionQueryRunner works by splitting up union datasources into N subqueries, then running them all and merging the results together. Crucially, it sits above CachingClusteredClient in the QueryRunner stack, and all subqueries share the same response context. Meaning that when CachingClusteredClient sets ETags in the response context, it's actually doing it for each subquery separately, and they're all clobbering each other.
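
A minimal illustration of the clobbering, assuming a plain HashMap in place of the shared response context and a made-up per-datasource ETag string (both hypothetical, for illustration only):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: each subquery writes "ETag" into the same context instance.
class SharedContextClobberSketch {
  // Stand-in for what a per-subquery run does to the response context.
  static void runSubquery(String dataSource, Map<String, Object> responseContext) {
    responseContext.put("ETag", "etag-for-" + dataSource);
  }

  // Stand-in for the union split: every subquery receives the same context.
  static Map<String, Object> runUnion(List<String> dataSources) {
    Map<String, Object> shared = new HashMap<>();
    for (String dataSource : dataSources) {
      runSubquery(dataSource, shared); // each call overwrites the previous ETag
    }
    return shared;
  }
}
```

After running a union over two datasources, the context holds a single ETag entry, and it belongs to whichever subquery wrote last.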

That sounds bad enough! But there is another, more subtle problem as well. The ETag setting only happens when CachingClusteredClient's run method gets called. But, when a union datasource is split up, it transforms an eager run call into a lazy run call. This means the ETag actually doesn't get set until you start iterating the sequence, which hasn't happened yet by the time step (6) happens above. So the ETag is always null at this point, even if it will get set later on (probably incorrectly).
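
The eager-to-lazy change can be reproduced in miniature. The sketch below uses java.util.stream.Stream as a stand-in for Druid's Sequence (an assumption; the two are not the same abstraction), with hypothetical method names. The point is only that the side effect on the context is deferred until something consumes the results.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Stream;

// Hypothetical model of a union split that defers each subquery's run call.
class LazyEtagSketch {
  // The ETag side effect happens when this method is invoked, not before.
  static Stream<String> runSubquery(String dataSource, Map<String, Object> responseContext) {
    responseContext.put("ETag", "etag-for-" + dataSource);
    return Stream.of("row from " + dataSource);
  }

  // flatMap is lazy: runSubquery is not called until a terminal operation pulls rows.
  static Stream<String> runUnion(List<String> dataSources, Map<String, Object> responseContext) {
    return dataSources.stream().flatMap(ds -> runSubquery(ds, responseContext));
  }
}
```

A caller that checks the context right after `runUnion` returns sees no ETag at all. It only appears once the stream is collected, which is too late for the check in step 6.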

The effect of all this on ResultLevelCachingQueryRunner is that when one of the subquery ETags matches the currently-cached result, that subquery's results will be ignored (because of step 5) but other subquery results will be fetched and merged as normal and the ResultLevelCachingQueryRunner will think it had a cache miss.

himanshug commented 4 years ago

Thanks for the detailed information.

I think UnionDataSource should handle this special case which would solve the problem for both ResultLevelCache and for users with their own result level cache outside of Druid that are using same protocol.

UnionDataSource should have different ResponseContext objects for different queries and merge them into a single [maybe nested] ResponseContext object in the end. The merge process creates the top-level ETag = serialize-to-string(array of ETag objects from each query response). Also, if UnionDataSource receives an If-None-Match key in the incoming ResponseObject, it should assume that it is a serialized string from such an array, break it into elements, and add one element to each per-datasource query ResponseObject it creates. If the If-None-Match value is not deserializable into a string array of size N (the number of datasources), then don't use it.
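
One possible shape for the serialize and deserialize halves of this idea, as a rough sketch. The class name, the comma delimiter, and the assumption that individual ETags never contain a comma are all hypothetical; a real implementation would likely pick a proper serialization format.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

// Hypothetical helper for a composite ETag built from per-subquery ETags.
class CompositeEtagSketch {
  // Assumption: individual ETags never contain ',' (true for Base64 text).
  static String combine(List<String> perSubqueryEtags) {
    return String.join(",", perSubqueryEtags);
  }

  // Returns the per-subquery ETags, or empty if the value cannot be split into
  // exactly expectedCount parts, in which case If-None-Match is simply not used.
  static Optional<List<String>> split(String ifNoneMatch, int expectedCount) {
    if (ifNoneMatch == null) {
      return Optional.empty();
    }
    String[] parts = ifNoneMatch.split(",", -1); // -1 keeps trailing empty parts
    if (parts.length != expectedCount) {
      return Optional.empty();
    }
    return Optional.of(Arrays.asList(parts));
  }
}
```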

gianm commented 4 years ago

> I think UnionDataSource should handle this special case which would solve the problem for both ResultLevelCache and for users with their own result level cache outside of Druid that are using same protocol.

I am worried about a situation where UnionQueryRunner needs to know about ETags. What if there's some other response context field now (or in the future) that has the same problem? It is too bug-prone for people to need to remember to update UnionQueryRunner every time they add a response context field.

Also, if there are things other than UnionQueryRunner that split up queries into subqueries, they would all need to know about ETags too, which seems weird.

What do you think about making ResponseContext have a concurrency-safe "compute" method similar to the one on java.util.Map? We could even get rid of the put method and only offer compute, to ensure we're always handling this case properly. Or keep put but make it throw an error if there is an existing value.
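
To make the suggestion concrete, here is a rough sketch of what such a context could look like, built on ConcurrentHashMap. The class and method names are hypothetical and this is not Druid's ResponseContext API; it only illustrates the two options (merge-only writes, or a put that rejects an existing value).

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.BinaryOperator;

// Hypothetical context whose writes cannot silently clobber each other.
class MergingResponseContextSketch {
  private final ConcurrentMap<String, Object> values = new ConcurrentHashMap<>();

  // Option 1: every write states how to combine with an existing value.
  // ConcurrentHashMap.merge applies the combiner atomically per key.
  void merge(String key, Object value, BinaryOperator<Object> combiner) {
    values.merge(key, value, combiner);
  }

  // Option 2: keep a put-style method, but fail if a value is already present.
  void putOnce(String key, Object value) {
    Object previous = values.putIfAbsent(key, value);
    if (previous != null) {
      throw new IllegalStateException("Value already set for key: " + key);
    }
  }

  Object get(String key) {
    return values.get(key);
  }
}
```

With option 1, two subqueries writing ETags "a" and "b" under a concatenating combiner would leave a combined value in the context instead of only the last one. With option 2, the second write would fail loudly, which at least surfaces the bug.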

himanshug commented 4 years ago

sorry, been away for a while.

sounds good, if it can be handled in a more generic way at the level of ResponseContext.