apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0
13.52k stars 3.71k forks

projection segment merge fixes #17460

Closed clintropolis closed 1 week ago

clintropolis commented 2 weeks ago

changes:

Alternative to #17388. This takes a nicer approach: it persists temporary files containing the id mapping buffers of the parent mergers so they can be re-used when merging projections, similar to what auto columns were already doing with their dictionaries.
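To illustrate the idea of spilling an id mapping to a temporary file and reading it back later, here is a minimal sketch. It is not the code in this PR: `IdMappingSpill`, `persist`, `load`, and the `.idmap` suffix are hypothetical names, and it assumes the mapping is a plain `int[]` from old dictionary ids to merged ids.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.IntBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical helper, not a Druid class.
final class IdMappingSpill
{
  // Writes the old-id -> new-id mapping to a temp file under `dir` and returns its path.
  static Path persist(Path dir, String name, int[] oldToNewIds) throws IOException
  {
    Path file = Files.createTempFile(dir, name, ".idmap");
    ByteBuffer bytes = ByteBuffer.allocate(oldToNewIds.length * Integer.BYTES);
    // The int view shares storage with `bytes` but leaves its position at 0.
    bytes.asIntBuffer().put(oldToNewIds);
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.WRITE)) {
      while (bytes.hasRemaining()) {
        channel.write(bytes);
      }
    }
    return file;
  }

  // Memory-maps a previously persisted mapping so a later merge step can re-use it.
  static IntBuffer load(Path file) throws IOException
  {
    try (FileChannel channel = FileChannel.open(file, StandardOpenOption.READ)) {
      // The mapping stays valid after the channel is closed.
      return channel.map(FileChannel.MapMode.READ_ONLY, 0, channel.size()).asIntBuffer();
    }
  }
}
```

A projection merger could then call something like `IdMappingSpill.load(path)` on the file written by its parent merger instead of recomputing the mapping.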

I made a conservative but slightly odd change to the DimensionHandler interface: the signature change is done by adding a new makeMerger with a default implementation that calls the existing, now deprecated, makeMerger. This keeps the PR less disruptive in case we want to do a 31.1 patch release (I would like to do one to fix another issue with the 31 release, and will start a thread soon). DimensionHandler is not officially an extension point, but it is the mechanism by which custom dimensions can be defined, so I wanted to be careful with this change. After this PR I will open another one to delete the deprecated method and remove the default implementation.

The new parameter is the path where the segment is being written out, in case a merger or serializer needs to create temporary files. Prior to this, the task-defined storage location was not directly available to the mergers, so this lets them write to a more appropriate place.
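The compatibility pattern described above can be sketched roughly as follows. This is a simplified illustration, not the real DimensionHandler signature: `Handler`, `Merger`, `LegacyHandler`, and the parameter lists are hypothetical, and the actual makeMerger takes more arguments.

```java
import java.io.File;

// Hypothetical stand-in for a merger; not the Druid interface.
interface Merger
{
  String describe();
}

// Hypothetical stand-in for the handler interface.
interface Handler
{
  // Existing method, kept so current implementations continue to compile.
  @Deprecated
  Merger makeMerger(String name);

  // New signature with the segment write-out directory. By default it
  // ignores the directory and delegates to the deprecated method.
  default Merger makeMerger(String name, File segmentBaseDir)
  {
    return makeMerger(name);
  }
}

// An implementation written before the change: it only knows the old method.
final class LegacyHandler implements Handler
{
  @Override
  public Merger makeMerger(String name)
  {
    return () -> "legacy:" + name;
  }
}

// An updated implementation that overrides the new method to use the directory.
final class PathAwareHandler implements Handler
{
  @Override
  public Merger makeMerger(String name)
  {
    throw new UnsupportedOperationException("use makeMerger(name, segmentBaseDir)");
  }

  @Override
  public Merger makeMerger(String name, File segmentBaseDir)
  {
    return () -> name + "@" + segmentBaseDir.getName();
  }
}
```

Callers only ever use the two-argument method, so implementations that have not been updated keep working unchanged until the deprecated method is removed.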

While I was here, I modified the 'auto' column merger to use this new segment write-out path argument for its temp files instead of java.io.tmpdir.
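As a rough illustration of the difference, the sketch below creates a merger temp file inside a given segment directory rather than in the JVM-wide default. `MergerTempFiles` and the `merge-` prefix are hypothetical; only `File.createTempFile` is a real JDK call.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not a Druid class.
final class MergerTempFiles
{
  // Creates the temp file inside segmentBaseDir. Passing null as the directory
  // would instead fall back to the location named by the java.io.tmpdir property.
  static File create(File segmentBaseDir, String columnName) throws IOException
  {
    // The fixed "merge-" prefix keeps the prefix above the 3-character minimum.
    File tmp = File.createTempFile("merge-" + columnName, ".tmp", segmentBaseDir);
    tmp.deleteOnExit();
    return tmp;
  }
}
```

Keeping these files next to the segment being written means they live on the storage the task was actually given and can be cleaned up along with the rest of the task directory.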

Also fixes an issue with projections on a QueryableIndex so that the time-like column is no longer added as a dimension; it is now only added as __time, to ensure it doesn't get handled incorrectly.