apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

Do not load lookups on tasks that will not use them #13324

Open zachjsh opened 1 year ago

zachjsh commented 1 year ago

Description

At the moment, all Druid task types attempt to load lookups, even when they are not needed. Proposed changes:

  1. Some task types, such as compaction or kill tasks, have absolutely no need for lookups and should not load them.
  2. Native batch ingest: only load lookups that are actually needed by transformSpec or metricsSpec.
  3. MSQ: only load lookups that are actually needed by the query.
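
The three cases above could be captured by a per-task loading plan. What follows is a minimal, language-neutral sketch in Python (Druid itself is Java), not Druid's actual API: `Mode`, `LookupLoadingPlan`, `plan_for`, and the `NO_LOOKUP_TASK_TYPES` set are hypothetical names introduced for illustration, and the regular expression is a deliberate simplification of scanning transformSpec or metricsSpec expressions for `lookup(...)` calls.

```python
import re
from dataclasses import dataclass
from enum import Enum


class Mode(Enum):
    ALL = "all"                      # current behavior: load every lookup
    NONE = "none"                    # item 1: tasks that never use lookups
    ONLY_REQUIRED = "only_required"  # items 2 and 3: load only named lookups


@dataclass(frozen=True)
class LookupLoadingPlan:
    """Hypothetical description of which lookups a task should load."""
    mode: Mode
    required: frozenset = frozenset()

    def select(self, available):
        """Return the subset of available lookup names this task should load."""
        if self.mode is Mode.ALL:
            return set(available)
        if self.mode is Mode.NONE:
            return set()
        return set(available) & self.required


# Assumption: task types that never need lookups (item 1).
NO_LOOKUP_TASK_TYPES = {"compact", "kill"}

# Simplified pattern for lookup(expr, 'name') calls. A real implementation
# would parse the expression instead of pattern-matching it.
LOOKUP_CALL = re.compile(r"lookup\s*\([^,()]+,\s*'([^']+)'\s*\)")


def plan_for(task_type, expressions=None):
    """Pick a loading plan from the task type and, if given, its expressions."""
    if task_type in NO_LOOKUP_TASK_TYPES:
        return LookupLoadingPlan(Mode.NONE)
    if expressions is not None:
        names = {name for expr in expressions for name in LOOKUP_CALL.findall(expr)}
        return LookupLoadingPlan(Mode.ONLY_REQUIRED, frozenset(names))
    # No information about what the task needs: keep today's behavior.
    return LookupLoadingPlan(Mode.ALL)
```

Falling back to `Mode.ALL` when nothing is known keeps the change backward compatible: only task types that explicitly opt out, or that can enumerate their lookups, load less than they do today.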

Motivation

Lookups can be large, and loading them on tasks that have no use for them wastes resources.

gianm commented 1 year ago

Seems like a good idea to me. On the MSQ front, note that worker tasks don't know what lookups they're going to need until they get a work order, so we'll need some way for them to be loaded on demand after a task starts up.
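
One possible shape for that on-demand loading, as a rough sketch only: the worker starts with no lookups and loads the ones named by a work order when it arrives. `OnDemandLookups` and the `fetch` callable are hypothetical and do not correspond to existing Druid classes. How a work order would expose the lookup names it needs is also an assumption.

```python
import threading


class OnDemandLookups:
    """Hypothetical registry that loads lookups lazily instead of at task startup."""

    def __init__(self, fetch):
        # fetch is an assumed callable: lookup name -> loaded lookup (here, a dict).
        self._fetch = fetch
        self._loaded = {}
        self._lock = threading.Lock()

    def ensure_loaded(self, names):
        """Load any of the named lookups that are not loaded yet."""
        with self._lock:
            for name in names:
                if name not in self._loaded:
                    self._loaded[name] = self._fetch(name)

    def get(self, name):
        """Return a loaded lookup; raises KeyError if it was never requested."""
        with self._lock:
            return self._loaded[name]

    def loaded_names(self):
        with self._lock:
            return set(self._loaded)
```

A worker would call `ensure_loaded` with the lookup names from each work order before processing it. Lookups requested by an earlier work order are not fetched again.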

IgorBerman commented 6 months ago

Any idea when this might be added to the roadmap? At the very least, lookup loading could be disabled for compaction tasks. I believe compaction tasks could be improved by using a higher value for maxRowsInMemory, but currently that memory might be occupied by lookups (which compaction doesn't need). I believe that if we scope this to compaction and kill tasks only (i.e. item 1), there won't be any need for design or discussion.