apache / druid

Apache Druid: a high performance real-time analytics database.
https://druid.apache.org/
Apache License 2.0

[Proposal][QTL] unify lookup dimension spec #3082

Closed b-slim closed 5 years ago

b-slim commented 8 years ago

Today we have two ways to create a Druid dimension based on a lookup. The first (the old way) is as an extraction function, ExtractionFn, and the second is via the lookup dimension spec, LookupDimensionSpec. Chronologically, LookupDimensionSpec was introduced after ExtractionDimensionSpec. The reason is that lookups can offer more functionality, such as unApply, unApplyAll, bulk lookups, or the use case where one dimension value maps to multiple values. IMHO this means it is better, from a design perspective, for lookups to be independent of extraction functions. Now that we have ended up with two ways of doing things, I guess it is time to unify them. Benefits.

To achieve that, filters should accept a dimension spec instead of just a dimension name (a string) plus an extraction function. The dimensionSpec would then use the decorate method, since getExtractionFn is deprecated anyway. We also have to make all the filters work with either a decorated selector or an extraction function.
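A minimal sketch of that idea, written in Python rather than Druid's Java and not taken from the Druid codebase: every class name below (`DefaultSpec`, `LookupSpec`, `SelectorFilter`) is hypothetical. It only illustrates a filter that receives a spec object and lets the spec decorate a raw selector, instead of receiving a dimension name plus an extraction function.

```python
from typing import Callable, Dict, List, Optional

# A "selector" here is just a function from a row to the dimension's values.
Selector = Callable[[Dict[str, str]], List[Optional[str]]]


class DefaultSpec:
    """Hypothetical spec that reads a dimension unchanged."""

    def __init__(self, dimension: str) -> None:
        self.dimension = dimension

    def decorate(self, selector: Selector) -> Selector:
        return selector


class LookupSpec(DefaultSpec):
    """Hypothetical spec that wraps the selector with a lookup table."""

    def __init__(self, dimension: str, table: Dict[str, str]) -> None:
        super().__init__(dimension)
        self.table = table

    def decorate(self, selector: Selector) -> Selector:
        # Unmapped values become None.
        return lambda row: [self.table.get(v) for v in selector(row)]


class SelectorFilter:
    """Filter that takes a spec, not a dimension name plus extraction fn."""

    def __init__(self, spec: DefaultSpec, value: str) -> None:
        self.spec = spec
        self.value = value

    def matches(self, row: Dict[str, str]) -> bool:
        raw: Selector = lambda r: [r.get(self.spec.dimension)]
        return self.value in self.spec.decorate(raw)(row)
```

With this shape the filter never needs to know whether the spec performs a lookup, a cascade, or nothing at all.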

b-slim commented 8 years ago

@gianm and @drcrallen what do you think?

gianm commented 8 years ago

One problem with LookupDimensionSpec that exists today is that the lookup can't be chained with any other extractions. Users can use the chained extraction fn to write filters like SUBSTR(LOOKUP(foo, "mylookup"), 2, 4) = "bar" but this is not possible with dimensionSpecs by themselves.

This would need to be addressed before LookupDimensionSpec is really able to replace extractionFns.
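To make the chaining concrete, here is a rough Python sketch of what a cascade of extraction functions does for the filter above. The helper names (`lookup_fn`, `substring_fn`, `cascade`) and the lookup contents are invented for illustration, and the substring arguments are assumed to mean a zero-based start index plus a length.

```python
from functools import reduce
from typing import Callable, Dict, Optional, Sequence

ExtractionFn = Callable[[Optional[str]], Optional[str]]


def lookup_fn(table: Dict[str, str]) -> ExtractionFn:
    # Maps a value through the table; missing keys become None.
    return lambda v: None if v is None else table.get(v)


def substring_fn(index: int, length: int) -> ExtractionFn:
    # Assumed semantics: zero-based start index plus length.
    return lambda v: None if v is None else v[index:index + length]


def cascade(fns: Sequence[ExtractionFn]) -> ExtractionFn:
    # Applies the functions left to right, feeding each output to the next.
    return lambda v: reduce(lambda acc, fn: fn(acc), fns, v)


my_lookup = {"a1": "xxbar"}  # made-up lookup contents
chained = cascade([lookup_fn(my_lookup), substring_fn(2, 4)])

# The filter then compares the chained result against "bar".
matches = chained("a1") == "bar"
```

A dimension spec that only wraps a single lookup has no place to put the second step, which is the gap described above.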

gianm commented 8 years ago

The other kind of nesting is also possible and also useful, i.e. both of,

b-slim commented 8 years ago

@gianm what you are saying makes sense, but this means that if a lookup is embedded under an extraction function, we can do something like mapping one to many or take advantage of applyAll. Not sure which way to go, do you have suggestions?

gianm commented 8 years ago

@b-slim not sure what you mean by that, could you give an example of what you're thinking?

b-slim commented 8 years ago

@gianm For instance, a lookup can be a one-to-many mapping. How would you support that if you do cascading?

gianm commented 8 years ago

Currently lookups cannot actually be one-to-many, right?

But if they could, one possible way to deal with that is if you have a cascade, and layer N of the cascade generates potentially many values for every one value, then you handle that by running functions in layer N + 1 for each of those many values. So the cascade ends up returning potentially multiple values rather than just one value.
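A small Python sketch of that multi-value cascade, purely illustrative: one-to-many lookups are stated above to be unsupported, and the names `one_to_many_lookup`, `lift`, and `run_cascade` are hypothetical. Each layer maps one value to a list of values, and the next layer runs once per value.

```python
from typing import Callable, Dict, List, Sequence

# Each layer maps one value to zero or more values.
Layer = Callable[[str], List[str]]


def one_to_many_lookup(table: Dict[str, List[str]]) -> Layer:
    # Unknown keys produce no values at all.
    return lambda v: list(table.get(v, []))


def lift(fn: Callable[[str], str]) -> Layer:
    # Wraps an ordinary one-to-one function as a layer.
    return lambda v: [fn(v)]


def run_cascade(layers: Sequence[Layer], value: str) -> List[str]:
    values = [value]
    for layer in layers:
        # Layer N + 1 runs once per value produced by layer N.
        values = [out for v in values for out in layer(v)]
    return values


regions = {"eu": ["france", "spain"]}  # made-up one-to-many table
result = run_cascade([one_to_many_lookup(regions), lift(str.upper)], "eu")
```

Here `result` holds one upper-cased entry per value the lookup produced, so the cascade as a whole returns multiple values rather than one.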

b-slim commented 8 years ago

@gianm yes, it is not supported, though it is on the roadmap. I am not sure why you don't like dimensionSpec doing the decoration. We can have a dimensionSpec that does cascading anyway, right? This can be chained by decorating selectors.

gianm commented 8 years ago

@b-slim nothing's wrong with dimensionSpec in principle, I just wanted to point out that there are some things extractionFns can do that dimensionSpecs can't, which would need to be addressed before they can really be unified.

In particular, some that I can think of are,

gianm commented 8 years ago

#2908, #3091 also seem related.

gianm commented 7 years ago

@b-slim are you still planning to drive this? Shall we move to 0.10.1?

b-slim commented 7 years ago

@gianm I think this needs to be reconciled with a bunch of other PRs and proposals as well. I will look into that ASAP.

b-slim commented 5 years ago

Closing this since I am not looking into it.