DataJunction / dj

A metrics platform.
http://datajunction.io
MIT License
29 stars 13 forks source link

SQL Query rewriting logic using MVs #1068

Open bazooka720 opened 1 week ago

bazooka720 commented 1 week ago

One of the features of semantic layer like cube.js has is that it has ability to rewrite query to identify whether to use pre-age (MV) tables vs. go to source or some combination by studying the query dimensions.

Curious, if such a feature exists or planned? In addition, a feature to use history to identify new TVs to create automatically would be super helpful as well

shangyian commented 1 week ago

We have a similar feature around using a materialized table (for a cube) if one is available instead of rerunning the underlying query generated for the cube. Is this what you're describing here? (I'm assuming MV = materialized view)

However, we don't yet have the ability to automatically create new materialized tables based on usage.

bazooka720 commented 1 week ago

Yes. Basically if DJ finds a MV (materialized view based on query pattern), database query is rewritten to become a multi-stage process: DJ checks if an up-to-date copy of the MV exists.DJ will then execute a query against the MV tables instead of the raw data. If the query requires / can be fulfilled partly from materialized and partly from source, it can do it as well (Federated as Trino Supports it).

In general, can you point to various functionality that DJ has as it relates to query rewrite and automation relating to it? In the doc you indicate following functionality but would appreciate more details:

Discover New Dimensions️ Find the metrics you’re interested in and instantly discover the dimensions that are common among them.

Understand Dependencies DJ understands the dependencies for metrics across the entire system using a robust metadata layer.

Prevent Broken Metrics Develop quickly and let DJ highlight the impact each change will have on all metrics.

Thanks!

On Thu, Jun 20, 2024 at 11:09 AM Yian @.***> wrote:

We have a similar feature around using a materialized table (for a cube) if one is available instead of rerunning the underlying query generated for the cube. Is this what you're describing here? (I'm assuming MV = materialized view)

However, we don't yet have the ability to automatically create new materialized tables based on usage.

— Reply to this email directly, view it on GitHub https://github.com/DataJunction/dj/issues/1068#issuecomment-2180941375, or unsubscribe https://github.com/notifications/unsubscribe-auth/AGVA56SJX2YCI4PDJCZJ7RDZILWBNAVCNFSM6AAAAABJTADMS2VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCOBQHE2DCMZXGU . You are receiving this because you authored the thread.Message ID: @.***>

shangyian commented 1 week ago

Yes. Basically if DJ finds a MV (materialized view based on query pattern), database query is rewritten to become a multi-stage process: DJ checks if an up-to-date copy of the MV exists.DJ will then execute a query against the MV tables instead of the raw data.

Yep, our query generation process does exactly this.

If the query requires / can be fulfilled partly from materialized and partly from source, it can do it as well (Federated as Trino Supports it).

We don't do this part just yet.

In general, can you point to various functionality that DJ has as it relates to query rewrite and automation relating to it? In the doc you indicate following functionality but would appreciate more details:

Yeah, we need to update the docs, but we've been in a rapid development phase and haven't been as focused on the docs. Will get to this in the next few weeks!