Open Korg95 opened 2 weeks ago
Hi @Korg95 👋
Thanks for a very elaborate proposal, I really appreciate that. That being said, I have marked this as ClickHouse-specific as we do not (currently?) get many similar requests regarding other data sources.
I'm also thankful for the list of specific use cases. May you please provide an example of parameter setting for each one of them? Some minimal yet working SQL for each example would really help—I don't want to misunderstand what is really required, hence my ask.
My current understanding is that most probably the solution might along the lines of allowing to initialize a database driver in Cube with an additional SQL that is executed once per connection. Further customization would be possible via defining multiple data sources and using the right one in each cube.
Is your feature request related to a problem? Please describe.
In an example such as ClickHouse, there are many settings that can be applied at the server, session, or query level to optimize performance. However, the best setting for one query may not be ideal for others. Relying on server-level settings for query performance often leads to suboptimal results for certain queries, especially those with varying parameters. Many queries can benefit from runtime hints/settings to improve performance based on their specific needs (Telemetry might be another case).
Currently, Cube.js forces us to use SQL instead of table names in cube definitions as a workaround to apply settings. This causes subqueries to be formed where the setting is only applied to the subquery, rather than the entire query. As a result, the intended settings are not fully applied, leading to performance inefficiencies, particularly for real-time applications. There are also potentially databases where applying settings at the subquery level is not possible, making this workaround even more problematic.
Describe the solution you'd like
We would like Cube.js to support the ability to pass query hints or settings to the query builder, specific to a given driver. Ideally, these hints could be calculated dynamically during query_rewrite and then passed down to the query generation.
Additionally, allowing API callers to pass parameters that would control query hints would provide better control over query execution, especially when there is no direct customer exposure. For example setting of parameters may be calculated via a call to a library in CSharp which is our main language for example. Forcing individuals to use query_rewrite to do it all might be a showstopper in this case.
An alternative on this latter part to avoid this low level hinting would be to use schema to help schematize the options available for a given datastore to pass to the query generation layer. So when you query a cube linked to a datastore, a range of options would be available.
Describe alternatives you've considered
Using raw SQL instead of table names in cube definitions to allow for injection of settings using filter params. This workaround forces the query to become a subquery, limiting the scope of the applied settings to the subquery rather than the entire query. This not only leads to inefficiencies but may also not be possible in languages where settings cannot be applied to subqueries. Manually setting certain parameters at the server or session level, though this does not offer the flexibility needed for queries with varied workloads and priorities.
Another workaround is to ask cube for the raw SQL and then alter it as required and execute the query yourself. This is useful ofcourse for edge case scenarios where you might do things like alter the query and export the result to s3 but it totally defeats the purpose of cube for regular query scenarios.
Additional context Some specific use cases where query hints/settings would be beneficial:
Providing this flexibility would improve query execution performance, especially for real-time applications.
If there is existing functionality to do this, happy to move this to a question so that others can find the info there.
Thank you!