cube-js / cube

πŸ“Š Cube β€” The Semantic Layer for Building Data Applications
https://cube.dev
Other
17.73k stars 1.75k forks source link

Rest API - Filter using another dimension #8607

Closed cemhany closed 1 week ago

cemhany commented 3 weeks ago

Is it possible to create a filter that references another dimension?

Let's assume I have a cube with dimensions "orderActualDeliveryDate" and "orderEstimatedDeliveryDate". I want to filter rows where the first one is greater than the other to list late deliveries. Is this possible using the "filters" when making a query over RestAPI? If not, what would be a workaround?

Thanks a lot!

igorlukanin commented 3 weeks ago

Hi @cemhany πŸ‘‹

Is this possible using the "filters" when making a query over RestAPI?

Not quite. Since you can only reference one dimension in the filter and you should be comparing it with a literal value rather than another dimension.

If not, what would be a workaround?

You can define one more dimension, e.g., actualAndEstimatedDeliveryDateDifference or even isDeliveredLate, that references both other dimensions in its definition. Then, you can filter on that new dimension.

cemhany commented 3 weeks ago

Hey Igor,

I have created a new dimension with sql "order_estimated_delivery_date - order_delivered_customer_date". But one concern I have is interfering with SQL properties too much would cause us to lose the data source agnostic behaviour of the platform.

What I mean by that is, I would imagine using "-" operator for finding the difference between two time columns wont't work with every data source type. Perhaps for this simple example it would work with every data source. But in more complex scenarios we would need to create another layer for mapping different queries to some format that underlying data source understands. I am not saying that's unacceptable, on the contrary perhaps that would be a factor that adds value to the platform that uses cube.js in the background as semantic layer.

Am I right in my assumption with losing data source ignorant nature of simple cube models would be lost as custom (and complex) dimensions such as these are added? If yes would leveraging, for example, cubejs-schema-compiler help with the problem I meantioned above.

Sorry if this is not the place for such discussions, I can carry it to slack if you like.

Thanks again for the amazing package.

igorlukanin commented 3 weeks ago

@cemhany Thanks for a deep question!

data source agnostic behaviour of the platform

It looks like your use case is rather interesting. Are you planning to use Cube with Snowflake today, with ClickHouse tomorrow, and then migrate to Trino after trying Databricks for a week? πŸ˜„

Jokes aside, I'd like to understand your use case better.

Generally, there's nothing bad in having data source-specific SQL in your data models, especially in the SQL that cubes are defined over. If a case when you need to migrate to another data source, that would probably need to be rewritten manually or with some automation (we have LLMs now!) but that is usually not a big concern.

And if you're talking about using the same data model with different data sources at the same time, that would be an even more rare use case. Is this what you're talking about?

(If we take a bird's-eye view on this, if you want to abstract something out, that would need to happen at some level of abstraction. Sounds like solving this completely at the level of Cube's REST API querying syntax would effectively make that syntax as complex as a Turing-complete programming language. But then, we already have SQL and other languages, is it right to invent one more in that very place?)

cemhany commented 3 weeks ago

Hey again,

None of the above actually πŸ˜„

I think the assumption here is that the cube models are generated statically. But what I am trying to achieve is to create a platform (perhaps something similar to cube cloud in this sense) where I am able to connect any data source the cube.js supports. So cube models needs to be generated dynamically and programmatically similar to how playground generates models using scaffolding package. If I want to add a dimension that's not simply the table's name, things might quickly get messy because of the differences in SQL dialects.

If such a feature were to be added to REST API, I know that eventually another "ah I need this too" moment will come along where another feature needs to be added to REST API. Which in the end will come to the issue you mentioned. I agree that having a REST API that maps every capability of raw SQL queries to a JSON payload is not a feasible solution (if even possible).

All being said, I still think filtering using another dimension would be a very strong and helpful feature πŸ˜„

So I will most likely, sooner or later, need an abstraction layer that handles the differences in SQL dialects. What I am wondering right now is, like I mentioned in the previous comment, if schema compiler or any other package in repository would help me build this abstraction layer? Or perhaps, though I highly doubt, there is a subset of SQL dialects I can use that's supported by all data source in cube.js? Or another approach altogether I am not able to think of?

Thanks again for your response, I appreciate it :)

igorlukanin commented 2 weeks ago

All being said, I still think filtering using another dimension would be a very strong and helpful feature πŸ˜„

Well, we were focusing on the REST API here, however, you might take the SQL API and be able to do exactly thatβ€”today.

So I will most likely, sooner or later, need an abstraction layer that handles the differences in SQL dialects.

Given that you're looking to build a universal platform, this is the way to go.

Or perhaps, though I highly doubt, there is a subset of SQL dialects I can use that's supported by all data source in cube.js?

I think it's fair to say that most databases support SQL '99 or at least SQL '92. I guess you can do quick research to figure that out.