cube-js / cube

📊 Cube — Universal semantic layer platform for AI, BI, spreadsheets, and embedded analytics
https://cube.dev
Other
17.97k stars 1.78k forks source link

How to add dynamic sql based filters to a cube query for a jsonb postgresql column? #7009

Open batflarrow opened 1 year ago

batflarrow commented 1 year ago

Problem

I have a table that contains a json column. The column will have a structure something like the following. field_id can vary for different accounts and mostly change at runtime, so I cannot define a dimension in the cube.

{
  "field_id" :"value",
   "field_id": ["field_value_1","field_value_2"]
}

I want to run queries like the following

SELECT * FROM my_table
WHERE json_columnn->>'field_id' = queried_value;
SELECT *
FROM my_table
WHERE (json_columnn->'field_id') ?| array['queried_value_1', 'queried_value_2','queried_value_3']

Is there a simple way I can achieve this through cube? Is there a way I can specify a filter purely through sql when sending a query? I couldn't find this anywhere in the docs.

If I could generate a cube at runtime with the top level SQL query with these filters and rest of dimensions and filter being handled by cubejs, that would help too.

paveltiunov commented 1 year ago

@batflarrow Right way is to dynamically generate the data model. Please see https://cube.dev/docs/guides/recipes/data-modeling/entity-attribute-value.

batflarrow commented 1 year ago

@batflarrow Right way is to dynamically generate the data model. Please see https://cube.dev/docs/guides/recipes/data-modeling/entity-attribute-value.

Yeah but can we dynamically generate this model for different queries to cube? From what I could understand the data models can only be generated at the time of initialization of the cubejs service, and not for each query.

I can't seem to find a way to dynamically generate a model for each query. I can't generate a model at the time of intiializing the service, because I won't have the information related to all the possible field_ids of the json column at the time of the initialization of the cube service, but I'll have this information at the time of making the queries.

Also even after that, will I be able to add arbitrary SQL filters for the fields that I've added like the ones I've mentioned in the orignal question?

igorlukanin commented 1 year ago

@batflarrow Please check this section of the recipe: https://cube.dev/docs/guides/recipes/data-modeling/entity-attribute-value#dynamic-attributes

I can't generate a model at the time of intiializing the service, because I won't have the information related to all the possible field_ids of the json column at the time of the initialization of the cube service

That section specifically shows how that can be done.

batflarrow commented 1 year ago

@batflarrow Please check this section of the recipe: https://cube.dev/docs/guides/recipes/data-modeling/entity-attribute-value#dynamic-attributes

I can't generate a model at the time of intiializing the service, because I won't have the information related to all the possible field_ids of the json column at the time of the initialization of the cube service

That section specifically shows how that can be done.

Yeah, I had gone through this. From what I understood, this approach assumes that the statuses are not going to change during the runtime of the cubejs service, but in my case fieldIds can keep changing and the same fieldId might mean something else for different accounts. And all fieldIds will be different for different accounts.

Can I somehow invoke this asyncModule on API calls to my backend server? If yes then I can fetch the fieldIds for a particular account, create a cube schema at runtime for each API call, and eventually run a query using that cube.

vasishtt commented 8 months ago

Were you able to find a solution on this @batflarrow ?

batflarrow commented 8 months ago

Were you able to find a solution on this @batflarrow ?

Unfortunately, despite my efforts to implement the suggestions from my previous comment, I've had to acknowledge that Cubejs is designed primarily for static schemas. It appears there's no straightforward solution for managing dynamic schemas that evolve during runtime and vary among different users of our application, similar to the ones we have. Despite our best attempts, we were unable to find a direct workaround.

One potential workaround we've devised involves making our dynamic schema somewhat static. Essentially, we've proposed assuming that certain field_ids of specific types will consistently store particular kinds of data. If we can hardcode this relationship into Cubejs Schemas, theoretically, we could achieve what we want. However, we haven't yet implemented this solution in production.

igorlukanin commented 7 months ago

@batflarrow I understand that it would require a significant effort but here's what you can do:

batflarrow commented 7 months ago

@batflarrow I understand that it would require a significant effort but here's what you can do:

  • inspect your data for potential filter values with an external service
  • generate your data model using that external service
  • then use repository_factory and schema_version to re-load that data model

@igorlukanin I'll look into this. But wanted to check if it's possible to implement this for let's say every API call the the UI makes to my server to fetch the data. And can the data model be different for every API call?

igorlukanin commented 6 months ago

can the data model be different for every API call?

Well, this might be possible to some extent but this is definitely not a best practice; you would have a huge performance penalty for every query you make.

The right approach is to identify "groups of API calls" that would reuse the same data model. Each group would be a "tenant" in Cube's terms. And for every tenant you can have its own data model.