cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

Allow Data Blending with matching dateRange and empty `granularities` #2579

Open rongfengliang opened 3 years ago

rongfengliang commented 3 years ago

Problem

From the docs and the source code I know that Cube.js supports Data Blending, but it currently has one limitation: the queries must contain granularities (see the gateway source code). I think this may not be necessary. Often we need to run multiple queries in one request. If this limitation were removed, querying could be more flexible (delete lines 380-389 of the gateway code).

protected async getNormalizedQueries(query, context: RequestContext): Promise<any> {
    query = this.parseQueryParam(query);
    let queryType = QUERY_TYPE.REGULAR_QUERY;

    if (!Array.isArray(query)) {
      query = this.compareDateRangeTransformer(query);
      if (Array.isArray(query)) {
        queryType = QUERY_TYPE.COMPARE_DATE_RANGE_QUERY;
      }
    } else {
      queryType = QUERY_TYPE.BLENDING_QUERY;
    }

    const queries = Array.isArray(query) ? query : [query];
    const normalizedQueries = await Promise.all(
      queries.map((currentQuery) => this.queryTransformer(normalizeQuery(currentQuery), context))
    );

    if (normalizedQueries.find((currentQuery) => !currentQuery)) {
      throw new Error('queryTransformer returned null query. Please check your queryTransformer implementation');
    }
    // Proposed: delete this check so that Cube.js multi-query requests are more flexible
    // if (queryType === QUERY_TYPE.BLENDING_QUERY) {
    //   const queryGranularity = getQueryGranularity(normalizedQueries);
    //
    //   if (queryGranularity.length > 1) {
    //     throw new UserError('Data blending query granularities must match');
    //   }
    //   if (queryGranularity.filter(Boolean).length === 0) {
    //     throw new UserError('Data blending query without granularity is not supported');
    //   }
    // }

    return [queryType, normalizedQueries];
  }

paveltiunov commented 3 years ago

@rongfengliang Hey Rong! I agree that the conditions may be too strict. We should check that granularities match in a broad sense: they should either all be empty or all be the same. Also, a blending query is not the same as multi-querying. Blending queries should be against the same date interval. The current API doesn't support generic batch multi-querying as of right now.
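
The relaxed rule described in this comment (granularities either all empty or all the same) could be sketched roughly as follows. This is a minimal sketch, not the gateway's actual code: `BlendingQuery`, `granularitiesOf`, and `assertBlendingGranularitiesMatch` are hypothetical names, the query shape is a simplified assumption, and the date interval check mentioned in the comment is not covered.

```typescript
// Hypothetical, simplified query shape; a real normalized query has more fields.
interface TimeDimension {
  dimension: string;
  granularity?: string | null;
  dateRange?: [string, string];
}

interface BlendingQuery {
  measures?: string[];
  timeDimensions?: TimeDimension[];
}

// Collect the distinct granularities used across all queries.
// A missing or null granularity is normalized to null, so "empty" counts as one value.
function granularitiesOf(queries: BlendingQuery[]): Array<string | null> {
  const seen = new Set<string | null>();
  for (const query of queries) {
    const dims = query.timeDimensions ?? [];
    if (dims.length === 0) {
      seen.add(null);
    }
    for (const td of dims) {
      seen.add(td.granularity ?? null);
    }
  }
  return Array.from(seen);
}

// Relaxed check: throw only when the queries disagree on granularity.
// Queries that all omit a granularity are accepted.
function assertBlendingGranularitiesMatch(queries: BlendingQuery[]): void {
  if (granularitiesOf(queries).length > 1) {
    throw new Error('Data blending query granularities must match');
  }
}
```

With this shape, a set of queries without any granularity passes, while mixing `day` with an empty granularity is still rejected.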

github-actions[bot] commented 3 years ago

If you are interested in working on this issue, please leave a comment below and we will be happy to assign the issue to you. If this is the first time you are contributing a Pull Request to Cube.js, please check our contribution guidelines. You can also post any questions while contributing in the #contributors channel in the Cube.js Slack.

tchell commented 3 years ago

Hey @paveltiunov I looked around a bit and didn't see anything else about supporting multi-querying so I'll ask here. Is there a specific reason multi-querying is not supported? Or has it just not been implemented yet? To me the decision to have an array of queries not be a multi query is strange. I'm actually not sure what the purpose is of calling it 'Data Blending' in the client libraries since by the description in the docs:

> Another use case of the Data Blending approach would be when you want to chart some measures (business related) together and see how they correlate.

it seems like you are just describing multi queries. But maybe I'm misunderstanding the terms so if I'm wrong please correct me.

omrihaber commented 7 months ago

Are there any updates on this? Are there any plans for allowing multi-queries?

kodeine commented 1 month ago

@paveltiunov It's an old issue, but I think a multi-query feature is much needed.

igorlukanin commented 1 month ago

@kodeine @omrihaber @tchell Could you please explain, as well as you can, what you mean by a "multi-query feature"?

omrihaber commented 1 month ago

This is how I see this feature. Input: an array of queries that may or may not overlap in their time dimensions, or may not specify a time dimension at all. Output: an array of results, one for each of the given queries.

Example use case: an array of queries where each query counts the number of rows of data up to a certain date, i.e. for an array of time points $[t_1, \dots, t_k]$ such that $t_1 < \dots < t_k$, we define the following queries:

$Q_1$ = select count(*) where time_dimension < $t_1$;
...
$Q_k$ = select count(*) where time_dimension < $t_k$;

AFAIK in the current state of the implementation if some time dimensions in the array of queries overlap cube will try and data blend the results to a single timeline. We want the ability to disable data blending and remove the need to specify a granularity for such queries (either through a different type of request or by allowing a null time granularity).
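
The example use case above could be written as an array of Cube-style query objects, roughly like this. It is a sketch under assumptions: the member names `Events.count` and `Events.createdAt` are hypothetical, the query shape is simplified, and expressing `time_dimension < t` with a `beforeDate` filter is an assumption about the query format rather than something stated in this thread.

```typescript
// Hypothetical, simplified query shape for illustration only.
interface Filter {
  member: string;
  operator: string;
  values: string[];
}

interface CountQuery {
  measures: string[];
  filters: Filter[];
}

// Build Q_1..Q_k: one count query per time point t_i,
// each counting the rows whose time dimension is before t_i.
function buildCumulativeCountQueries(timePoints: string[]): CountQuery[] {
  return timePoints.map((t) => ({
    measures: ['Events.count'],
    filters: [{ member: 'Events.createdAt', operator: 'beforeDate', values: [t] }],
  }));
}

const queries = buildCumulativeCountQueries(['2021-01-01', '2021-02-01', '2021-03-01']);
```

None of these queries carries a granularity, so when they are sent together as an array the gateway check quoted at the top of this issue rejects them with "Data blending query without granularity is not supported".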

igorlukanin commented 4 weeks ago

@omrihaber Thanks for a very elaborate explanation! I understand how you envision this to work with an array of queries.

The question is: can you take each query from this array and just execute it separately?

tchell commented 4 weeks ago

> The question is: can you take each query from this array and just execute it separately?

We can right now yes, but then each dataset must be iterated and merged client-side. This is not ideal, and for most backends the merge would be doable server-side. An extra benefit is that with all the data arriving through a single query, there is only one loading/error state to consider, which simplifies the client side even more.
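
For comparison, the client-side workaround discussed here (run each query separately, but expose a single loading/error state) might look like the sketch below. `runAll` and the injected `load` function are hypothetical names; `load` stands in for whatever single-query call the client uses.

```typescript
// Run every query separately and return the results in the same order.
// Promise.all rejects as soon as any one query fails, so the caller deals with
// one pending state and one error state for the whole batch.
async function runAll<Q, R>(
  queries: Q[],
  load: (query: Q) => Promise<R>,
): Promise<R[]> {
  return Promise.all(queries.map((query) => load(query)));
}
```

This keeps the result sets separate (nothing is blended onto one timeline), but it still sends one request per query.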

igorlukanin commented 4 weeks ago

Yes:

> AFAIK in the current state of the implementation if some time dimensions in the array of queries overlap cube will try and data blend the results to a single timeline.

But:

> We can right now yes, but then each dataset must be iterated and merged client-side.

I don't think I fully get the idea now. Do you want the data from different queries blended / merged or not? If you can explain or, better, show an example of what is desired and what is not, that would be fantastic.

omrihaber commented 1 week ago

I want it returned through a single request (not blended on a single timeline), in order to not send multiple requests to get the query results.

igorlukanin commented 1 week ago

> in order to not send multiple requests to get the query results.

@omrihaber And why do you want to avoid that, exactly?