cube-js / cube

📊 Cube — The Semantic Layer for Building Data Applications
https://cube.dev

Download/Export API #251

Closed: ifokeev closed this issue 1 month ago

ifokeev commented 4 years ago

Describe the bug api-gateway doesn't allow passing a limit over 50,000 (https://github.com/cube-js/cube.js/blob/1a8260522a234ebaabb9360f58e9b62095fd87f7/packages/cubejs-api-gateway/index.js#L133), and that's strange. I have no ability to use offset, but I want to increase the limit.

Expected behavior There should be no limit in api-gateway, though there may be one in the Playground.
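For reference, the kind of request that the gateway rejects can be sketched like this (the cube and dimension names are hypothetical, as is the usage shown in the comments):

```javascript
// Minimal sketch of a query that the API gateway's schema validation
// rejects, because its `limit` exceeds the hard cap of 50,000 rows.
// The cube name and dimension below are hypothetical.
const oversizedQuery = {
  dimensions: ['Users.country'],
  limit: 100000, // > 50000, so the gateway returns a validation error
};

// With @cubejs-client/core this would be sent roughly as (not executed here):
//   const api = cubejs('API_TOKEN', { apiUrl: 'http://localhost:4000/cubejs-api/v1' });
//   await api.load(oversizedQuery); // rejected before reaching the database

if (oversizedQuery.limit <= 50000) {
  throw new Error('example should exceed the 50,000-row cap');
}
```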


Updated by @igorlukanin. See partial solution: https://github.com/cube-js/cube/issues/251#issuecomment-2046918025

paveltiunov commented 4 years ago

@ifokeev Thanks for posting this! There's a hard limit for security purposes. Could you please elaborate a little on your use case?

ifokeev commented 4 years ago

There's a hard limit for security purposes

This hard limit should be in the Playground, not in the API. Users know best what SQL they need when using the standalone API. If the limit is so important, there could be something like a skipRestrictions flag.

Could you please elaborate a little bit on your use case?

I just need to query more than 50,000 rows. I use it to predict dimensions, not measures. Right now Cube.js is limiting my ability to do this.

paveltiunov commented 4 years ago

@ifokeev I see. Most Cube.js APIs are exposed to some sort of untrusted environment directly. Without this check, it's very easy to exploit them for a DoS attack.

As discussed previously, I'd suggest providing a separate API for exporting and downloading results. What do you think?

ifokeev commented 4 years ago

@paveltiunov

As discussed previously, I'd suggest providing a separate API for exporting and downloading results. What do you think?

That may be a good solution. I found that cubejs-server-core is not extensible by design, and I can't change the schema validation of cubejs-api-gateway either. Another solution could be to allow a user-provided schema for queries, or to disable validation entirely. For example, I use the GraphQL API and don't need validation from api-gateway.

rickj33 commented 4 years ago

Most Cube.js APIs are exposed to some sort of untrusted environment directly. Without this check, it's very easy to exploit them for a DoS attack.

I don't think it is Cube.js's responsibility to protect against DoS attacks. Having a default value that limits the results is good, but I should be able to override that value with something greater if my requirements call for it. If I do, then it is my responsibility to protect against DoS attacks. In my case, I am using Cube.js in an internal, trusted corporate environment where I do not have to worry about DoS attacks.

paveltiunov commented 4 years ago

@rickj33 Hey Rick! I think we can consider adding an option to override the default limit on the server. Could you please elaborate on your use case, though? Is it a browser that needs to load more results, or some other service that just hits the Cube.js API? How many rows in total do you need to download?

ifokeev commented 4 years ago

@paveltiunov let's go with overriding the default validation schema, not only the limit.

rickj33 commented 4 years ago

That sounds good.

paveltiunov commented 4 years ago

@ifokeev Could you please elaborate?

ifokeev commented 4 years ago

@paveltiunov I mean we need a user-defined schema here: https://github.com/cube-js/cube.js/blob/master/packages/cubejs-api-gateway/index.js#L93

Allow the user to pass their own schema, or disable it entirely.

nikevp commented 4 years ago

+1 on api export

JoshMentzer commented 3 years ago

@paveltiunov Here's a use case I have: we've set up our own query 'playground' front end for people to build into an Angular dashboard app. When they set a filter, we submit a query for that dimension and the same other filters to populate an autocomplete dropdown. Sometimes they select a field that has more than 50k potential items in it; as a result, not all values are returned, and users end up thinking that the system is 'missing' things. Granted, in some cases we could work around this using paging/offset, etc., but we can't count on being able to do that 100% of the time. I personally would love to be able to override the query limit on queries.

paveltiunov commented 3 years ago

@JoshMentzer I see. Do you show all 50k potential items in the dropdown? Or do you provide some kind of search-by-name functionality here?

JoshMentzer commented 3 years ago

@paveltiunov No, we don't show all 50k, just the ones that match what they type into the autocomplete/search-ahead (whatever you want to call that type of control). We are using virtual scrolling there, so I'm sure we could tie the query in with a 'contains' filter or some such instead of directly searching the list that comes back from the query; it's really just a choice we made for UX reasons. If it has to go back and hit Cube for each query, it 'feels' much less performant. We would of course do that if we were concerned about user system resources, etc., but in our case we have a controlled audience and can make the trade-off of the performant feel versus use of resources.
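The server-side alternative mentioned above can be sketched as follows: push the typed search term into the query itself using Cube's `contains` filter operator, so only matching values are fetched. The cube and member names here are hypothetical:

```javascript
// Sketch: build a Cube query that filters dimension values server-side
// as the user types, instead of downloading the full 50k-value list.
// `Products.name` is a hypothetical dimension.
function autocompleteQuery(searchTerm) {
  return {
    dimensions: ['Products.name'],
    filters: [
      {
        member: 'Products.name',
        operator: 'contains', // substring match on the dimension value
        values: [searchTerm],
      },
    ],
    limit: 100, // only as many rows as the dropdown can usefully show
  };
}

// Usage with @cubejs-client/core (not executed here):
//   const api = cubejs('API_TOKEN', { apiUrl: '...' });
//   const resultSet = await api.load(autocompleteQuery('acme'));
```

The trade-off is exactly the one described in the comment above: each keystroke (typically debounced) costs a round trip to Cube, in exchange for never hitting the row limit.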

imjared commented 3 years ago

We're doing something similar to @JoshMentzer in that we're allowing authorized users to run queries to build lists of people/users. Said lists only need to show a preview on the client, so we could use something like the ?limit or ?offset parameters to truncate results. However, we'd need to show the total number of results from our BQ database based on a secondary provided query.

somewhat related: https://cube-js.slack.com/archives/CC0403RRR/p1611741898136000?thread_ts=1611718901.127000&cid=CC0403RRR
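For the preview-plus-count pattern described above, newer Cube versions accept a `total` flag in the query, which asks the API to compute the overall row count alongside the limited page; verify support in your version's docs. The cube and member names are hypothetical:

```javascript
// Sketch: a paginated preview query that also requests the total row count.
// `People.email` is a hypothetical dimension; `total` support should be
// confirmed against the Cube version in use.
const previewQuery = {
  dimensions: ['People.email'],
  limit: 25,   // preview page size shown on the client
  offset: 0,
  total: true, // ask the API for the row count ignoring limit/offset
};

// With @cubejs-client/core (not executed here):
//   const resultSet = await api.load(previewQuery);
//   resultSet.totalRows(); // total matching rows, independent of the page
```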

klose4711 commented 2 years ago

Is it possible to override the hard limit of 50,000?

danieka commented 2 years ago

@paveltiunov Being able to override the 50,000 limit on queries would be very useful for my use case. It's a sensible default, but it would be great to be able to configure it. Are you working on this? If not, would you consider merging a PR from me if I find time to write one?

mstruser commented 1 year ago

@paveltiunov We are also having issues because of this limitation. Any plans to add this in a future release?

amans236 commented 1 year ago

@paveltiunov We are facing the same issue when using Cube to populate our BOARD data model for reporting purposes. Has a solution been found, or is there at least a temporary workaround?

paveltiunov commented 11 months ago

CUBEJS_DB_QUERY_LIMIT can be used to override the default limit. Setting it to large values may cause out-of-memory crashes.
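In a Docker Compose deployment, for example, the override might look like this (the service name and value are illustrative):

```yaml
# docker-compose.yml (fragment), illustrative values only
services:
  cube:
    image: cubejs/cube:latest
    environment:
      - CUBEJS_DB_QUERY_LIMIT=200000  # raises the cap; watch memory usage
```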

vitorfraga commented 8 months ago

Hello @paveltiunov, does changing this parameter also change the default value for the SQL API?

igorlukanin commented 6 months ago

Hey all, I've added a section in the docs (https://cube.dev/docs/product/data-modeling/queries#row-limit) that explains the row limit as well as the CUBEJS_DB_QUERY_LIMIT environment variable that you can use to bump it. Just be advised that bumping the row limit substantially may cause out-of-memory (OOM) crashes.

I'm keeping this issue open to further track the possible introduction of a "Download/Export API."

bzenker-amplify commented 4 months ago

I've tried setting the env var, but the Cube-generated SQL query still has LIMIT 10000 regardless.

This is when calling the Cube API via GraphQL, with the Cube Docker image v0.35.47. I've tried both CUBEJS_DB_QUERY_LIMIT=50000 and CUBEJS_DB_QUERY_LIMIT="50000".

Edit: I misunderstood the use of the env var. You can specify a limit of up to 50,000 in your query without changing the env var, got it. E.g. query CubeQuery { cube(limit: 50000, where(...

igorlukanin commented 1 month ago

Streaming mode in the SQL API removes this limitation.