brimdata / zui

Zui is a powerful desktop application for exploring and working with data. The official front-end to the Zed lake.
https://www.brimdata.io/download/

Export without requiring additional query execution #3121

Open SoftTools59654 opened 2 weeks ago

SoftTools59654 commented 2 weeks ago

Query runs again when exporting

When we execute a query and the result is displayed, choosing to export that same query starts querying the entire database all over again. The better option would be to send the data found by the first query directly to the export.

This problem is only noticeable with large data sets, where it adds extra time: the entire data set is scanned twice, once for the original query and again for the export.

philrz commented 2 weeks ago

I've transferred this issue to the Zui repository since the mention of "exporting" makes it sound like an app issue.

I first want to clarify some behaviors to fully capture the problem being described. First, it's not necessarily true that a query is executed in full each time it's submitted via Zui. A desktop app like Zui can only hold a certain amount of data in memory at one time for a user to browse, plus a user is only likely to browse so much data at a time before moving on to another query or pool. Therefore the app currently (and transparently) appends a head 500 operator to ask the Zed lake service to limit the query response to the first 500 values returned. If the user scrolls to the bottom of the results pane, the query is re-executed with the operator adjusted, this time to fetch only the next 500 values, and so on. This has the benefit of only taxing the backend to the degree necessary to present the subset of the query result the user will actually be shown. Only if the user decides they want to Export the full query response does the backend let the query execute all the way to completion and stream back the entire result.
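To illustrate the paging behavior described above, here is a minimal sketch of how a client could rewrite a user's query so the backend only computes the slice currently being shown. The function name, the constant, and the use of Zed's head/tail operators for later pages are illustrative assumptions, not Zui's actual implementation:

```typescript
// Hypothetical sketch of view-limited query paging (not Zui's real code).
const VIEW_LIMIT = 500;

// Wrap the user's query so the backend returns only the requested slice.
function pageQuery(userQuery: string, page: number): string {
  if (page === 0) {
    // First screenful: just cap the result at the view limit.
    return `${userQuery} | head ${VIEW_LIMIT}`;
  }
  // Later screenfuls: take everything up to the end of this page,
  // then keep only the final slice of it.
  return `${userQuery} | head ${VIEW_LIMIT * (page + 1)} | tail ${VIEW_LIMIT}`;
}
```

The key point is that each scroll re-submits an adjusted query rather than holding the whole result in the app's memory.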

That said, I'm guessing that your question was motivated by a different kind of query, such as a "needle in a haystack" search across a large pool (or many large pools) to find a small number of values, or similarly to run an aggregation against a large amount of data to generate a summary result with a small number of values (e.g., probably less than the view limit of 500 values in both cases). It would be helpful to confirm if this is indeed what you have in mind, @SoftTools59654.

Assuming I have that correct, then we certainly recognize that this could be frustrating. We had a group discussion about ways to potentially go about improving here.

One thought was to perhaps have the backend temporarily cache the query response (e.g., in some kind of "scratch pool") such that if the user requests an Export soon after the query finished executing in full, they'd have the option to get the query response from the cache instead. While technically feasible, the backend Dev team felt their time would be better spent on improving query performance overall, and indeed that's where the bulk of their time is currently focused.

Therefore sticking to the app side instead, another approach was proposed that showed a bit more promise. We already have some existing enhancement ideas related to "selecting" the data in the query responses shown in the app (#2635, #1176), such as if the user were looking to cut and paste the results elsewhere (e.g., into a spreadsheet app). When we implement those, the app's JavaScript representation of the query response would effectively be re-processed and output in the user's desired destination format (e.g., CSV, TSV, text lines, etc.), and it seems it would therefore be possible to use the same approach to Export "all" of the displayed query result in any format (e.g., JSON, ZNG, etc.). However, per the point above about how the app only holds a maximum of 500 query response values at once, this approach would only be viable for query results that size or smaller.
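As a rough sketch of that "re-process what's already in memory" idea, the rows the app holds could be serialized to a destination format like CSV without any further query execution. The Row type, function name, and quoting rules here are simplified assumptions for illustration; a real export path would also need to handle Zed's richer types:

```typescript
// Hypothetical sketch: serialize rows already held in app memory to CSV,
// avoiding any re-query of the backend. Not Zui's actual export code.
type Row = Record<string, string | number | boolean | null>;

function toCsv(rows: Row[]): string {
  if (rows.length === 0) return "";
  // Assume every row shares the first row's columns (a simplification).
  const columns = Object.keys(rows[0]);
  const escape = (v: unknown): string => {
    const s = v === null ? "" : String(v);
    // Quote fields containing commas, quotes, or newlines; double any quotes.
    return /[",\n]/.test(s) ? `"${s.replace(/"/g, '""')}"` : s;
  };
  const header = columns.join(",");
  const body = rows.map((r) => columns.map((c) => escape(r[c])).join(","));
  return [header, ...body].join("\n");
}
```

Because this only ever touches what the app has in memory, it inherits the same 500-value ceiling mentioned above.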

In any case, the Dev team is busy with many other priorities right now so I don't expect an enhancement in this area very soon, but I wanted to capture our preliminary design thoughts for when we do. In the meantime, @SoftTools59654 we'd be happy to hear your responses on some of what's sketched out above.