ashish10alex / vscode-dataform-tools

Dataform tools - a vscode extension
https://marketplace.visualstudio.com/items?itemName=ashishalex.dataform-lsp-vscode
MIT License

Feature request: Have a count number of rows in query tab #34

Closed TheSmartMonkey closed 1 month ago

TheSmartMonkey commented 1 month ago

Summary

Be able to see the number of rows of the table in the query tab even with a query limit

Motivation

Save time when testing my queries

Detailed Description

When writing transformation queries, it is useful to see the number of rows. One of my most common use cases is data deduplication (then I know whether I removed the data correctly).

Expected Behavior

See the number of rows of the table in the query tab

2 ideas so far :

  1. A second query is run in parallel that counts the number of rows
  2. A "count" button could be added that counts the rows after the query has run
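
Idea 1 could look roughly like the sketch below. It is written in Python only for illustration and is not how the extension works. `run_query` is a hypothetical callable that takes SQL text and returns the result rows; the helper names are made up too. Wrapping the query in a subquery is a simplification (for example, an outer `LIMIT` does not guarantee the inner `ORDER BY` is preserved), and the count runs as a second query with its own cost.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List, Sequence, Tuple

Row = Dict[str, Any]
# Hypothetical signature: takes SQL text, returns the result rows.
RunQuery = Callable[[str], Sequence[Row]]


def _strip_statement(sql: str) -> str:
    """Drop surrounding whitespace and a trailing semicolon."""
    return sql.strip().rstrip(";").rstrip()


def build_preview_query(sql: str, limit: int = 50) -> str:
    """Wrap the query so that at most `limit` rows are returned."""
    return f"SELECT * FROM (\n{_strip_statement(sql)}\n) LIMIT {limit}"


def build_count_query(sql: str) -> str:
    """Wrap the query so it returns one row holding the total row count."""
    return f"SELECT COUNT(*) AS row_count FROM (\n{_strip_statement(sql)}\n)"


def preview_with_count(
    sql: str, run_query: RunQuery, limit: int = 50
) -> Tuple[List[Row], int]:
    """Run the limited preview and the count query concurrently."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        preview_future = pool.submit(run_query, build_preview_query(sql, limit))
        count_future = pool.submit(run_query, build_count_query(sql))
        rows = list(preview_future.result())
        total = int(count_future.result()[0]["row_count"])
    return rows, total
```

With something like this, the tab could show "1 – 50 of N" while still fetching only the limited preview rows.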

ashish10alex commented 1 month ago

Hi @TheSmartMonkey, thanks for raising the feature request. Just to clarify: by "query tab", do you mean the compilation web view that comes up on the right? If yes, would it be OK if I just fetch the metadata of the table after it is materialised and put the row count somewhere in the web view, e.g. here: (screenshot: CleanShot 2024-10-16 at 20 18 11@2x)
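
For reference, a minimal Python sketch of the metadata approach. It assumes `client` is a `google.cloud.bigquery.Client`, whose `get_table` returns a table object with a `num_rows` property. The function names and the label format are made up for illustration, and the extension itself would need a JavaScript equivalent.

```python
from typing import Any, Optional


def materialised_row_count(client: Any, table_id: str) -> Optional[int]:
    """Return the row count recorded in the table's metadata, if any.

    `client` is assumed to be a google.cloud.bigquery.Client (or any object
    with a compatible get_table method); `table_id` has the form
    "project.dataset.table".
    """
    table = client.get_table(table_id)  # metadata lookup, not a query job
    return table.num_rows


def format_row_count(num_rows: Optional[int]) -> str:
    """Build a short label for the web view (hypothetical format)."""
    if num_rows is None:
        return "rows: unknown"
    return f"rows: {num_rows:,}"


# Usage, assuming credentials are configured:
#   from google.cloud import bigquery
#   client = bigquery.Client()
#   print(format_row_count(materialised_row_count(client, "proj.ds.tbl")))
```

This reads the count of the materialised table, not of an arbitrary query result, so it only helps once the table exists.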

TheSmartMonkey commented 1 month ago

Sorry, I was not very clear

What I called the "query tab" is the terminal Dataform tools tab

image

In the Dataform webview there is always a count, "1 – 50 sur 45343" (French for "1 – 50 of 45343"), as in the picture below. I would like the same thing if possible.

image

ashish10alex commented 1 month ago

Hi, thanks for clarifying. Let me give you some background on why the functionality is the way it is today. I initially built the querying without any limit. What I observed was that once a table exceeds about 50k rows, the time taken to receive the data back from the API is much higher than what BigQuery provides natively. To experience the slowness yourself in Python, you can try using pandas.read_gbq to query a large enough table. Note that the slow part is not the querying in BigQuery but the data transfer. You will notice in the documentation that there is an option to use the Storage API (use_bqstorage_api, a bool) to get the results much faster, but at a cost. We could perhaps emulate this with a JavaScript equivalent to get all results faster, but it cannot be the default for the extension. For this reason even I use BigQuery (not the Dataform UI) for any serious SQL querying.
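
A rough way to try this comparison is sketched below. The helper names are made up. `pandas.read_gbq` needs the `pandas-gbq` package and credentials, and recent pandas versions deprecate it in favour of `pandas_gbq.read_gbq`, which takes the same `use_bqstorage_api` argument. The timings include query execution and may be skewed by BigQuery's query cache, so treat them as indicative only.

```python
import time
from typing import Any, Callable, Tuple


def timed(fetch: Callable[..., Any], *args: Any, **kwargs: Any) -> Tuple[Any, float]:
    """Call `fetch` and return its result plus the elapsed wall-clock seconds."""
    start = time.perf_counter()
    result = fetch(*args, **kwargs)
    return result, time.perf_counter() - start


def compare_transfer(query: str, project_id: str) -> Tuple[float, float]:
    """Time the same query with and without the BigQuery Storage API."""
    import pandas  # imported lazily; requires pandas with pandas-gbq installed

    _, default_secs = timed(
        pandas.read_gbq, query, project_id=project_id, use_bqstorage_api=False
    )
    _, storage_secs = timed(
        pandas.read_gbq, query, project_id=project_id, use_bqstorage_api=True
    )
    return default_secs, storage_secs


# Usage, with a placeholder project and table:
#   print(compare_transfer("SELECT * FROM `my_dataset.big_table`", "my-project"))
```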


I had also looked into pagination, but I was not satisfied with how it would benefit me. Hence, I have currently built it as a table preview feature, for a quick glance at your data. I personally use it to look at metadata tables that I apply simple transformations to and that are often small.


I am happy to continue the discussion. Maybe you or someone else can highlight something that I am missing or that is possible.