Open NI1993 opened 3 years ago
GREAT IDEA!
Hi @NI1993 - interesting! Thanks for bringing this up. Just to make sure I understand you correctly: you would like to run the SQL on BigQuery and end up with a Dask dataframe, correct?
Dask-SQL helps you with performing SQL statements on already-present Dask dataframes, so getting data from BigQuery would be a step before that (and for Dask-SQL it would not be different from reading from, e.g., files).
Did you have a look at https://github.com/dask/dask/issues/3121? Unfortunately, I think that issue has not been continued, but I would be super happy to build this together with your help. I think it might make sense to move this into a separate new package (dask-bigquery or similar).
The reason I would propose not to include it in Dask-SQL is that users without dask-sql could then also create dataframes from BigQuery (and since we pass the SQL through unchanged, we do not need to "understand" it).
What do you think? I would be very happy to get your input!
I'm working on a project that involves querying a Google BigQuery server.
Depending on the SQL and the data it is run on, the resulting table may be too large to fit in memory.
It would be a very nice feature to use dask-sql to query a BigQuery server and get the resulting table as a Dask dataframe.
I've managed to think of a workaround to fit a very large SQL result into a Dask dataframe:

1. Prefix the SQL with an `EXPORT DATA` statement so BigQuery writes the result to Cloud Storage.
2. Read the output files (in Cloud Storage) using Dask.
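The two steps above can be sketched roughly like this. This is only an illustration, not a tested implementation: the bucket path, project/dataset names, and the `build_export_sql` helper are placeholders I made up, and the actual export run requires BigQuery credentials.

```python
# Sketch of the workaround: wrap the user's SQL in a BigQuery
# EXPORT DATA statement so the result lands in Cloud Storage as
# Parquet files, then read those files back lazily with Dask.

def build_export_sql(sql: str, gcs_prefix: str) -> str:
    """Wrap `sql` so BigQuery writes the result as Parquet files
    under `gcs_prefix` instead of returning rows to the client."""
    return (
        "EXPORT DATA OPTIONS(\n"
        f"  uri='{gcs_prefix}/*.parquet',\n"
        "  format='PARQUET',\n"
        "  overwrite=true\n"
        ") AS\n"
        f"{sql}"
    )

# Placeholder query and bucket, for illustration only.
export_sql = build_export_sql(
    "SELECT * FROM `my-project.my_dataset.my_table`",
    "gs://my-bucket/exports/run1",
)

# Running the two steps would then look something like this
# (requires google-cloud-bigquery, gcsfs, and dask[dataframe]):
#
#   from google.cloud import bigquery
#   import dask.dataframe as dd
#
#   bigquery.Client().query(export_sql).result()  # run the export job
#   ddf = dd.read_parquet("gs://my-bucket/exports/run1/*.parquet")
```

One caveat worth noting with this approach: the export produces many sharded files, which conveniently maps onto Dask partitions when reading them back.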
Let me know what you think :)