Snowflake .NET Client

Downloading data from chunks is not implemented yet. #13

Closed andras-nemes-snowsoftware closed 2 years ago

andras-nemes-snowsoftware commented 2 years ago

Hello,

I get the exception in the title when I run QueryAsync with a "large enough" result. E.g. if I have 1000 elements in the table and execute a SELECT * FROM on the table, then snowflakeClient.QueryAsync throws this exception. There seems to be no fixed limit on the number of rows in the data set; rather, it's the total size of the returned data.

Is this a known issue?

Thanks, Andras

fixer-m commented 2 years ago

Hello!

Yes, Snowflake returns "big" response data in "chunks" (pieces), so the client should download and combine them locally to get the full result. Unfortunately, this feature is not implemented yet.

andras-nemes-snowsoftware commented 2 years ago

Hi Ilya,

Thanks for your response.

How would you read large data sets with the library? E.g. if I execute snowflakeClient.QueryRawResponseAsync("select top 500 * from software;"), then I see that there are 2 chunks and 0 rows in the result set. Chunk 1 has 361 rows and chunk 2 has the rest, i.e. 139 rows. I can't find any way to read those chunks though; how can it be done? Thanks!

fixer-m commented 2 years ago

Hi!

So each chunk has a Url property (response.Data.Chunks[0].Url) - that's the download link. And there are headers in ChunkHeaders (for example x-amz-server-side-encryption-customer-key and x-amz-server-side-encryption-customer-key-md5); you should pass these headers with the corresponding values in the download request. Look at the example at https://github.com/snowflakedb/snowflake-connector-net/blob/master/Snowflake.Data/Core/RestRequest.cs#L63, the S3DownloadRequest class.
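
For illustration, a minimal sketch of downloading a single chunk manually with HttpClient might look like the following; the response.Data.Chunks / response.Data.ChunkHeaders shapes are assumed from the description above, and this is not the library's own download code:

// Sketch only: download one chunk, forwarding the encryption headers from the query response.
// Assumes ChunkHeaders enumerates header name/value pairs; requires System.Net.Http and an async context.
using var httpClient = new HttpClient();
using var request = new HttpRequestMessage(HttpMethod.Get, response.Data.Chunks[0].Url);
foreach (var header in response.Data.ChunkHeaders)
    request.Headers.TryAddWithoutValidation(header.Key, header.Value);
using var httpResponse = await httpClient.SendAsync(request);
httpResponse.EnsureSuccessStatusCode();
var chunkJson = await httpResponse.Content.ReadAsStringAsync(); // the chunk's rows, typically as JSON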

rteising commented 2 years ago

Any chance that downloading chunks will be implemented soon? The Snowflake.Data API is choking on large result sets and my quick code example of downloading 25 chunks worked super fast.

I like what I see from the rest of this API, but we definitely need to be able to download result sets with more than 1,000 records.

fixer-m commented 2 years ago

@andras-nemes-snowsoftware, @rteising I've just released a new version, 0.3.8, that can download response data split into chunks, i.e. it can process big responses.

It does this automatically, but if you want, you can try to use ChunksDownloader manually:

// Get the raw response, which references the chunks but does not yet contain their rows
var queryResult = await _snowflakeClient.QueryRawResponseAsync("select top 10000 * from SNOWFLAKE_SAMPLE_DATA.TPCH_SF1000.SUPPLIER;");
var chunksDownloadInfo = new ChunksDownloadInfo() { ChunkHeaders = queryResult.ChunkHeaders, Chunks = queryResult.Chunks, Qrmk = queryResult.Qrmk };

// Download and parse all chunks, then map the combined rows to your model type
var parsedChunks = await ChunksDownloader.DownloadAndParseChunksAsync(chunksDownloadInfo);
var suppliers = SnowflakeDataMapper.MapTo<Supplier>(queryResult.Columns, parsedChunks);
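
Here Supplier is just a plain model class that the mapper fills from the result columns; a hypothetical example (property names assumed to match the SUPPLIER table's column names):

// Hypothetical model class; property names are assumed to match the returned column names
public class Supplier
{
    public int S_SUPPKEY { get; set; }
    public string S_NAME { get; set; }
    public string S_ADDRESS { get; set; }
    public decimal S_ACCTBAL { get; set; }
}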

Unfortunately I can't test this feature extensively, because I'm using a free-tier account in SF, so (as always) I would appreciate any feedback about it.

andras-nemes-snowsoftware commented 2 years ago

Thanks for your work Ilya, it's much appreciated!

harindaka commented 2 years ago

@fixer-m is there a way to download and process one chunk at a time? The above example looks like it downloads all chunks and then assigns them to the suppliers variable. Is there a way to get an IList<Supplier> for one chunk, process it, then retrieve the second chunk, and so on?
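
One possible approach (untested; it assumes DownloadAndParseChunksAsync accepts a ChunksDownloadInfo containing just a subset of the chunks) would be to feed the downloader one chunk per iteration, reusing the API from the example above:

// Untested sketch: download and map one chunk at a time instead of all chunks at once.
// Assumes ChunksDownloadInfo.Chunks can hold a single chunk; requires System.Linq for ToList().
foreach (var chunk in queryResult.Chunks)
{
    var singleChunkInfo = new ChunksDownloadInfo()
    {
        ChunkHeaders = queryResult.ChunkHeaders,
        Chunks = new[] { chunk }.ToList(),
        Qrmk = queryResult.Qrmk
    };

    var rows = await ChunksDownloader.DownloadAndParseChunksAsync(singleChunkInfo);
    var suppliers = SnowflakeDataMapper.MapTo<Supplier>(queryResult.Columns, rows);
    // process this chunk's suppliers here before downloading the next chunk
}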