HGInsights / avalanche

Avalanche is an Elixir Snowflake Connector built on top of the Snowflake SQL API v2.
Apache License 2.0
17 stars 4 forks source link

Error decoding data when partition is provided #49

Closed vuphamcs closed 4 months ago

vuphamcs commented 5 months ago

Greetings and thank you for providing this library. I have just started using Avalanche and have encountered an error when attempting to retrieve query results using the partition option. Please let me know if this is a bug or if I am using the library incorrectly.

Describe the bug After submitting an async sql statement and retrieving the results with the partition option, Avalanche.Steps.DecodeData.decode_data/1 errors because row_types is nil.

To Reproduce Steps to reproduce the behavior:

# run async query
run_sql = "SELECT * FROM TABLE WHERE TS BETWEEN ? AND ?"
run_params = ["2023-12-01T00:00:00.000000Z", "2024-12-01T00:00:00.000000Z"]
run_opts = [async: true]

{:ok, %{statement_handle: statement_handle}} = Avalanche.run(run_sql, run_params, run_opts)

# retrieve result
status_opts = [partition: 1]
{:ok, result} = Avalanche.status(statement_handle, status_opts)

Error message

iex(6)> {:ok, result} = Avalanche.status(statement_handle, [partition: 1])
** (Protocol.UndefinedError) protocol Enumerable not implemented for nil of type Atom. This protocol is implemented for the following type(s): DBConnection.PrepareStream, DBConnection.Stream, Date.Range, Ecto.Adapters.SQL.Stream, File.Stream, Function, GenEvent.Stream, HashDict, HashSet, IO.Stream, Jason.OrderedObject, List, Map, MapSet, Phoenix.LiveView.LiveStream, Postgrex.Stream, Range, Stream, Timex.Interval
    (elixir 1.16.0) lib/enum.ex:1: Enumerable.impl_for!/1
    (elixir 1.16.0) lib/enum.ex:166: Enumerable.reduce/3
    (stdlib 5.2) lists.erl:1686: :lists.foreach_1/2
    (elixir 1.16.0) lib/stream.ex:1372: Stream.do_zip_enum/4
    (elixir 1.16.0) lib/enum.ex:4030: Enum.zip_reduce/4
    (elixir 1.16.0) lib/enum.ex:1700: Enum."-map/2-lists^map/1-1-"/2
    (avalanche 0.12.2) lib/avalanche/steps/decode_data.ex:43: Avalanche.Steps.DecodeData.decode_data/1
    (req 0.3.12) lib/req/request.ex:755: anonymous fn/2 in Req.Request.run_response/2
    (elixir 1.16.0) lib/enum.ex:4842: Enumerable.List.reduce/3
    (elixir 1.16.0) lib/enum.ex:2582: Enum.reduce_while/3
    (req 0.3.12) lib/req/request.ex:683: Req.Request.run/1
    (avalanche 0.12.2) lib/avalanche/requests/status_request.ex:78: Avalanche.StatusRequest.run/3
    (avalanche 0.12.2) lib/avalanche.ex:205: Avalanche.status/3

Expected behavior The result should have properly decoded rows from partition 1.

Additional context I am using the partition option because the query result is too large to store in memory. When I run the status query without specifying [partition: 1], all results are returned and decoded properly. When [partition: 1] is passed the response body seems to be missing resultSetMetaData where rowType information is usually found.

vuphamcs commented 5 months ago

I just discovered that resultSetMetaData is only provided by Snowflake's API when retrieving data for the first partition (partition 0). That would explain why Avalanche.Steps.GetPartitions.build_status_request/4 stores avalanche_row_types, so it can reuse rowTypes to decode later partitions. I think this means there is no bug here but is there any advice on how to best go about retrieving very large query results in smaller increments?