Snowflake-Labs / schemachange

A Database Change Management tool for Snowflake
Apache License 2.0
482 stars 219 forks source link

Update fetch_r_script_checksum to iterate over multiple results batches within a cursor #120

Open gcnuss opened 2 years ago

gcnuss commented 2 years ago

In our experience using schemachange, we have run into an issue where fetch_r_script_checksum fails due to the quantity of R-scripts in our pipeline exceeding the data size that will fit within one Result Batch as returned by snowflake-connector-python. It will successfully iterate over all the rows in the first "chunk" but fail to proceed into a second chunk if it exists. The suggested code update resolves this by first returning all results batches and iterating through the rows within each batch. It does require pyarrow to be installed, which I've added to setup.cfg. The specified range of versions matches those required by snowflake-connector-python.

Note that I did discover the latest version of snowflake-connector-python (2.7.8) no longer fails with the existing fetch_r_script_checksum code as-written, so upgrading that config requirement may also work. That said I tested the suggested code update below with snowflake-connector-python 2.6.2, 2.7.4, and 2.7.8 and it works on all of them, so it would be more flexible and could be implemented without upgrading to 2.7.8 yet.

jonrobinsdev commented 1 year ago

+1. Am having the same issue currently with our snowflake migrations pipelines that use schemachange. Would love if this fix could be merged, or at the very least snowflake-connector-python could be updated to 2.7.8

sfc-gh-jhansen commented 1 year ago

I'm so sorry for the delay here everyone (@gcnuss, @jonrobinsdev), I've been super busy lately and unable to give schemachange the attention it deserves. I just published a new version of schemachange (3.4.2) which updated the dependency on snowflake-connector-python to the newest version 2.8.

Please let me know if that solves the problem, and if you still think we should move forward with the batching code (and dependency on pyarrow) here.

sfc-gh-tmathew commented 8 months ago

@gcnuss @jonrobinsdev We have been reviewing open requests and trying to cathup on open items. Would appreciate your help to address this open item.

Does the issue still exist? Does the latest version render this PR unnecessary?

Kindly confirm