Uploading data into Snowflake is slow #10819

Open radeusgd opened 2 months ago

radeusgd commented 2 months ago

I've tried using select_into_database_table on a single-column, 50k row table.

In Postgres this takes roughly 0.7-2.2s, in SQLite about 0.8s, but in Snowflake it took more than 5 minutes!

We are currently using batched insert queries with each batch consisting of 100 rows.
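For reference, a minimal sketch of what such a batched insert looks like at the JDBC level (the table and column names are placeholders; the actual Enso implementation differs):

```java
import java.sql.Connection;
import java.sql.PreparedStatement;

public class BatchedInsertSketch {
    static final int BATCH_SIZE = 100;

    // Inserts values into a hypothetical single-column table, flushing every 100 rows.
    static void insertBatched(Connection conn, String[] values) throws Exception {
        try (PreparedStatement ps =
                 conn.prepareStatement("INSERT INTO my_table (col) VALUES (?)")) {
            int pending = 0;
            for (String v : values) {
                ps.setString(1, v);
                ps.addBatch();
                if (++pending == BATCH_SIZE) {
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch(); // flush the final partial batch
            }
        }
    }
}
```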

Perhaps we should tweak the batch size for Snowflake. Or, even better, we could use Snowflake's data loading features, PUT and COPY INTO, which allow uploading whole CSV files to the database. We could save the table to a temporary CSV file and upload it using this mechanism; very likely it would be much more efficient.
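A rough sketch of what that could look like through the Snowflake JDBC driver, using the table's internal stage (the file path and table name are placeholders, and the CSV formatting options would need care):

```java
import java.sql.Connection;
import java.sql.Statement;

public class SnowflakeCopySketch {
    // Uploads a local CSV file into a Snowflake table via the table's internal stage.
    static void uploadCsv(Connection conn, String csvPath, String table) throws Exception {
        try (Statement stmt = conn.createStatement()) {
            // PUT uploads (and by default compresses) the local file to the table stage.
            stmt.execute("PUT file://" + csvPath + " @%" + table + " AUTO_COMPRESS=TRUE");
            // COPY INTO loads the staged file into the table.
            stmt.execute("COPY INTO " + table
                + " FROM @%" + table
                + " FILE_FORMAT = (TYPE = CSV FIELD_OPTIONALLY_ENCLOSED_BY = '\"' SKIP_HEADER = 1)");
        }
    }
}
```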

GregoryTravis commented 5 days ago

This should be a general solution for Postgres, SQL Server, and Snowflake.

radeusgd commented 5 days ago

> This should be a general solution for Postgres, SQL Server, and Snowflake.

The batched insert is more or less a general solution.

The problem highlighted in this ticket is that, for Snowflake, the batched insert is slower than it should be. PUT and COPY INTO are Snowflake-specific mechanisms for efficiently uploading larger amounts of data, which may help in this particular case.

I'm not quite sure we can find a general solution that will be fast for all backends. Is the 0.7-2.2s for 50k rows in Postgres not satisfactory? If not, we would probably need to test some baseline approach using COPY, but I'm not sure whether it works over a remote connection (we'd have to check).
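For the record, a sketch of what testing such a COPY baseline could look like with the PostgreSQL JDBC driver's CopyManager, which sends COPY ... FROM STDIN data over the client connection rather than reading a server-side file (table name and CSV source are placeholders):

```java
import java.io.FileReader;
import java.io.Reader;
import java.sql.Connection;
import org.postgresql.copy.CopyManager;
import org.postgresql.core.BaseConnection;

public class PostgresCopySketch {
    // Streams a local CSV into a Postgres table using COPY ... FROM STDIN.
    static void copyCsv(Connection conn, String csvPath, String table) throws Exception {
        CopyManager copyManager = new CopyManager(conn.unwrap(BaseConnection.class));
        try (Reader reader = new FileReader(csvPath)) {
            copyManager.copyIn(
                "COPY " + table + " FROM STDIN WITH (FORMAT csv, HEADER true)", reader);
        }
    }
}
```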

radeusgd commented 5 days ago

We may want to start by tweaking the batch size for each backend, to try to get better performance.
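As a starting point, this could be as simple as a per-backend default. The dialect names and numbers below are purely illustrative; the actual values would have to come from benchmarking:

```java
public class BatchSizeSketch {
    // Illustrative per-backend batch sizes; real values need to be measured.
    static int suggestedBatchSize(String dialect) {
        switch (dialect) {
            case "PostgreSQL":
            case "SQLite":
                return 100;     // current behaviour already seems acceptable here
            case "Snowflake":
                return 10_000;  // hypothetical larger batch, to be verified by benchmarks
            default:
                return 100;
        }
    }
}
```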