Open radeusgd opened 2 months ago
This should be a general solution for Postgres, SQL Server, and Snowflake.
The batched insert is more or less a general solution.
The problem highlighted in this ticket is that, for Snowflake, the batched insert is slower than it should be. The `PUT` and `COPY INTO` mechanisms are Snowflake-specific features for efficiently uploading larger amounts of data, and they may help in this particular case.
I'm not quite sure we can find a general solution that will be fast for all backends. Is the 0.7-2.2s for 50k rows in Postgres not satisfactory? If not, we would probably need to test some baseline approach using `COPY` - but I'm not sure if it works over a remote connection (we'd have to check).
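For reference, a minimal sketch of what the `COPY` baseline could look like: rows are serialized into an in-memory CSV buffer that `COPY ... FROM STDIN` can consume. The table name and column are placeholders, and the actual psycopg2 call is only shown as a comment - this is an assumption about the wiring, not the project's current code.

```python
import csv
import io

def rows_to_copy_buffer(rows):
    """Serialize rows into an in-memory CSV buffer suitable for
    Postgres COPY ... FROM STDIN WITH (FORMAT csv)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(rows)
    buf.seek(0)
    return buf

# With psycopg2 (hypothetical table/column names), the buffer could be
# streamed in a single round trip instead of many INSERT statements:
#   cur.copy_expert("COPY my_table (x) FROM STDIN WITH (FORMAT csv)", buf)
buf = rows_to_copy_buffer([(i,) for i in range(50_000)])
```

Whether this beats the batched insert over a remote connection is exactly what would need measuring.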
We may want to start by tweaking the batch size for each backend, to see if we can get better performance.
I've tried using `select_into_database_table` on a single-column, 50k row table. In Postgres this takes roughly 0.7-2.2s, in SQLite also about 0.8s, but in Snowflake it took more than 5 minutes!
We are currently using batched insert queries with each batch consisting of 100 rows.
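My reading of that batched-insert approach, sketched in Python: rows are split into chunks of 100, and each chunk becomes one multi-row parameterized `INSERT`. The table name `t` and `?` placeholders are illustrative, not the project's actual generated SQL.

```python
def batches(rows, size=100):
    """Split rows into consecutive batches of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

def batch_insert_sql(table, n_cols, n_rows):
    """Build one parameterized multi-row INSERT statement for a batch."""
    row_placeholder = "(" + ", ".join(["?"] * n_cols) + ")"
    return ("INSERT INTO " + table + " VALUES "
            + ", ".join([row_placeholder] * n_rows))

# 50k single-column rows -> 500 statements of 100 rows each,
# i.e. 500 round trips to the database, which is where the
# Snowflake slowness likely comes from.
rows = [(i,) for i in range(50_000)]
statements = [batch_insert_sql("t", 1, len(b)) for b in batches(rows)]
```

With a high per-statement latency, the round-trip count dominates, so either much larger batches or a bulk-load path is needed.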
Perhaps we should tweak the batch size for Snowflake. Or, even better, we could use Snowflake's data loading features: `PUT` and `COPY INTO`, which allow uploading whole CSV files to the database. We could save the table to a temporary CSV file and upload it using this mechanism - it would very likely be much more efficient.
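A sketch of how that route could look, assuming we dump to a temp CSV and stage it on the table's own stage (`@%table`). The statements would be executed through the Snowflake connector, which isn't shown here; the helper names are hypothetical.

```python
import csv
import tempfile

def dump_to_csv(rows):
    """Write rows to a temporary CSV file and return its path."""
    f = tempfile.NamedTemporaryFile(
        mode="w", suffix=".csv", newline="", delete=False)
    csv.writer(f).writerows(rows)
    f.close()
    return f.name

def snowflake_upload_statements(path, table):
    """The PUT + COPY INTO pair: stage the local file, then bulk-load it."""
    return [
        "PUT file://" + path + " @%" + table + " AUTO_COMPRESS=TRUE",
        "COPY INTO " + table + " FROM @%" + table
        + " FILE_FORMAT = (TYPE = CSV)",
    ]

path = dump_to_csv([(i,) for i in range(50_000)])
statements = snowflake_upload_statements(path, "my_table")
```

Two statements total instead of 500 batched inserts, which is why this is likely to be much faster for larger tables.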