crate / crate-python

Python DB API client library for CrateDB, using HTTP.
https://cratedb.com/docs/python/
Apache License 2.0

Add fast-path INSERT method `insert_bulk` for SQLAlchemy/pandas/Dask #553

Closed. amotl closed this 1 year ago.

amotl commented 1 year ago

Hi again,

in the same spirit as https://github.com/crate/crate-pdo/pull/143, this patch unlocks CrateDB bulk operations for efficient batch inserts using pandas and Dask. It will accompany a followup to the Data Processing and Analytics with Dask and CrateDB: A Step-by-Step Tutorial.

The corresponding documentation section can be inspected at Preview: SQLAlchemy: DataFrame operations.
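For readers unfamiliar with bulk operations, here is a minimal, standard-library-only sketch of the idea: instead of sending one INSERT per record, a whole batch of records travels with a single parameterized statement in the `bulk_args` field of a request to CrateDB's HTTP `/_sql` endpoint. This is not the `insert_bulk` implementation from this patch. The helper names `chunked` and `build_bulk_payload`, the table name `testdrive`, and the batch size are assumptions made for illustration, and the sketch only builds the request bodies without sending them.

```python
import json
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows (hypothetical helper)."""
    iterator = iter(rows)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def build_bulk_payload(table, columns, rows):
    """Build one request body: a single INSERT statement plus all rows of a batch.

    Hypothetical helper; the resulting dict follows the `stmt` / `bulk_args`
    shape accepted by CrateDB's HTTP endpoint.
    """
    column_list = ", ".join(f'"{c}"' for c in columns)
    placeholders = ", ".join("?" for _ in columns)
    stmt = f'INSERT INTO "{table}" ({column_list}) VALUES ({placeholders})'
    return {"stmt": stmt, "bulk_args": [list(row) for row in rows]}


# Three records split into batches of two yield two request bodies.
records = [(1, "a"), (2, "b"), (3, "c")]
payloads = [
    build_bulk_payload("testdrive", ["id", "name"], batch)
    for batch in chunked(records, 2)
]
print(json.dumps(payloads[0]))
```

With pandas or Dask, the batch size would typically correspond to the `chunksize` argument of `to_sql`, so each chunk of the DataFrame becomes one bulk request rather than many single-row statements.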

With kind regards, Andreas.

/cc @marijaselakovic, @hammerhead, @hlcianfagna, @proddata, @WalBeh

amotl commented 1 year ago

Regarding the failing software tests, they should not stop you from reviewing this patch.

I will exclude the relevant test matrix slots from being executed, and add a corresponding `.. note::` section to the documentation stating which versions of Python and SQLAlchemy are supported.

amotl commented 1 year ago

Hi.

Thanks for all the excellent review comments.

[This patch] will accompany a followup to the Data Processing and Analytics with Dask and CrateDB: A Step-by-Step Tutorial by @marijaselakovic.

Initially, I rushed a bit to land this in time for the tutorial, but I missed the deadline. Now that we have a little more headroom, I think this patch should only be the foundation, and those aspects should be addressed by corresponding followup patches:

I hope you agree with that strategy.

With kind regards, Andreas.

P.S.: Added as backlog items at https://github.com/crate/sqlalchemy-cratedb/issues/74.