crate / crate-clients-tools

Clients, tools, and integrations for CrateDB.
https://crate.io/docs/clients/
Apache License 2.0

Make CrateDB work with `dataset` #48

Open amotl opened 1 year ago

amotl commented 1 year ago

Hi there,

A while back, I tried to use the sweet dataset package with CrateDB.

Being built on top of SQLAlchemy, dataset works with all major databases, such as SQLite, PostgreSQL and MySQL.

To exercise it, and to provide common ground for others to experiment with, I've created the cratedb-dataset-demo.py gist.

This meta issue tracks all related issues that need to be resolved before the demo program works completely.

With kind regards, Andreas.


amotl commented 1 year ago

With recent improvements, most notably https://github.com/crate/crate/pull/11165, which added the gen_random_text_uuid() scalar function, primary key values can now be generated automatically when inserting new records. This was essential for making INSERT operations like table.insert(dict(name="John Doe", age=37)) work.

The corresponding SQL DDL statement looks like:

CREATE TABLE IF NOT EXISTS "doc"."testdrive" (
    "id" TEXT DEFAULT gen_random_text_uuid() NOT NULL,
    "name" TEXT,
    "age" INTEGER,
    "gender" TEXT,
    PRIMARY KEY ("id")
);
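To illustrate the pattern without a running CrateDB instance, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in. SQLite has no gen_random_text_uuid(), so the sketch substitutes lower(hex(randomblob(16))) as the column default; the generated values therefore look different from what CrateDB produces. The point is only that the inserted row carries no "id" and the database fills it in, just like table.insert(dict(name="John Doe", age=37)) relies on.

```python
import sqlite3

# Stand-in only: SQLite replaces CrateDB here, and lower(hex(randomblob(16)))
# replaces gen_random_text_uuid() as the server-side primary key default.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE IF NOT EXISTS testdrive (
        id TEXT DEFAULT (lower(hex(randomblob(16)))) NOT NULL,
        name TEXT,
        age INTEGER,
        gender TEXT,
        PRIMARY KEY (id)
    )
    """
)

# The row carries no "id", so the database fills it in from the column default.
row = dict(name="John Doe", age=37)
columns = ", ".join(row)
placeholders = ", ".join("?" for _ in row)
conn.execute(
    f"INSERT INTO testdrive ({columns}) VALUES ({placeholders})",
    tuple(row.values()),
)

generated_id, name, age = conn.execute(
    "SELECT id, name, age FROM testdrive"
).fetchone()
print(generated_id, name, age)
```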

Currently, the schema has to be provided manually, probably because dataset itself only provisions autoincrement-like primary key columns automatically, i.e. those of type Types.integer or Types.bigint ^1. It would be nice to have automatic schema creation work on CrateDB as it does on other databases. Maybe adding Types.{string,text} at ^2 would be enough?
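As a rough sketch of what automatic schema creation could emit for CrateDB if a text primary key were allowed, the hypothetical helpers below (sql_type and create_table_ddl, neither of which is part of dataset or any CrateDB driver) infer column types from a sample row and give the text primary key a gen_random_text_uuid() default. The Python-to-CrateDB type mapping is a simplified assumption, not the mapping dataset or SQLAlchemy actually uses.

```python
# Hypothetical helpers: not part of dataset or of any CrateDB driver.

def sql_type(value):
    """Map a Python value to a CrateDB column type (simplified assumption)."""
    # bool must be checked before int, because bool is a subclass of int.
    if isinstance(value, bool):
        return "BOOLEAN"
    if isinstance(value, int):
        return "INTEGER" if -(2**31) <= value < 2**31 else "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "TEXT"


def create_table_ddl(table, row, primary_id="id", schema="doc"):
    """Render a CREATE TABLE statement whose text primary key defaults to
    gen_random_text_uuid(), inferring the other column types from `row`."""
    columns = [f'"{primary_id}" TEXT DEFAULT gen_random_text_uuid() NOT NULL']
    for key, value in row.items():
        if key != primary_id:
            columns.append(f'"{key}" {sql_type(value)}')
    columns.append(f'PRIMARY KEY ("{primary_id}")')
    body = ",\n    ".join(columns)
    return f'CREATE TABLE IF NOT EXISTS "{schema}"."{table}" (\n    {body}\n);'


ddl = create_table_ddl("testdrive", dict(name="John Doe", age=37))
print(ddl)
```

For the example row, this reproduces the DDL statement above, minus the "gender" column, which the row does not contain.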

One of the main features of dataset is to automatically create tables and columns as data is inserted. This behaviour can optionally be disabled via the ensure_schema argument. It can also be overridden in a lot of the data manipulation methods using the ensure flag.

-- https://dataset.readthedocs.io/en/latest/api.html#connecting
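The behaviour quoted above can be modelled in a few lines. The following is a toy sketch against the stdlib sqlite3 module, not dataset's implementation: the AutoTable class and its ensure_schema/ensure parameters are hypothetical and only mirror the documented semantics (create the table and any missing columns on insert unless disabled, with a per-call override). When schema creation is off, this toy simply lets the INSERT fail; dataset's own handling of unknown keys may differ.

```python
import sqlite3


class AutoTable:
    """Hypothetical toy model of dataset's auto-schema-on-insert behaviour."""

    def __init__(self, conn, name, ensure_schema=True):
        self.conn = conn
        self.name = name
        self.ensure_schema = ensure_schema

    def columns(self):
        rows = self.conn.execute(f'PRAGMA table_info("{self.name}")').fetchall()
        return [r[1] for r in rows]  # the second field is the column name

    def insert(self, row, ensure=None):
        # A per-call `ensure` overrides the table-wide default.
        ensure = self.ensure_schema if ensure is None else ensure
        if ensure:
            existing = self.columns()
            if not existing:
                # Create the table with an auto-assigned integer key.
                self.conn.execute(f'CREATE TABLE "{self.name}" ("id" INTEGER PRIMARY KEY)')
                existing = ["id"]
            for key in row:
                if key not in existing:
                    self.conn.execute(f'ALTER TABLE "{self.name}" ADD COLUMN "{key}"')
        keys = list(row)
        cols = ", ".join(f'"{k}"' for k in keys)
        marks = ", ".join("?" for _ in keys)
        cur = self.conn.execute(
            f'INSERT INTO "{self.name}" ({cols}) VALUES ({marks})',
            [row[k] for k in keys],
        )
        return cur.lastrowid


conn = sqlite3.connect(":memory:")
people = AutoTable(conn, "testdrive")
people.insert(dict(name="John Doe", age=37))
people.insert(dict(name="Jane Doe", age=34, gender="female"))  # adds "gender"
columns = people.columns()
print(columns)  # ['id', 'name', 'age', 'gender']

strict = AutoTable(conn, "testdrive", ensure_schema=False)
try:
    strict.insert(dict(name="Max", city="Berlin"))  # no "city" column yet
except sqlite3.OperationalError as exc:
    error = str(exc)

strict.insert(dict(name="Max", city="Berlin"), ensure=True)  # per-call override
print(strict.columns())
```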