crate-workbench / cratedb-toolkit

CrateDB Toolkit.
https://cratedb-toolkit.readthedocs.io/
GNU Affero General Public License v3.0
5 stars 3 forks source link

[I/O] Enable using CrateDB-specific DDL statements on `ctk load table` interface #149

Open amotl opened 1 month ago

amotl commented 1 month ago

About

@wierdvanderhaar suggested to improve the ctk load table interface such that the target table can be created with CrateDB-specific features like partitioned table, and friends. Thanks.

amotl commented 3 weeks ago

Currently, it is not even possible to create the table manually using specific parameters, because the process is dropping the table and recreating it with standard options, see GH-148. That's probably the worst approach ;].

amotl commented 3 weeks ago

It may make sense to have a dedicated subcommand aside to ctk load table, in order to conceive a little subsystem specialized for forwarding DDL to the database. On the other hand, maybe let's just use plain SQL DDL and slap it in front of a ctk load table call, using ctk shell or just crash.

amotl commented 2 weeks ago

On the other hand, maybe let's just use plain SQL DDL and slap it in front of a ctk load table call, using ctk shell or just crash.

While that will give more powers to the end user, in order to better use CrateDB's special features, it would totally deter the conveniency of automatic schema propagation. Maybe we should go the extra mile of being able to supply proprietary SQL DDL parameters to the application?

However, this approach would still need some kind of special patches to pandas' to_sql(), because otherwise, well, it would still generate a standard SQL DDL statement. Another approach would be to capture the standard SQL DDL statement, and relay it to the user to be extended manually. Then, she would feed it back to the ETL job by running it just once.

amotl commented 2 weeks ago

[...] would still need some kind of special patches to pandas' to_sql(), because otherwise, well, it would still generate a standard SQL DDL statement.

The question is how to bring that together and make it work on behalf of SQLAlchemy Core (not ORM), because in this mode, there just isn't any ORM model where __table_args__ can be slapped onto.

amotl commented 2 weeks ago

At this spot, in pandas.io.sql.SQLTable._create_table_setup, the call to Table(self.name, meta, *columns, schema=schema) would need to convey additional table arguments to SQLAlchemy's sqlalchemy.sql.schema.Table implementation through **kw, as dialect-specific keyword args.

:param \**kw: Additional keyword arguments not mentioned above are
    dialect specific, and passed in the form ``<dialectname>_<argname>``.
    See the documentation regarding an individual dialect at
    :ref:`dialect_toplevel` for detail on documented arguments.

For example, defining a PARTITIONED BY ("time") clause would work like that:

Table(self.name, meta, *columns, schema=schema, crate_partitioned_by="time")
amotl commented 2 weeks ago

There are patches now, which aim to improve the situation.

SQLAlchemy Dialect