crate / cratedb-sqlparse

Parsing utilities to validate and split SQL statements for CrateDB.
Apache License 2.0
4 stars 1 forks source link

Only latest table name is saved #50

Closed surister closed 6 days ago

surister commented 1 week ago

Sql queries can have several schemas/table names.

>>> stmt = sqlparse("""SELECT a FROM sys.t JOIN doc.b ON t.a = b.a""")[0]
>>> print(stmt.metadata)
Metadata(schema='doc', table_name= 'b', parameterized_properties={}, with_properties={})

sys.t table information is lost, it's easy to fix, the question is about the resulting API

Metadata(schema=['sys', 'doc'], table_name=['t', 'b'], parameterized_properties={}, with_properties={}) would this be good enough, where schemas and table_name are ordered?

If no schema it'd be: Metadata(schema=[None, 'doc'], table_name=['t', 'b'], parameterized_properties={}, with_properties={})

Now that you are integrating into this, wdyt? @amotl

amotl commented 1 week ago

Thanks for your support. I need this table name extraction feature over there right now, and probably later for other spots as well.

So, when looking at this...

schema=['sys', 'doc'], table_name=['t', 'b']

... would I need to coalesce them together again to get full-qualified table names like sys.t and doc.b? In this case, the client would need to know about CrateDB's quoting rules. So, wouldn't a representation like "sys"."t" and "doc"."b" possibly be optimal?

/cc @matriv, @hlcianfagna, @hammerhead, @seut

surister commented 1 week ago

Idea:

Metadata(tables=[Table(schema='doc', name='t'), ...], parameterized_properties={}, with_properties={}) where the Table dataclass has some kind of .quoted so:

>>> t = Table(schema='doc', name='t')
>>> t.quoted
"doc"."t"

Would also enable new stuff in the future like aliases.

>>> stmt = sqlparse('SELECT a FROM sys.t alias_a JOIN doc.b alias_b ON t.a = b.a')[0]
>>> t = stmt.tables[0]
>>> t
Table(schema='sys', name='t')
>>> t.quoted
'"sys"."t"'
>>> t.alias
{"alias_b": 'sys.t'}
amotl commented 1 week ago

OO FTW. Excellent, je vous remercie.

Suggestion: Maybe use t.fqn instead of t.quoted, to better convey it is the "full" representation, including schema- + table-name? That it's also a properly quoted value, might be an implicit feature, not necessarily reflected by its name.

amotl commented 1 week ago

JFYI: Maybe CrateDB Toolkit's quote_relation_name() function is helpful in any way? Please let me know, I planned to refactor it to sqlalchemy-cratedb earlier or later. You may not necessarily need it in your context, but if you do, I would be convinced enough to make it happen earlier than later.