Open c-thiel opened 2 months ago
CC @Fokko, @nastra, @Xuanwo
Allowing such a case would also require a change to the table scan API, e.g. we need a way for the user to tell us what `foo.bar` actually means.
Also cc @rdblue
FWIW. Can we introduce a structure similar to `TableIdent` to represent field paths, named 'FieldPath'?
`TableScan::column_names` from string to `FieldPath` (and provide some utilities to simplify the construction of `FieldPath`).
`name_to_id` to `FieldPath`.
`FieldPath` would also need to be modified.

This is an implementation detail that is not part of the spec. But I don't think it is worth bothering to enable both `["foo.bar"]` and `["foo", "bar"]` identifiers in the same schema. That's confusing for users, who will almost certainly be confused by tables with such odd structures.
The reference implementation has been this way for about 7 years and no one has ever complained or, to my knowledge, hit a problem with this in practice. I highly recommend focusing time and effort on other improvements.
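
For reference, the 'FieldPath' structure suggested earlier in the thread could look roughly like the sketch below. This is only an illustration under assumptions: the type, its constructors (`new`, `top_level`), and the backtick quoting used in `Display` are all hypothetical and are not part of iceberg-rust today. The point it shows is that a path stored as one segment per nesting level keeps `["foo", "bar"]` and `["foo.bar"]` distinct.

```rust
use std::fmt;

/// Hypothetical `FieldPath`: one entry per nesting level, so a dot inside a
/// segment is an ordinary character and never a separator.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct FieldPath(Vec<String>);

impl FieldPath {
    /// Build a path from explicit segments, e.g. `FieldPath::new(["foo", "bar"])`.
    pub fn new<I, S>(segments: I) -> Self
    where
        I: IntoIterator<Item = S>,
        S: Into<String>,
    {
        FieldPath(segments.into_iter().map(Into::into).collect())
    }

    /// Convenience constructor for a top-level column (assumed helper).
    pub fn top_level(name: impl Into<String>) -> Self {
        FieldPath(vec![name.into()])
    }

    /// The individual path segments, outermost first.
    pub fn segments(&self) -> &[String] {
        &self.0
    }
}

impl fmt::Display for FieldPath {
    /// Join segments with '.', quoting any segment that itself contains a dot.
    /// The backtick quoting is an assumption for display purposes only; real
    /// escaping rules would be up to the query engine.
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        for (i, segment) in self.0.iter().enumerate() {
            if i > 0 {
                write!(f, ".")?;
            }
            if segment.contains('.') {
                write!(f, "`{}`", segment)?;
            } else {
                write!(f, "{}", segment)?;
            }
        }
        Ok(())
    }
}
```

With such a type, `FieldPath::new(["foo", "bar"])` and `FieldPath::top_level("foo.bar")` compare as different values, so an API like `TableScan::column_names` would no longer have to guess what `foo.bar` means.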
Currently, due to the way name-to-id is built, we cannot have points in column names if they collide with a struct.
The following schema fails to build:
By prohibiting this we follow the Java implementation. There is nothing in the Iceberg spec that prohibits these names, so I think we should allow them. As a user, I expect column names to behave the same as namespaces or databases with points: they need to be escaped. Since escaping depends on the query engine, and neither iceberg-rust nor Iceberg Java has a SQL parser, those libraries should not take away the option from the engine or make that decision in its stead.
I propose to change the representation of a column name to `Vec<String>` instead of just "String with points". It would also make accessors compatible between schemas, even if we decide to keep this artificial restriction.
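
To make the difference concrete, here is a minimal, self-contained sketch. It is not the actual iceberg-rust `name_to_id` code: the two helper functions and the `(path, field id)` input shape are assumptions made purely for illustration. It contrasts an index keyed by a dot-joined string with one keyed by `Vec<String>`.

```rust
use std::collections::HashMap;

/// Hypothetical index keyed by a dot-joined name ("String with points").
/// Returns an error as soon as two different fields produce the same key.
pub fn index_by_dotted_name(
    fields: &[(Vec<&str>, i32)],
) -> Result<HashMap<String, i32>, String> {
    let mut index = HashMap::new();
    for (path, id) in fields {
        let key = path.join(".");
        if let Some(prev) = index.insert(key.clone(), *id) {
            return Err(format!(
                "name {:?} maps to both field {} and field {}",
                key, prev, id
            ));
        }
    }
    Ok(index)
}

/// Hypothetical index keyed by the path segments themselves.
/// A dot inside a segment no longer collides with nesting.
pub fn index_by_path(
    fields: &[(Vec<&str>, i32)],
) -> Result<HashMap<Vec<String>, i32>, String> {
    let mut index = HashMap::new();
    for (path, id) in fields {
        let key: Vec<String> = path.iter().map(|s| s.to_string()).collect();
        if let Some(prev) = index.insert(key.clone(), *id) {
            return Err(format!(
                "path {:?} maps to both field {} and field {}",
                key, prev, id
            ));
        }
    }
    Ok(index)
}
```

Given a struct `foo` (id 1) with a child `bar` (id 2) plus a top-level column literally named `foo.bar` (id 3), the dotted index reports a collision while the path-keyed index holds all three entries. Whether supporting such schemas is worth it is a separate question, as discussed above.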