Open raphaelauv opened 1 week ago
Thanks for reporting this issue! The `_check_schema_compatible` check is currently more strict than it should be. In #829, the `_check_schema_compatible` check is relaxed.
Would #829 fix your issue above?
Hey @kevinjqliu, I tried your PR; it does not fix the insert of UUIDs.
I see. I also verified that `_check_schema_compatible` errors. Here's an example to repro:
```python
import uuid

import polars as pl

from pyiceberg.io.pyarrow import _check_schema_compatible
from pyiceberg.schema import Schema
from pyiceberg.types import NestedField, UUIDType


def test_schema_uuid() -> None:
    iceberg_schema = Schema(
        NestedField(1, "id", UUIDType(), required=True),
    )
    id_to_write = uuid.uuid4()
    df = pl.DataFrame({}).with_columns([pl.lit(id_to_write.bytes).alias("id")])
    df = df.to_arrow()
    df = df.cast(target_schema=iceberg_schema.as_arrow())
    _check_schema_compatible(iceberg_schema, df.schema)
```
Looks like @Fokko opened an issue regarding a UUID type for Arrow: https://github.com/apache/arrow/issues/15058
@Fokko can you chime in here on writing UUID data type?
Thanks for pinging me here. So there is some progress on the Arrow side. There has been a vote to adopt the UUID type, and it has been added to the format.
Thanks for the example code @kevinjqliu. And I would say that they are equivalent. So if we know that the field in the Iceberg table is a UUID, just writing a `fixed[16]` is okay and should pass the compatibility check.
Apache Iceberg version
main (development)
Please describe the bug 🐞
I can't write a UUID to an Iceberg table, but if I comment out the call to `_check_schema_compatible` (https://github.com/apache/iceberg-python/blob/a6cd0cf325b87b360077bad1d79262611ea64424/pyiceberg/table/__init__.py#L485), the write to the table succeeds and I can read the data with Trino.