apache / iceberg

Apache Iceberg
https://iceberg.apache.org/
Apache License 2.0
6.14k stars 2.14k forks source link

Missing Types.UUIDType in SUPPORTED_PRIMITIVES #1302

Closed openinx closed 1 day ago

openinx commented 4 years ago

I found that we did not put uuid type in org.apache.iceberg.data.DataTest, and after added it in the SUPPORTED_PRIMITIVES (using the following patch), it would break many unit tests.

diff --git a/data/src/test/java/org/apache/iceberg/data/DataTest.java b/data/src/test/java/org/apache/iceberg/data/DataTest.java
index 5566a753..03f848a9 100644
--- a/data/src/test/java/org/apache/iceberg/data/DataTest.java
+++ b/data/src/test/java/org/apache/iceberg/data/DataTest.java
@@ -56,7 +56,8 @@ public abstract class DataTest {
       required(114, "dec_9_0", Types.DecimalType.of(9, 0)),
       required(115, "dec_11_2", Types.DecimalType.of(11, 2)),
       required(116, "dec_38_10", Types.DecimalType.of(38, 10)), // maximum precision
-      required(117, "time", Types.TimeType.get())
+      required(117, "time", Types.TimeType.get()),
+      required(118, "uuid", Types.UUIDType.get())
   );

   @Rule

The broken unit tests:

image

We should add the uuid type in the tests and fix those broken cases.

chenjunjiedada commented 4 years ago

I met this as well. This is due to we don't visit with the accompanying Iceberg schema so GenericParquetWriter cannot convert the UUID value before writing to parquet binary.

openinx commented 4 years ago

we don't visit with the accompanying Iceberg schema

I think we may need to pass both parquet schema and iceberg schema to the visitor so that we could write the UUID value to binary correctly.

JingsongLi commented 4 years ago

Although Spark does not inherit this test, but looks like Spark Avro writer also has bug too.

openinx commented 4 years ago

Yes, spark writer also have this bug. I saw we commented the uuid type in spark test case.

shardulm94 commented 4 years ago

I recently read @rdblue's comment on Iceberg Slack suggesting future removal of UUID type from the spec. Would be great to hear Ryan's opinion if this is worth fixing or if we should rather remove the type.

David Stryker Jul 14th at 12:36 PM Hi @rdblue The Iceberg Presto connector doesn't currently support the UUID type. David Phillips and I were wondering if you thought it was worthwhile to provide that support. (edited) Ryan Blue No, we don't support it in other models either. We should remove it from the spec.

https://the-asf.slack.com/archives/CF01LKV9S/p1594755385061000

rdblue commented 4 years ago

This wouldn't be too difficult to fix. The problem before was that Parquet didn't have a UUID type annotation so there was no way to encode that a fixed was a UUID and correctly infer the schema. Now that this has been fixed, we could start using it. The larger issue now is that no processing engines can create a UUID type.

I'm still leaning toward removing it from the spec because there is no support for it in engines. I think a better way to do what this was attempting to is to add support in processing engines first, then support it in Iceberg if we need to.

JingsongLi commented 4 years ago

+1 to remove UUID Type now. Users can choose to use FIX_BYTES or STRING.

electrum commented 4 years ago

Presto does have a first class UUID type: https://prestosql.io/docs/current/language/types.html#uuid

It would be straightforward to add this in the Presto Iceberg connector if we want to support it.

github-actions[bot] commented 7 months ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

electrum commented 7 months ago

Trino supports UUID in Iceberg.

github-actions[bot] commented 3 weeks ago

This issue has been automatically marked as stale because it has been open for 180 days with no activity. It will be closed in next 14 days if no further activity occurs. To permanently prevent this issue from being considered stale, add the label 'not-stale', but commenting on the issue is preferred when possible.

github-actions[bot] commented 1 day ago

This issue has been closed because it has not received any activity in the last 14 days since being marked as 'stale'