apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Error: `table_type` missing from table parameters when loading table from Hive metastore #1150

Open edgarrmondragon opened 3 weeks ago

edgarrmondragon commented 3 weeks ago

Apache Iceberg version

main (development)

Please describe the bug 🐞

I am (or rather, a user of tap-iceberg is) running into the following error when trying to load a Hive table using pyiceberg.

pyiceberg.exceptions.NoSuchPropertyException: Property table_type missing, could not determine type: bronze.my_iceberg_table

The call in question is https://github.com/shaped-ai/tap-iceberg/blob/38064b3aaca5394ba1482970e790d3e2f6020946/tap_iceberg/tap.py#L94.

It seems the loaded table is missing the table type parameter in

https://github.com/apache/iceberg-python/blob/d587e6724685744918ecf192724437182ad01abf/pyiceberg/catalog/hive.py#L329-L331

?

Thanks in advance if this turns out to be user error 😃

kevinjqliu commented 3 weeks ago

load_table is a two-step process. First it fetches the table from HMS using get_table, then it converts the Hive table into an Iceberg table (_convert_hive_into_iceberg).

https://github.com/apache/iceberg-python/blob/d587e6724685744918ecf192724437182ad01abf/pyiceberg/catalog/hive.py#L524-L527

The error comes from the second step, which expects the Hive table to have a "table_type" property whose value is the string "iceberg".

https://github.com/apache/iceberg-python/blob/d587e6724685744918ecf192724437182ad01abf/pyiceberg/catalog/hive.py#L274-L285

Who created the table in this case? When PyIceberg creates the table, it injects the table_type property: https://github.com/apache/iceberg-python/blob/d587e6724685744918ecf192724437182ad01abf/pyiceberg/catalog/hive.py#L373
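
For illustration, the check described above can be sketched roughly as follows. This is a simplified stand-in and not the code from hive.py: the function name `check_is_iceberg` is hypothetical, the exception classes are local copies so the snippet runs without pyiceberg installed, and the case-insensitive comparison is an assumption.

```python
from typing import Dict

TABLE_TYPE = "table_type"
ICEBERG = "iceberg"


class NoSuchPropertyException(Exception):
    """Local stand-in for pyiceberg.exceptions.NoSuchPropertyException."""


class NoSuchIcebergTableError(Exception):
    """Local stand-in for pyiceberg.exceptions.NoSuchIcebergTableError."""


def check_is_iceberg(parameters: Dict[str, str], identifier: str) -> None:
    """Raise if the HMS table parameters do not mark the table as Iceberg.

    `parameters` plays the role of the Hive table's parameter map and
    `identifier` is "<database>.<table>", used only in error messages.
    """
    if TABLE_TYPE not in parameters:
        # This is the situation reported in this issue.
        raise NoSuchPropertyException(
            f"Property table_type missing, could not determine type: {identifier}"
        )

    table_type = parameters[TABLE_TYPE]
    # Assumption: the value is compared case-insensitively.
    if table_type.lower() != ICEBERG:
        raise NoSuchIcebergTableError(
            f"Property table_type is {table_type}, expected {ICEBERG}: {identifier}"
        )


if __name__ == "__main__":
    # A table registered with the property passes the check.
    check_is_iceberg({"table_type": "ICEBERG"}, "bronze.my_iceberg_table")

    # A plain Hive table without the property reproduces the reported error.
    try:
        check_is_iceberg({}, "bronze.my_iceberg_table")
    except NoSuchPropertyException as exc:
        print(exc)
```

A table with no table_type parameter at all fails the first branch, which matches the message in the original report.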

edgarrmondragon commented 3 weeks ago

Who created the table in this case? When PyIceberg creates the table, it injects the table_type property

I suppose it was created by a third party and not by HiveCatalog.create_table. Are only tables created by pyiceberg supported here?

kevinjqliu commented 3 weeks ago

Are only tables created by pyiceberg supported here?

Anyone can create an iceberg table using HMS, which can be read by PyIceberg. In HMS, the assumption is that iceberg tables have a specific property set so that engines can distinguish between hive and iceberg tables.

In this case, the table was created as a "hive table" and not an "iceberg table".

edgarrmondragon commented 3 weeks ago

Anyone can create an iceberg table using HMS, which can be read by PyIceberg. In HMS, the assumption is that iceberg tables have a specific property set so that engines can distinguish between hive and iceberg tables.

In this case, the table was created as a "hive table" and not an "iceberg table".

@kevinjqliu thanks for the info 🙏. Just two more questions:

  1. is there a way to set this property manually?
  2. would doing that break something?

kevinjqliu commented 3 weeks ago

is there a way to set this property manually?

You can use an engine (like Spark/Trino) to interact with the Hive table and add the extra table parameter. Alternatively, a hacky way is to use the Hive client in pyiceberg, like so: https://github.com/apache/iceberg-python/blob/d587e6724685744918ecf192724437182ad01abf/pyiceberg/catalog/hive.py#L570-L574. It should work, but definitely test it out first.
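
A minimal sketch of the client-based approach, under some assumptions: `set_table_type` is a hypothetical helper, and `client` is any object exposing the Hive metastore Thrift calls `get_table(dbname, tbl_name)` and `alter_table(dbname, tbl_name, new_tbl)`. In pyiceberg such a client is reachable through the catalog's private `_client` attribute, which is not a public API and may change. The uppercase value "ICEBERG" is also an assumption about what other engines write. A small in-memory fake is included so the snippet runs without a metastore.

```python
from types import SimpleNamespace
from typing import Any, Dict, Tuple


def set_table_type(client: Any, database: str, table: str) -> None:
    """Add table_type=ICEBERG to an existing HMS table's parameters.

    `client` is assumed to behave like the Thrift Hive metastore client.
    """
    hive_table = client.get_table(database, table)
    if hive_table.parameters is None:
        hive_table.parameters = {}
    hive_table.parameters["table_type"] = "ICEBERG"
    # Write the modified table definition back to the metastore.
    client.alter_table(database, table, hive_table)


class FakeMetastoreClient:
    """In-memory stand-in used only to exercise the helper locally."""

    def __init__(self, tables: Dict[Tuple[str, str], Any]) -> None:
        self._tables = tables

    def get_table(self, dbname: str, tbl_name: str) -> Any:
        return self._tables[(dbname, tbl_name)]

    def alter_table(self, dbname: str, tbl_name: str, new_tbl: Any) -> None:
        self._tables[(dbname, tbl_name)] = new_tbl


if __name__ == "__main__":
    fake = FakeMetastoreClient(
        {("bronze", "my_iceberg_table"): SimpleNamespace(parameters={})}
    )
    set_table_type(fake, "bronze", "my_iceberg_table")
    print(fake.get_table("bronze", "my_iceberg_table").parameters)

    # Against a real catalog this might look like the following
    # (assumption: relies on the private `_client` context manager):
    #
    #     with catalog._client as open_client:
    #         set_table_type(open_client, "bronze", "my_iceberg_table")
```

As noted above, try this on a test table first.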

would doing that break something?

Nope, adding that specific parameter in HMS is how an Iceberg table is defined. You can see examples in the core Iceberg library and in an engine like Trino.