apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

PyIceberg is not respecting `token` in the load table response #1113

Open creechy opened 1 month ago

creechy commented 1 month ago

Apache Iceberg version

0.7.1 (latest release)

Please describe the bug 🐞

In the Iceberg REST API spec, the load table endpoint can return a `config` map with additional properties to apply when accessing the given table. One of these is an optional `token` property in the `LoadTableResult`, listed in a section which states

The following configurations should be respected by clients

PyIceberg does not appear to respect this property; it continues to use the original token supplied in the catalog configuration request. This can lead to incorrect permissions being applied for table operations, which in some cases prevents operations from succeeding when they should.
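To make the expected behaviour concrete, here is a minimal sketch of the precedence rule described above: if the load table response carries a `token` in its `config` map, use it for requests against that table, otherwise fall back to the catalog-level token. The function names `resolve_bearer_token` and `auth_headers` are hypothetical and are not part of PyIceberg; the sketch only assumes that the token is sent as a standard `Authorization: Bearer` header.

```python
from typing import Mapping, Optional


def resolve_bearer_token(
    catalog_token: str,
    table_config: Optional[Mapping[str, str]],
) -> str:
    """Pick the token to use for requests scoped to one table.

    Hypothetical helper: prefers the `token` entry from the load table
    response's `config` map and falls back to the catalog-level token.
    """
    table_token = (table_config or {}).get("token")
    return table_token or catalog_token


def auth_headers(token: str) -> dict:
    """Build the Authorization header for a bearer token."""
    return {"Authorization": f"Bearer {token}"}


# The table-scoped token wins when the response config contains one.
headers = auth_headers(resolve_bearer_token("catalog-token", {"token": "table-token"}))
```

With an empty or missing `config`, the same call returns headers built from the catalog token, so tables that do not hand out their own token would be unaffected.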

kevinjqliu commented 1 month ago

Thanks for reporting this, @creechy. Do you have an example to reproduce this issue?

For now, I found some more docs on `token`: https://github.com/apache/iceberg/blob/cd32ec76ecd2866c05185065e4ed7196121de49a/open-api/rest-catalog-open-api.yaml#L672-L675

And the relevant code in the REST client implementation: https://github.com/apache/iceberg-python/blob/e4c1748fee220076f04e35ab2f182dd51ca20705/pyiceberg/catalog/rest.py#L506-L515

creechy commented 1 month ago

@kevinjqliu

Do you have an example to reproduce this issue?

I provided an example with a Tabular config to @Fokko, who confirmed that PyIceberg does not appear to be switching tokens. FWIW, here's the script. It's probably not all that useful on its own, but if you know how to look at the response of load_table, you should see a `config` object with a `token` in it, which PyIceberg is not using. In my case, the original token does not have the necessary privileges to update the table, so the append ends with a 409 Conflict instead of succeeding.


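import pyarrow as pa
from pyiceberg.catalog import load_catalog

# Connect to the REST catalog using the catalog-level credential.
catalog = load_catalog(
    "raw",
    **{
        "type": "rest",
        "uri": "https://api.tabular.io/ws/",
        "warehouse": "<warehouse>",
        "credential": "<credential>",
    },
)

# The load table response for this table includes a config map with a token.
table = catalog.load_table("default.buddy")

# Build a one-row Arrow table that matches the Iceberg table's schema.
schema = table.schema().as_arrow()

df = pa.Table.from_pylist(
    [{"s": "Groningen"}], schema=schema
)

# Fails with 409 Conflict here, because the original token is still used.
table.append(df)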
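For anyone trying to confirm what the server sent back, a rough sketch of checking a load table response body for a table-scoped token follows. The JSON below is a made-up, trimmed-down payload with placeholder values, not a real Tabular response, and `table_token_from_response` is a hypothetical helper rather than a PyIceberg function. It only assumes the response has the top-level `config` map described in the REST spec.

```python
import json
from typing import Optional

# Illustrative payload only: placeholder values, most fields omitted.
SAMPLE_RESPONSE = """
{
  "metadata-location": "s3://example-bucket/default/buddy/metadata/v1.metadata.json",
  "metadata": {},
  "config": {
    "token": "table-scoped-token-placeholder"
  }
}
"""


def table_token_from_response(body: str) -> Optional[str]:
    """Return the `token` from the response's `config` map, if any."""
    payload = json.loads(body)
    # `config` may be absent or null, so fall back to an empty map.
    config = payload.get("config") or {}
    return config.get("token")


token = table_token_from_response(SAMPLE_RESPONSE)
print("table-scoped token present:", token is not None)
```

If this reports a token for the table while the failing request still carries the catalog-level token, that matches the behaviour described in this issue.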