apache / iceberg-python

Apache PyIceberg
https://py.iceberg.apache.org/
Apache License 2.0

Documentation for Location Providers #1510

Closed smaheshwar-pltr closed 17 hours ago

smaheshwar-pltr commented 1 week ago

Feature Request / Improvement

Following #1452, we should add documentation for the new location providers: that PR introduces a new PyIceberg-specific table property, and location providers can be specified by users. Quoting https://github.com/apache/iceberg-python/pull/1452#pullrequestreview-2544129142:

I think we should document:

LocationProvider
    SimpleLocationProvider
    ObjectStoreLocationProvider
Loading a Custom LocationProvider

And new table properties:

WRITE_PY_LOCATION_PROVIDER_IMPL = "write.py-location-provider.impl"

OBJECT_STORE_ENABLED = "write.object-storage.enabled"
OBJECT_STORE_ENABLED_DEFAULT = False

WRITE_OBJECT_STORE_PARTITIONED_PATHS = "write.object-storage.partitioned-paths"
WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT = True

(Note, though, that some defaults might change in #1509.)
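
For anyone skimming, here is a rough sketch of how the properties quoted above could interact. It is not PyIceberg's actual code: `read_bool`, `describe_provider_choice` and `load_class` are hypothetical helpers written for this comment, and the precedence (custom implementation first, then the object-storage flag, then the simple provider) is an assumption of the sketch. The property names and defaults are the ones quoted above and may change in #1509.

```python
import importlib
from typing import Dict

# Property names and defaults as quoted in this issue (defaults may change in #1509).
WRITE_PY_LOCATION_PROVIDER_IMPL = "write.py-location-provider.impl"
OBJECT_STORE_ENABLED = "write.object-storage.enabled"
OBJECT_STORE_ENABLED_DEFAULT = False
WRITE_OBJECT_STORE_PARTITIONED_PATHS = "write.object-storage.partitioned-paths"
WRITE_OBJECT_STORE_PARTITIONED_PATHS_DEFAULT = True


def read_bool(properties: Dict[str, str], key: str, default: bool) -> bool:
    """Hypothetical helper: property values are strings, so parse 'true'/'false'."""
    value = properties.get(key)
    if value is None:
        return default
    return value.strip().lower() == "true"


def describe_provider_choice(properties: Dict[str, str]) -> str:
    """Hypothetical helper: report which kind of provider the properties ask for.

    Assumption of this sketch: a custom implementation, when set, takes
    precedence over the object-storage flag.
    """
    impl = properties.get(WRITE_PY_LOCATION_PROVIDER_IMPL)
    if impl is not None:
        return f"custom:{impl}"
    if read_bool(properties, OBJECT_STORE_ENABLED, OBJECT_STORE_ENABLED_DEFAULT):
        return "object-store"
    return "simple"


def load_class(dotted_path: str) -> type:
    """Hypothetical helper: import 'package.module.ClassName' and return the class."""
    module_name, _, class_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected 'module.ClassName', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


print(describe_provider_choice({}))                              # simple
print(describe_provider_choice({OBJECT_STORE_ENABLED: "true"}))  # object-store
print(describe_provider_choice(
    {WRITE_PY_LOCATION_PROVIDER_IMPL: "my_module.MyLocationProvider"}
))                                                               # custom:my_module.MyLocationProvider
```

Here `my_module.MyLocationProvider` is a made-up name. The point for the docs is that a user-specified provider is identified by a fully qualified class name in a table property, so the class has to be importable wherever the write runs.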

smaheshwar-pltr commented 1 week ago

Happy to pick this up, just dropping an issue for now.

I'd like to wait for https://github.com/apache/iceberg-python/pull/1509 to merge before merging this, so that we know what the defaults will be (I can still start on the docs in the meantime).

There's an argument for also waiting for https://github.com/apache/iceberg-python/issues/1492, because write.data.path does change the code significantly. But IMHO, if we keep the docs fairly high-level, similar to the Java side (which I think we should anyway), this won't be an issue and the docs won't need to change. Update: see https://github.com/apache/iceberg-python/pull/1537#discussion_r1921100230.