Open: soxofaan opened this issue 3 months ago
Are you suggesting something like this:
diff --git a/python/pyarrow/dataset.py b/python/pyarrow/dataset.py
index 1efbfe1..9afb3fe 100644
--- a/python/pyarrow/dataset.py
+++ b/python/pyarrow/dataset.py
@@ -118,7 +118,7 @@ def __getattr__(name):
     )
-def partitioning(schema=None, field_names=None, flavor=None,
+def partitioning(schema=None, field_names=None, flavor="directory",
                  dictionaries=None):
     """
     Specify a partitioning scheme.
@@ -220,7 +220,7 @@ def partitioning(schema=None, field_names=None, flavor=None,
     >>> part = ds.partitioning(flavor="hive")
     """
-    if flavor is None:
+    if flavor is None or flavor == "directory":
         # default flavor
         if schema is not None:
             if field_names is not None:
Plus the corresponding changes to the related functionality. Being explicit sounds sensible to me. CC @jorisvandenbossche
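A minimal sketch of the dispatch the diff proposes, written in plain Python so it runs without pyarrow: "directory" becomes the explicit default, and None is kept as an alias so existing calls keep working. The names resolve_flavor and _FLAVOR_TO_CLASS are hypothetical. The helper only returns the name of the pyarrow class each flavor would select, and raising ValueError for an unknown flavor is an assumption, not pyarrow's exact behavior.

```python
# Hypothetical sketch of the proposed flavor dispatch (NOT pyarrow code).
# It returns the *name* of the partitioning class a flavor would select.

_FLAVOR_TO_CLASS = {
    "directory": "DirectoryPartitioning",
    "filename": "FilenamePartitioning",
    "hive": "HivePartitioning",
}


def resolve_flavor(flavor="directory"):
    """Map a flavor value to the partitioning class name it selects.

    "directory" is the explicit default; None is still accepted as an
    alias so that calls such as partitioning(schema, flavor=None) keep
    working unchanged.
    """
    if flavor is None:
        flavor = "directory"  # backward-compatible alias for the old default
    try:
        return _FLAVOR_TO_CLASS[flavor]
    except KeyError:
        raise ValueError(f"Unsupported flavor: {flavor!r}") from None


print(resolve_flavor())        # DirectoryPartitioning
print(resolve_flavor(None))    # DirectoryPartitioning
print(resolve_flavor("hive"))  # HivePartitioning
```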
Indeed, that would be the core of my feature request.
Describe the enhancement requested
pyarrow.dataset.partitioning(..., flavor=...) supports three flavor values: None (DirectoryPartitioning, the default), "filename" (FilenamePartitioning) and "hive" (HivePartitioning). So to choose DirectoryPartitioning, one has to specify None, which does not feel very future-proof (see also #30888 and #30889) and lacks the explicitness and self-documenting properties of the other options ("filename" and "hive").
Wouldn't it be better to support "directory" as a flavor option and make it the default?
This also applies to related functionality such as pyarrow.dataset.write_dataset(..., partitioning_flavor=...) and pyarrow.dataset.dataset(..., partitioning=...).
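To illustrate why "directory" is a natural, self-documenting name, here is a minimal pure-Python sketch of how each flavor encodes the same partition values (year=2009, month=11). The helper partition_path is hypothetical, not a pyarrow API. The layouts follow the pyarrow documentation examples for DirectoryPartitioning ("/2009/11"), HivePartitioning ("/year=2009/month=11") and FilenamePartitioning ("2009_11_" at the start of the file name).

```python
# Hypothetical helper (NOT a pyarrow API) showing how each partitioning
# flavor lays out the same values in a path fragment.

def partition_path(values, flavor="directory"):
    """Return the path fragment encoding `values` for the given flavor.

    values: dict of field name -> value, in partition-field order.
    """
    if flavor is None:
        flavor = "directory"  # treat None as the directory flavor, as today
    if flavor == "directory":
        # One directory level per field, values only: 2009/11
        return "/".join(str(v) for v in values.values())
    if flavor == "hive":
        # Self-describing key=value directory levels: year=2009/month=11
        return "/".join(f"{k}={v}" for k, v in values.items())
    if flavor == "filename":
        # Underscore-separated values prefixed to the file name: 2009_11_
        return "".join(f"{v}_" for v in values.values())
    raise ValueError(f"Unsupported flavor: {flavor!r}")


values = {"year": 2009, "month": 11}
for flavor in ("directory", "hive", "filename"):
    print(flavor, partition_path(values, flavor))
```

Only the "directory" layout currently has no flavor string of its own, even though it is the default.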
Component(s)
Python