Expected Behavior
DataFrames can be repartitioned to a given number of partitions before being written out to a FileSystem when using the commit blocks.
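For reference, the expected behavior can be sketched in plain Spark, outside of any commit block. This is only an illustration under assumptions: the object name `RepartitionSketch`, the helper `writeWithPartitions`, the sample data and the temporary output path are all hypothetical, and the code does not use the commit or ParquetDataCommiter API at all.

```scala
import java.io.File
import java.nio.file.Files

import org.apache.spark.sql.SparkSession

// Hypothetical sketch: plain Spark only, not the commit API discussed in this issue.
object RepartitionSketch {

  // Repartitions a small sample DataFrame to `numPartitions`, writes it as Parquet,
  // and returns how many Parquet part files ended up in the output directory.
  def writeWithPartitions(spark: SparkSession, numPartitions: Int, outputPath: String): Int = {
    val df = spark.range(0, 1000).toDF("id")

    df.repartition(numPartitions) // repartition by a number, not by named columns
      .write
      .mode("overwrite")
      .parquet(outputPath)

    // Count only the data files; checksum and _SUCCESS files are ignored.
    new File(outputPath)
      .listFiles()
      .count(f => f.getName.startsWith("part-") && f.getName.endsWith(".parquet"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("repartition-sketch")
      .getOrCreate()
    try {
      val out = Files.createTempDirectory("repartition-sketch").resolve("out").toString
      val partFiles = writeWithPartitions(spark, 4, out)
      println(s"Parquet part files written: $partFiles")
    } finally {
      spark.stop()
    }
  }
}
```

When the repartition is honored, the number of part files should follow the requested partition count, which is what this issue asks the commit blocks to preserve.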
Actual Behavior
If a DataFrame is repartitioned before it is passed to the commit action, the repartition is sometimes ignored, because the DataFrame can be cached as Parquet when its label is reused in the flow. Currently the ParquetDataCommiter API only allows named partitions, not a partition count.