When a dataframe has a large number of small partitions, distributed backends are not always efficient for group-map operations, because there is too much communication overhead between the user logic and the backend.
So we need a way to partition less granularly: each user-defined group is in one and only one partition, but each partition can contain multiple groups.
With coarse partitioning, we push part of the partitioning responsibility to local computing frameworks such as pandas, Arrow and Polars. In some cases this can be significantly more efficient.
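The idea can be illustrated with a minimal, framework-free sketch. It assumes rows are plain dicts and the group key is a single column, and all function names (`coarse_partition`, `local_group_map`, `run`, `summarize`) are hypothetical, not the API of any particular library. Hashing the key modulo a fixed partition count guarantees that every group lands in exactly one partition, while a partition may hold several groups. Inside each partition, a single local pass splits the groups and applies the user function; in practice that step is what a local framework such as pandas `groupby` would handle.

```python
import zlib
from collections import defaultdict
from typing import Any, Callable, Dict, List

Row = Dict[str, Any]


def stable_hash(key: Any) -> int:
    # Python's built-in hash() of str is salted per process, so use crc32
    # to get the same partition assignment on every worker.
    return zlib.crc32(repr(key).encode("utf-8"))


def coarse_partition(rows: List[Row], key: str, num_partitions: int) -> List[List[Row]]:
    # Each key maps to exactly one partition; one partition may hold many keys.
    parts: List[List[Row]] = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[stable_hash(row[key]) % num_partitions].append(row)
    return parts


def local_group_map(partition: List[Row], key: str,
                    fn: Callable[[List[Row]], List[Row]]) -> List[Row]:
    # Local step: split one partition into its groups and apply fn per group.
    # A local framework (e.g. pandas groupby) would normally do this part.
    groups: Dict[Any, List[Row]] = defaultdict(list)
    for row in partition:
        groups[row[key]].append(row)
    out: List[Row] = []
    for group_rows in groups.values():
        out.extend(fn(group_rows))
    return out


def run(rows: List[Row], key: str, num_partitions: int,
        fn: Callable[[List[Row]], List[Row]]) -> List[Row]:
    out: List[Row] = []
    for part in coarse_partition(rows, key, num_partitions):
        # In a distributed backend, each partition would be one task, so the
        # backend only communicates per partition, not per group.
        out.extend(local_group_map(part, key, fn))
    return out


def summarize(group: List[Row]) -> List[Row]:
    # Example user-defined group-map function: total value per user.
    return [{"user": group[0]["user"], "total": sum(r["value"] for r in group)}]


if __name__ == "__main__":
    data = [{"user": u, "value": v}
            for u, v in [("a", 1), ("b", 2), ("a", 3), ("c", 4), ("b", 5)]]
    print(sorted((r["user"], r["total"]) for r in run(data, "user", 2, summarize)))
```

With three groups and only two partitions, at least one partition must carry more than one group, which is exactly the trade-off described above: fewer, larger tasks for the distributed backend, with the per-group split done locally.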