Open andygrove opened 3 years ago
I am wondering if the current 2
should just be based on a configuration setting for the default number of partitions (just like Spark uses 200 partitions as a default).
Maybe we should clean up the terminology around concurrency and number of partitions a bit in that case.
:100: That sounds great to me.
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
By default, DataFusion uses hash-partitioned joins if concurrency > 1, which led me to add this hacky code in a couple of places in Ballista.
Describe the solution you'd like
I'm actually not sure what the solution should be, but I would like to be able to tell the context to use hash-partitioned joins, separately from specifying concurrency.
Describe alternatives you've considered
None
Additional context
This code is running in the scheduler, not in the executor where the query actually executes. The scheduler concurrency should not impact how the query is planned.
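To make the request concrete, here is a rough sketch of what decoupling the two settings could look like. Everything in it is hypothetical: `PlannerSettings`, `JoinMode`, and both functions are illustrative names, not DataFusion's or Ballista's actual API, and using the concurrency value as the partition count in the "current" behaviour is an assumption made only for the comparison.

```rust
/// Hypothetical planner settings (illustrative only, not DataFusion's API).
#[derive(Debug, Clone, PartialEq)]
pub struct PlannerSettings {
    /// Parallelism available in the process that builds the plan.
    pub concurrency: usize,
    /// Default number of partitions, by analogy with Spark's default of 200.
    pub default_partitions: usize,
    /// Explicit switch for hash-partitioned joins, independent of `concurrency`.
    pub hash_partitioned_joins: bool,
}

/// Hypothetical join strategies the planner can pick between.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JoinMode {
    /// Both inputs are hash-partitioned on the join keys.
    Partitioned { partitions: usize },
    /// No repartitioning of the join inputs.
    SinglePartition,
}

/// Behaviour described in the issue: the strategy is derived from concurrency.
/// Assumption: the partition count equals the concurrency value.
pub fn join_mode_from_concurrency(concurrency: usize) -> JoinMode {
    if concurrency > 1 {
        JoinMode::Partitioned { partitions: concurrency }
    } else {
        JoinMode::SinglePartition
    }
}

/// Requested behaviour: the strategy depends only on explicit settings,
/// so the concurrency of the planning process does not change the plan.
pub fn join_mode_from_settings(settings: &PlannerSettings) -> JoinMode {
    if settings.hash_partitioned_joins {
        JoinMode::Partitioned { partitions: settings.default_partitions }
    } else {
        JoinMode::SinglePartition
    }
}
```

With something like this, a scheduler running with `concurrency: 1` could still plan a partitioned join by setting `hash_partitioned_joins: true`, without overriding concurrency just to influence the planner.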