Calculate partition boundaries for `read_sql` using sampling

Is your feature request related to a problem?

Currently, `read_sql` calculates partition boundaries using `PERCENTILE_DISC`, falling back to min/max if needed. `PERCENTILE_DISC` does not scale well, as it involves expensive operations such as windowing and sorting.

Describe the solution you'd like

We should instead calculate percentiles by taking samples from the input table. This allows a trade-off between accuracy and computational complexity (both time and space). The parameter that controls this trade-off is the sample size. The larger the sample, the more accurate the boundaries, but the more expensive they are to compute. A smaller sample is cheaper but more likely to produce unevenly sized partitions (skew).
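To illustrate the idea, here is a minimal in-memory sketch, not Daft's implementation. The function names `sample_partition_bounds` and `partition_sizes` and their parameters are hypothetical, and the sketch assumes the column values are already available as a Python sequence; in practice the sampling would presumably happen on the database side.

```python
import random
from bisect import bisect_right
from typing import List, Optional, Sequence


def sample_partition_bounds(
    values: Sequence[float],
    num_partitions: int,
    sample_size: int,
    seed: Optional[int] = None,
) -> List[float]:
    """Estimate the num_partitions - 1 interior boundaries from a random sample.

    Hypothetical helper: sorts only the sample, not the full input.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if sample_size < 1:
        raise ValueError("sample_size must be at least 1")
    if not values:
        return []

    rng = random.Random(seed)
    k = min(sample_size, len(values))
    # Sampling without replacement; the sort cost depends on k, not len(values).
    sample = sorted(rng.sample(values, k))

    # Discrete percentiles: take the sample element at each quantile rank.
    bounds = []
    for i in range(1, num_partitions):
        idx = min(k - 1, (i * k) // num_partitions)
        bounds.append(sample[idx])
    return bounds


def partition_sizes(values: Sequence[float], bounds: Sequence[float]) -> List[int]:
    """Count how many values land in each partition, to inspect skew."""
    counts = [0] * (len(bounds) + 1)
    for v in values:
        counts[bisect_right(bounds, v)] += 1
    return counts
```

Calling `partition_sizes(values, sample_partition_bounds(values, n, sample_size))` with different `sample_size` values gives a rough view of how much skew a given sample size introduces.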

Eventual-Inc / Daft

Calculate partition boundaries for `read_sql` using sampling #3245
