dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Shuffle use pyarrow more broadly #8596

Open fjetter opened 3 months ago

fjetter commented 3 months ago

These are a couple of very minor fixes that, in my very limited small-scale testing, turned out to speed things up.

I can at least speak to the unique computation: I saw it flare up in profiles as well, and benchmarking it on toy examples shows it is about 20x faster than on main. That of course depends on the kind of data, so in general the change is likely not as impactful.

github-actions[bot] commented 3 months ago

Unit Test Results

_See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests._

    29 files ±0    29 suites ±0    11h 19m 23s :stopwatch: +7m 11s
    4 055 tests ±0:     3 689 :white_check_mark: -246    109 :zzz: ±0    256 :x: +245    1 :fire: +1
    54 889 runs +19:    52 142 :white_check_mark: -230    2 410 :zzz: -5    336 :x: +253    1 :fire: +1

For more details on these failures and errors, see this check.

Results for commit 9826d68a. ± Comparison against base commit 8927bfd0.

:recycle: This comment has been updated with latest results.