@jrbourbeau some regressions were spotted in the coiled-benchmarks; it's not clear to me whether they were expected or whether they've since been resolved, see:
@j-bennet you were looking at these cases right, do you have any more context here that you can add?
@ncclementi @jrbourbeau
https://github.com/coiled/benchmarks/issues/839 didn't look legit. There seemed to have been a hiccup writing to benchmarks.db: several records were duplicated on insert. I closed the issue.
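For context, a minimal sketch of how one might check a SQLite database like benchmarks.db for duplicated rows. The table and column names below (test_run, name, start, duration) are assumptions for illustration, not the real schema:

```python
import sqlite3

# Hypothetical sketch: "test_run" and its columns are assumed names,
# not the actual benchmarks.db schema.
conn = sqlite3.connect("benchmarks.db")
duplicates = conn.execute(
    """
    SELECT name, start, duration, COUNT(*) AS n
    FROM test_run
    GROUP BY name, start, duration
    HAVING COUNT(*) > 1
    """
).fetchall()
for name, start, duration, n in duplicates:
    print(f"{name!r} at {start}: inserted {n} times (duration={duration}s)")
conn.close()
```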
However, CI opened a new one today:
https://github.com/coiled/benchmarks/issues/840
and that one may be legitimate, still investigating.
Ok, so in the new CI issue, the runtime = 'coiled-upstream-py3.9' regressions look legitimate:
runtime = 'coiled-upstream-py3.9', name = 'test_q8[0.5 GB (csv)-p2p]', category = 'benchmarks', last_three_duration [s] = (21.08663511276245, 23.024844884872437, 22.656970739364624), duration_threshold [s] = 20.351763563082486
runtime = 'coiled-upstream-py3.9', name = 'test_q8[0.5 GB (csv)-tasks]', category = 'benchmarks', last_three_duration [s] = (20.7704176902771, 23.282063007354736, 21.78318214416504), duration_threshold [s] = 19.681029691350872
runtime = 'coiled-upstream-py3.9', name = 'test_q8[0.5 GB (parquet)-p2p]', category = 'benchmarks', last_three_duration [s] = (36.947147607803345, 38.348350524902344, 37.531134366989136), duration_threshold [s] = 27.49540470443585
runtime = 'coiled-upstream-py3.9', name = 'test_q8[0.5 GB (parquet)-tasks]', category = 'benchmarks', last_three_duration [s] = (36.0034384727478, 37.55315279960632, 37.223448038101196), duration_threshold [s] = 26.97645565716312
runtime = 'coiled-upstream-py3.9', name = 'test_q8[5 GB (parquet)-p2p]', category = 'benchmarks', last_three_duration [s] = (184.07536125183105, 178.22539234161377, 177.27111172676086), duration_threshold [s] = 144.0348346523647
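To make the report format concrete, here is a minimal sketch of the kind of check these lines imply, assuming a test is flagged when each of its last three durations exceeds the threshold (the actual detection logic in coiled/benchmarks may differ):

```python
def is_regression(last_three_durations, duration_threshold):
    """Flag a benchmark when every one of its last three runs exceeded the threshold."""
    return all(d > duration_threshold for d in last_three_durations)

# First test_q8 row above: all three durations exceed ~20.35s, so it is flagged.
print(is_regression((21.09, 23.02, 22.66), 20.35))  # True
```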
The charts don't look very alarming to me. Zoomed in:
These spikes are similar to fluctuations we had in the past, and those resolved.
The runtime = 'coiled-latest-py3.9' entries are still the same duplicate-record issue, not legitimate, at least not yet:
runtime = 'coiled-latest-py3.9', name = 'test_q8[0.5 GB (csv)-p2p]', category = 'benchmarks', last_three_duration [s] = (23.0051052570343, 23.257978677749634, 23.257978677749634), duration_threshold [s] = 22.74642871273649
runtime = 'coiled-latest-py3.9', name = 'test_q8[0.5 GB (csv)-tasks]', category = 'benchmarks', last_three_duration [s] = (22.663613319396973, 23.347721576690674, 23.347721576690674), duration_threshold [s] = 22.475621609149425
runtime = 'coiled-latest-py3.9', name = 'test_q8[5 GB (parquet)-p2p]', category = 'benchmarks', last_three_duration [s] = (203.9737629890442, 190.76829409599304, 190.76829409599304), duration_threshold [s] = 187.39081849451378
@fjetter @hendrikmakait should these block the release? Can you advise?
I'm investigating.
Thanks @hendrikmakait
The regression we see in the benchmarks is caused by a switch from pandas=1.5.3 to pandas=2.0.1 in the benchmarking environment, not a change since dask=2023.4.1. I've run an A/B test (https://github.com/coiled/benchmarks/actions/runs/4946428740) on 2023.4.1 confirming that this issue is already present in the previous release.
I suggest moving forward with the release as planned.
Thanks for confirming @hendrikmakait. I'm happy to move forward with the release in this case.
The default value of group_keys changed from False to True, which caused the regression.
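For reference, a small illustration of the pandas behaviour change (a sketch added here, not from the original thread): with group_keys=True, groupby(...).apply(...) prepends the group labels as an extra index level, whereas group_keys=False reproduces the old default:

```python
import pandas as pd

df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})

# pandas 2.x default: group labels are prepended as an extra index level
with_keys = df.groupby("key", group_keys=True).apply(lambda g: g["val"] * 2)

# old default: the result keeps only the original row index
without_keys = df.groupby("key", group_keys=False).apply(lambda g: g["val"] * 2)

print(with_keys.index)     # MultiIndex [('a', 0), ('a', 1), ('b', 2)]
print(without_keys.index)  # original index [0, 1, 2]
```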
Starting the release now
dask and distributed 2023.5.0 are now on PyPI. @charlesbluca is going to handle the bot-triggered actions on conda-forge and dask-docker.
Release on conda-forge is complete:
Currently handling the docker release:
Closing as complete
Thanks Jacob, Charles, and others for handling the release this week.
@quasiben has noticed that https://docs.dask.org/en/stable/changelog.html has not been updated with the latest release.
I'll reopen this while we look into it.
It looks like the docs build failed as it couldn't find the 2023.5.0 release on PyPI. This was likely a race between me pushing the tag to GitHub and pushing the release to PyPI.
I don't see any obvious button in RTD to re-run the build. @jrbourbeau have you run into this before?
You should be able to run an RTD build any time by going to the project page and then to "builds".
Ah yeah, thanks @martindurant. I was looking for a "rerun" button on the failed builds. I've triggered a new build for stable and latest.
Ah found another problem. I missed a user link in the changelog. @jrbourbeau did warn me about this. I'll get it resolved now.
This is now resolved. Apologies for the noise.
Thanks @jacobtomlinson @charlesbluca for handling this release!
Best effort: try to close before the release, but will not block the release.
Blocker: issues that would cause us to block and postpone the release if not fixed.
Comments:
Note that @jacobtomlinson and @charlesbluca will be handling the release this week as I'll be OOO (thanks again for taking care of this)
cc @quasiben @rjzamora @fjetter