dask / community

For general discussion and community planning. Discussion issues welcome.

Release 2021.11.0 #197

Closed jrbourbeau closed 2 years ago

jrbourbeau commented 2 years ago

As part of our normal release cadence, if there are no known blockers, I'd like to release dask and distributed 2021.11.0.

Looking at some recent issues / PRs, it would be good to get a patch out which fixes https://github.com/dask/distributed/issues/5472.

cc @jakirkham @jsignell @quasiben

EDIT: I forgot to mention there has been one reported regression from the 2021.10.0 release (xref https://github.com/dask/dask/issues/8292) which it would also be good to get a patch out for

jakirkham commented 2 years ago

Thanks James! 😀 Will mention internally

EDIT: I forgot to mention there has been one reported regression from the 2021.10.0 release (xref https://github.com/dask/dask/issues/8292) which it would also be good to get a patch out for

Do we know how to fix this issue? Last I checked the cause wasn’t well understood. Has that changed?

jrbourbeau commented 2 years ago

Do we know how to fix this issue? Last I checked the cause wasn’t well understood. Has that changed?

My guess is, as @gjoseph92 mentioned https://github.com/dask/dask/pull/8174#discussion_r736715570, there's some subtle issue in our high-level graph code, but to my knowledge nobody has been able to investigate yet. To be clear, I don't think this should block releasing, I was just being hopeful about a patch is all : )

Also @jcrist has a fix for https://github.com/dask/distributed/issues/5472 over in https://github.com/dask/distributed/pull/5488

Any issues on the RAPIDS side, or are we okay to release as usual tomorrow?

jakirkham commented 2 years ago

Oops forgot to raise this 🤦‍♂️ Have mentioned it now. Will let you know if we’ve heard anything back by the morning (US Pacific)

chrisroat commented 2 years ago

What is the policy on regressions? Is there any worry that a subtle bug is causing more issues than realized?

I currently put my time into trying to become a scheduler aficionado (since I spend my time killing deadlocked workers). I can also start learning high level graph if this particular regression is low priority, since for me it's a graph at the heart of our pipeline.

gjoseph92 commented 2 years ago

We also have a new deadlock in distributed: https://github.com/dask/distributed/issues/5480 (though it's almost certainly been around for a while already). Both myself and another user in the wild have triggered this through normal use. https://github.com/dask/distributed/pull/5457 is a partial fix, but idk if we'll get it in by tomorrow? cc @fjetter

Since I don't think it's a recent regression (possibly worker state machine refactor), I don't know if we want to block this release for it.

jrbourbeau commented 2 years ago

What is the policy on regressions? Is there any worry that a subtle bug is causing more issues than realized?

That's a great question. We don't have a hard policy on regressions. Historically we've tried our best to fix regressions as they're reported or estimate how impactful the regression is based on user feedback (this is really hard to do). For this case, one option would be to just revert https://github.com/dask/dask/pull/8174 until we're able to get to the bottom of the graph validation issue you raised (xref https://github.com/dask/dask/issues/8292).

gjoseph92 commented 2 years ago

FYI, on the topic of regressions... https://github.com/microsoft/LightGBM/issues/4771

jsignell commented 2 years ago

I just ran into a regression with reading from parquet https://github.com/dask/dask/issues/8349

jakirkham commented 2 years ago

Will let you know if we’ve heard anything back by the morning (US Pacific)

The only thing I've heard about is PR ( https://github.com/dask/distributed/pull/5380 ), which is now in. So no blockers from us.

jrbourbeau commented 2 years ago

Thanks @gjoseph92 @jsignell for surfacing those regressions

I was in an unrelated meeting with @fjetter and @gjoseph92 where this release came up and I wanted to surface the result of that conversation. It seems like there are a few known regressions from the 2021.10.0 release. The deadlock issue is, in part, related to the large worker state refactor (there's a partial fix for this in the works, but it won't be ready for today). The parquet regression is certainly valid, though, as @jsignell points out in https://github.com/dask/dask/issues/8349#issuecomment-961978799, it is somewhat of an edge case.

Looking at the commits to dask and distributed since the last release, nothing stands out as particularly controversial (there's definitely nothing like the big worker state machine refactor in the last release), but there is a fix for https://github.com/dask/distributed/issues/5472, which several users reported running into and would be good to fix. Altogether, my sense is that we should still release today. We won't be any worse off from a deadlock perspective than what's already released, and we'll fix a high-ish profile issue (https://github.com/dask/distributed/issues/5472). We should still invest in fixing https://github.com/dask/dask/issues/8349 and https://github.com/dask/dask/issues/8292, but again I don't think we'll be any worse off by releasing today.

Thoughts?

jrbourbeau commented 2 years ago

Planning to carry on with releasing in a bit if no further comments

jakirkham commented 2 years ago

SGTM. Thanks for meeting with people and surfacing that info, James 😄

rjzamora commented 2 years ago

I just ran into a regression with reading from parquet dask/dask#8349

dask#8351 should close this

jakirkham commented 2 years ago

Rick's PR is now in

jakirkham commented 2 years ago

FYI, on the topic of regressions... microsoft/LightGBM#4771

This traces back to issue ( https://github.com/dask/distributed/issues/5497 ). Thanks for tracking that down Gabe 😄

jrbourbeau commented 2 years ago

Rick's PR is now in

Sorry for the delay, I was in meetings. Will start pushing out the release now...