dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Ensure ConnectionPool closes even if network stack swallows cancellation #8928

Closed: fjetter closed this 2 weeks ago

fjetter commented 2 weeks ago

We've seen some workers fail to close properly in cases where the scheduler died in very unusual ways. Workers, particularly Nannies, would not close, for reasons we could not determine, and did so silently.

In this section of the code it is, at least in theory, possible to lock up if the connector (in this case tornado, which I generally trust to do these things properly) swallows cancellation attempts completely.

I'm not confident that this is actually related, but in case I'm missing something, let's remove that edge case.
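To illustrate the edge case, here is a minimal sketch of the general pattern: cancel the in-flight connection attempts, but wait for them only for a bounded time, so that closing cannot hang on a task that ignores cancellation. This is not the diff from this PR and not the `ConnectionPool` API. `stubborn_connect`, `close_pool`, and the timeout value are hypothetical names and numbers chosen for illustration, and plain `asyncio` stands in for the real connector.

```python
import asyncio


async def stubborn_connect(swallow: int = 1) -> None:
    """Hypothetical connector that ignores the first `swallow` cancellations."""
    while True:
        try:
            await asyncio.sleep(3600)  # stands in for a connect that never finishes
        except asyncio.CancelledError:
            if swallow <= 0:
                raise  # finally honour the cancellation
            swallow -= 1  # misbehave: drop the cancellation and keep going


async def close_pool(connecting: set, timeout: float = 0.1) -> set:
    """Cancel in-flight attempts, but wait for them at most `timeout` seconds.

    Returns the tasks that were still pending when the wait ended.
    """
    for task in connecting:
        task.cancel()
    if not connecting:
        return set()  # asyncio.wait() rejects an empty collection
    # asyncio.wait() returns after `timeout` without raising, so a task
    # that swallowed the cancellation cannot block the close.
    _done, pending = await asyncio.wait(connecting, timeout=timeout)
    return pending


async def demo() -> int:
    task = asyncio.ensure_future(stubborn_connect())
    await asyncio.sleep(0)  # let the task start and reach its first await
    leftover = await close_pool({task})
    return len(leftover)


# The second cancellation, issued by asyncio.run() during shutdown, is honoured.
leftover_count = asyncio.run(demo())
```

With an unbounded `await asyncio.gather(*connecting, return_exceptions=True)` in place of the bounded wait, `close_pool` would block for as long as the task keeps running, which is the lock-up described above.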

github-actions[bot] commented 2 weeks ago

Unit Test Results

_See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests._

    25 files ±0      25 suites ±0      10h 16m 57s ⏱️ -1m 52s
    4 129 tests +1      4 014 ✅ +1      110 💤 ±0      5 ❌ ±0
    47 680 runs +12      45 562 ✅ +13      2 113 💤 ±0      5 ❌ -1

For more details on these failures, see this check.

Results for commit 4d579a70. ± Comparison against base commit c38c509e.