dask / distributed

A distributed task scheduler for Dask
https://distributed.dask.org
BSD 3-Clause "New" or "Revised" License

Fix pynvml handles #8693

Open quasiben opened 2 weeks ago

quasiben commented 2 weeks ago

Replaces https://github.com/dask/distributed/pull/8419

cc @fjetter @hendrikmakait @rjzamora

I've tested this on a machine with several A100s. @jacobtomlinson, is it possible for you to test this PR in a MIG setup?

github-actions[bot] commented 2 weeks ago

Unit Test Results

_See test report for an extended history of previous test failures. This is useful for diagnosing flaky tests._

    29 files ±0    29 suites ±0    11h 4m 19s :stopwatch: -27m 20s
    4 064 tests +4    3 958 :white_check_mark: +3    101 :zzz: +4    5 :x: -1
    55 941 runs +15    53 756 :white_check_mark: +4    2 178 :zzz: +15    7 :x: -2

For more details on these failures, see this check.

Results for commit 4e963b8d. ± Comparison against base commit af237f0d.

jacobtomlinson commented 1 week ago

I can, but with PyData London tomorrow I'm not going to get to this immediately.

hendrikmakait commented 2 days ago

> I've tested this on a machine with several A100s. @jacobtomlinson, is it possible for you to test this PR in a MIG setup?

@jacobtomlinson: Gentle ping :)

jacobtomlinson commented 2 days ago

Apologies, I've been struggling to get my hands on the right hardware to test this out. Let me ping the NVIDIA folks again.

hendrikmakait commented 2 days ago

> Apologies, I've been struggling to get my hands on the right hardware to test this out. Let me ping the NVIDIA folks again.

No worries, I just wanted to make sure this didn't get buried in the post-conference catch-up.