coiled / benchmarks

optuna is failing #1487

Open crusaderky opened 6 months ago

crusaderky commented 6 months ago

tests/workflows/test_pytorch_optuna.py::test_hpo is failing with an obscure CommClosedError / CancelledError:

https://github.com/coiled/benchmarks/actions/runs/8362787642/job/22894238513

As this test only runs when explicitly enabled, I'm unsure when it started failing. I recently upgraded to optuna 3.6.0, but the issue was already present in 3.5.0.

FYI @jrbourbeau

jrbourbeau commented 6 months ago

I'm not sure exactly what's going on, but the test is failing at this line

https://github.com/coiled/benchmarks/blob/cc6cf17862cb9186d72e599ccb1471c8b3139327/tests/conftest.py#L618

during test setup. It's not clear to me that this is related to optuna; it seems more like a cluster deployment issue (again, I've only looked into this a little).