dask / dask-xgboost

Passing scheduler host to start_tracker #77

Closed by mmccarty 3 years ago

mmccarty commented 3 years ago

This was causing issues with our internal network. Please let me know if there is a more desirable fix.

gforsyth commented 3 years ago

I'm not certain we want to do this / hard-code it -- if a user is using the Dask Helm chart, then client.scheduler.address will resolve to the reverse-proxy service instead of the scheduler pod, and rabit can't handle reverse proxies (at least as of xgboost 1.2).
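To illustrate the concern, here is a minimal sketch, assuming the scheduler address is a string of the form tcp://host:port. The helper name host_from_address is hypothetical and not part of dask-xgboost. It only shows that the host taken from the address is whatever endpoint the client connected to, which may be a proxy rather than the scheduler itself.

```python
from urllib.parse import urlsplit


def host_from_address(address):
    """Return the host part of an address such as 'tcp://10.0.0.5:8786'.

    Hypothetical helper for illustration only. It cannot tell whether the
    host is the scheduler itself or a proxy sitting in front of it.
    """
    host = urlsplit(address).hostname
    if host is None:
        raise ValueError(f"no host found in address: {address!r}")
    return host


# Example addresses are made up for illustration.
direct = host_from_address("tcp://10.0.0.5:8786")
proxied = host_from_address("tcp://dask-scheduler.example:8786")
print(direct, proxied)
```

If the second address points at a reverse-proxy service, passing its host on to the tracker would hand rabit the proxy rather than the scheduler pod.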

mmccarty commented 3 years ago

Good catch! Is there any way we can test for this? I'll think of another approach.

mmccarty commented 3 years ago

@gforsyth CI was flaky but looks like it is passing now. What do you think?

mmccarty commented 3 years ago

Hey @TomAugspurger, I'm getting a lot of random segfaults in CI. Sorry to bug you. Do you know what's going on?

TomAugspurger commented 3 years ago

No idea. I've restarted the job in hopes that it goes away.

TomAugspurger commented 3 years ago

All green now.

mmccarty commented 3 years ago

Thanks!