facebookincubator / gloo

Collective communications library with various primitives for multi-machine training.
Other
1.23k stars 303 forks

Allow ports to be reused in gloo #353

Open H-Huang opened 1 year ago

H-Huang commented 1 year ago

Summary: ProcessGroupGloo and gloo appear to open and close sockets without allowing the port to be reused. In larger training jobs this surfaces as an "Address already in use" error, which we assume happens because all the ephemeral ports are exhausted.

This diff allows ports to be reused; with it, we see fewer ports in the TIME_WAIT state.
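
For readers unfamiliar with the mechanism, here is a minimal sketch of what "allowing a port to be reused" usually means at the socket level. The diff itself is not shown in this thread, so this assumes the change amounts to setting `SO_REUSEADDR` before `bind()`. The helper name `make_reusable_listener` is hypothetical, and the sketch uses Python's standard `socket` module rather than gloo's C++ transport code.

```python
import socket


def make_reusable_listener(host: str = "127.0.0.1", port: int = 0) -> socket.socket:
    """Hypothetical helper: open a TCP listener with SO_REUSEADDR set.

    Assumption: this mirrors the kind of option the diff enables; it is
    not gloo's actual code.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set before bind() to affect address reuse.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    # Port 0 asks the kernel to pick a free ephemeral port.
    sock.bind((host, port))
    sock.listen(1)
    return sock


if __name__ == "__main__":
    listener = make_reusable_listener()
    print("listening on port", listener.getsockname()[1])
    listener.close()
```

On Linux, `SO_REUSEADDR` lets a socket bind to a local address that still has connections lingering in TIME_WAIT. Whether that alone addresses ephemeral-port exhaustion in a given job depends on which side of the connection is running out of ports.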

context: https://fb.workplace.com/groups/319878845696681/permalink/5988899781205532/

another issue: https://fb.workplace.com/groups/319878845696681/permalink/958768178474408/

Differential Revision: D44029927

facebook-github-bot commented 1 year ago

This pull request was exported from Phabricator. Differential Revision: D44029927
