gubertoli closed this 2 months ago
The argument for run_federated_server is n_workers, port instead of port, n_workers. This is to be consistent with the new FederatedTracker and RabitTracker.
I changed the run_federated_server call to pass n_workers first and then port:
if with_ssl:
    xgboost.federated.run_federated_server(
        world_size, port, SERVER_KEY, SERVER_CERT, CLIENT_CERT
    )
else:
    xgboost.federated.run_federated_server(world_size, port)
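One way to avoid this kind of positional mix-up is to pass the server arguments by keyword. The sketch below assumes the 2.1.x parameter names are n_workers, port, server_key_path, server_cert_path and client_cert_path (please check help(xgboost.federated.run_federated_server) on your install). build_server_kwargs and start_server are hypothetical helper names, not part of XGBoost.

```python
from typing import Any, Dict, Optional


def build_server_kwargs(
    world_size: int,
    port: int,
    server_key: Optional[str] = None,
    server_cert: Optional[str] = None,
    client_cert: Optional[str] = None,
) -> Dict[str, Any]:
    """Collect the server arguments by name so their order cannot be swapped."""
    # Assumed 2.1.x parameter names; verify against the installed version.
    kwargs: Dict[str, Any] = {"n_workers": world_size, "port": port}
    ssl_paths = (server_key, server_cert, client_cert)
    if any(ssl_paths):
        if not all(ssl_paths):
            raise ValueError("SSL needs server key, server cert, and client cert")
        kwargs["server_key_path"] = server_key
        kwargs["server_cert_path"] = server_cert
        kwargs["client_cert_path"] = client_cert
    return kwargs


def start_server(**kwargs: Any) -> None:
    """Start the federated server; needs an XGBoost build with federated support."""
    # Imported lazily so build_server_kwargs can be used without XGBoost installed.
    import xgboost.federated

    xgboost.federated.run_federated_server(**kwargs)
```

With this, the insecure case would be start_server(**build_server_kwargs(world_size, port)), and the call no longer depends on whether port or the worker count comes first.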
This is the current error output (tested on 2.1.0 and 2.1.1):
./runtests-federated.sh 5
[19:26:33] Insecure federated server listening on 0.0.0.0:9091, world size 5
[19:26:34] Rank 0
Process Process-2:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../fl-xgb-nids/refs/test_federated.py", line 36, in run_worker
    with xgb.collective.CommunicatorContext(**communicator_env):
  File ".../.venv/lib/python3.10/site-packages/xgboost/collective.py", line 280, in __enter__
    assert is_distributed()
AssertionError
[19:26:34] Rank 0
Process Process-3:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../refs/test_federated.py", line 36, in run_worker
    with xgb.collective.CommunicatorContext(**communicator_env):
  File ".../.venv/lib/python3.10/site-packages/xgboost/collective.py", line 280, in __enter__
    assert is_distributed()
AssertionError
[19:26:34] Rank 0
Process Process-4:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../refs/test_federated.py", line 36, in run_worker
    with xgb.collective.CommunicatorContext(**communicator_env):
  File ".../.venv/lib/python3.10/site-packages/xgboost/collective.py", line 280, in __enter__
    assert is_distributed()
AssertionError
[19:26:34] Rank 0
Process Process-5:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../refs/test_federated.py", line 36, in run_worker
    with xgb.collective.CommunicatorContext(**communicator_env):
  File ".../.venv/lib/python3.10/site-packages/xgboost/collective.py", line 280, in __enter__
    assert is_distributed()
AssertionError
[19:26:34] Rank 0
Process Process-6:
Traceback (most recent call last):
  File "/usr/lib/python3.10/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3.10/multiprocessing/process.py", line 108, in run
    self._target(*self._args, **self._kwargs)
  File ".../refs/test_federated.py", line 36, in run_worker
    with xgb.collective.CommunicatorContext(**communicator_env):
  File ".../.venv/lib/python3.10/site-packages/xgboost/collective.py", line 280, in __enter__
    assert is_distributed()
AssertionError
Hi, it's dmlc_communicator instead of xgboost_communicator. Apologies for the confusion; the documentation is still sparse at the moment as we are still working on the feature. https://github.com/dmlc/xgboost/blob/master/python-package/xgboost/testing/federated.py can be a starting point.
Using dmlc_communicator instead of xgboost_communicator solved the issue. Thank you for the help! I will follow that starting point from now on.
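For anyone landing here later, the fix above amounts to changing one key in the worker's communicator settings. A minimal sketch follows, assuming the federated_* key names used in the linked testing/federated.py and a server on localhost. make_communicator_env is a hypothetical helper name, and the key names should be treated as assumptions to check against your installed version.

```python
from typing import Dict, Union


def make_communicator_env(
    port: int, world_size: int, rank: int
) -> Dict[str, Union[str, int]]:
    """Build the settings passed to xgb.collective.CommunicatorContext(**env)."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside 0..{world_size - 1}")
    return {
        # 2.1.x expects "dmlc_communicator"; the older "xgboost_communicator"
        # key is what produced the AssertionError reported in this issue.
        "dmlc_communicator": "federated",
        # Assumed key names and a localhost server; adjust for your setup.
        "federated_server_address": f"localhost:{port}",
        "federated_world_size": world_size,
        "federated_rank": rank,
    }
```

Each worker would then wrap its training code in with xgb.collective.CommunicatorContext(**make_communicator_env(port, world_size, rank)):, as in the run_worker function from the traceback.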
This issue is a follow-up of #10500 and PR #10503.
Output with XGBoost 2.1.1:
For reference, the adapted test_federated.py is:
And the adapted shell script runtests-federated.sh:
🚧 ⌛ Interim solution: Downgrade to XGBoost 2.0.0