Describe the bug
When running nvflare poc prepare -i project.yml, the builders.args.overseer_agent.args.sp_end_point value for a DummyOverseerAgent is not reflected in the provisioned fed_server.json, fed_client.json, and fed_admin.json files. This means that even if you change the admin and fed_learn ports (and the matching sp_end_point) in the project yaml, the POC processes still try to connect to the default 8003/8002 ports.
To Reproduce
Steps to reproduce the behavior:
Create a project.yml based off the default POC config, but change the admin and fed_learn ports to 8005 and 8004.
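For reference, the relevant project.yml fragment might look like the following. This is a hedged sketch trimmed to the keys involved; the surrounding builder list and participant fields follow the default POC project file and are omitted here:

```yaml
participants:
  - name: server
    type: server
    org: nvidia
    fed_learn_port: 8004   # changed from the default 8002
    admin_port: 8005       # changed from the default 8003
builders:
  - path: nvflare.lighter.impl.static_file.StaticFileBuilder
    args:
      overseer_agent:
        path: nvflare.ha.dummy_overseer_agent.DummyOverseerAgent
        args:
          sp_end_point: localhost:8004:8005   # this value is the one not picked up
```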
Run nvflare poc prepare -i project.yml with that file
Go to the provisioned file poc/example_project/prod_00/server/startup/fed_server.json and notice that the target and admin ports are properly set to 8004 and 8005, but the overseer_agent args still use sp_end_point: "localhost:8002:8003".
The same stale overseer_agent sp_end_point appears in admin/startup/fed_admin.json and site/startup/fed_client.json.
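For illustration, the stale fragment in each provisioned file looks roughly like this (trimmed to the relevant keys; surrounding configuration omitted):

```json
{
  "overseer_agent": {
    "path": "nvflare.ha.dummy_overseer_agent.DummyOverseerAgent",
    "args": {
      "sp_end_point": "localhost:8002:8003"
    }
  }
}
```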
If you then launch the POC with nvflare poc start, the participants try and fail to connect over the old 8002/8003 ports. This leads to a login error and the following logs:
# nvflare poc start
WORKSPACE set to /Users/paddison/repos/FedRAG/outputs/poc/example_project/prod_00/server/startup/..
PYTHONPATH is /local/custom:
WORKSPACE set to /Users/paddison/repos/FedRAG/outputs/poc/example_project/prod_00/site-1/startup/..
PYTHONPATH is /local/custom:
WORKSPACE set to /Users/paddison/repos/FedRAG/outputs/poc/example_project/prod_00/site-2/startup/..
PYTHONPATH is /local/custom:
start fl because of no pid.fl
new pid 34115
Trying to obtain server address
Obtained server address: localhost:8003
Trying to login, please wait ...
start fl because of no pid.fl
new pid 34133
2024-08-09 11:34:46,011 - nvflare.private.fed.app.deployer.server_deployer.ServerDeployer - INFO - server heartbeat timeout set to 600
2024-08-09 11:34:46,155 - CoreCell - INFO - server: creating listener on grpc://0:8004
2024-08-09 11:34:46,186 - CoreCell - INFO - server: created backbone external listener for grpc://0:8004
2024-08-09 11:34:46,187 - ConnectorManager - INFO - 34115: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2024-08-09 11:34:46,188 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:11825] is starting
start fl because of no pid.fl
new pid 34142
Trying to login, please wait ...
Waiting for SP....
2024-08-09 11:34:46,693 - CoreCell - INFO - server: created backbone internal listener for tcp://localhost:11825
2024-08-09 11:34:46,693 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 PASSIVE grpc://0:8004] is starting
2024-08-09 11:34:46,694 - nvflare.private.fed.app.deployer.server_deployer.ServerDeployer - INFO - deployed FLARE Server.
2024-08-09 11:34:46,706 - nvflare.fuel.hci.server.hci - INFO - Starting Admin Server localhost on Port 8005
2024-08-09 11:34:46,706 - root - INFO - Server started
2024-08-09 11:34:46,709 - nvflare.fuel.f3.drivers.grpc_driver.Server - INFO - added secure port at 0.0.0.0:8004
2024-08-09 11:34:46,909 - CoreCell - INFO - site-1: created backbone external connector to grpc://localhost:8002
2024-08-09 11:34:46,909 - ConnectorManager - INFO - 34133: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2024-08-09 11:34:46,912 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:25585] is starting
2024-08-09 11:34:47,415 - CoreCell - INFO - site-1: created backbone internal listener for tcp://localhost:25585
2024-08-09 11:34:47,416 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://localhost:8002] is starting
2024-08-09 11:34:47,416 - FederatedClient - INFO - Wait for engine to be created.
2024-08-09 11:34:47,424 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - created secure channel at localhost:8002
2024-08-09 11:34:47,424 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00002 N/A => localhost:8002] is created: PID: 34133
2024-08-09 11:34:47,434 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00002 Not Connected] is closed PID: 34133
2024-08-09 11:34:47,434 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - CLIENT: finished connection [CN00002 Not Connected]
Waiting for SP....
Expected behavior
I would like to be able to change the POC overseer ports so that developers can run multiple POCs on the same machine, each with its own project.yml and a non-conflicting set of ports.
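As a quick sanity check for this mismatch, a small script can parse a provisioned startup JSON and compare the overseer_agent sp_end_point against the ports requested in project.yml. This is a hedged sketch: the key path used below ("overseer_agent" at the top level) is an assumption based on the fragments shown above, and check_sp_end_point is a hypothetical helper, not part of the nvflare CLI:

```python
import json

def check_sp_end_point(config_text, expected_ports):
    """Return True if the overseer_agent sp_end_point in a provisioned
    startup JSON uses the expected (fed_learn_port, admin_port) pair.

    The key path is an assumption based on the fed_server.json fragments
    observed in this report.
    """
    cfg = json.loads(config_text)
    sp = cfg["overseer_agent"]["args"]["sp_end_point"]
    host, fl_port, admin_port = sp.split(":")
    return (int(fl_port), int(admin_port)) == expected_ports

# Synthetic fragment mimicking the stale provisioned output described above.
stale = '{"overseer_agent": {"args": {"sp_end_point": "localhost:8002:8003"}}}'
print(check_sp_end_point(stale, (8004, 8005)))  # False: still the defaults
```

Running this against each of fed_server.json, fed_client.json, and fed_admin.json after nvflare poc prepare would flag the files that kept the default ports.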
Screenshots
See files/logs pasted above.
Desktop (please complete the following information):
Additional context
N/A