Azure / medical-imaging

ML-based medical imaging using Azure
MIT License

No module named 'nvflare.lighter.impl.auth_policy' #16

Status: Open. hwpang opened this issue 3 months ago.

hwpang commented 3 months ago

Hi,

Thanks for the great demo! I am following the instructions for the federated learning demo at https://github.com/hwpang/medical-imaging/blob/main/federated-learning/README.md.

I was able to follow along until the `nvflare provision -p project.yml` step, where I encountered the following error:

Project yaml file: /home/runner/medical-imaging/federated-learning/project.yml.

Unable to handle command: provision due to: No module named 'nvflare.lighter.impl.auth_policy' 

sub_parser is: ArgumentParser(prog='nvflare provision', usage=None, description=None, formatter_class=<class 'argparse.HelpFormatter'>, conflict_handler='error', add_help=True)
usage: nvflare provision [-h] [-p PROJECT_FILE] [-w WORKSPACE] [-c CUSTOM_FOLDER] [--add_user ADD_USER] [--add_client ADD_CLIENT]

options:
  -h, --help            show this help message and exit
  -p PROJECT_FILE, --project_file PROJECT_FILE
                        file to describe FL project
  -w WORKSPACE, --workspace WORKSPACE
                        directory used by provision
  -c CUSTOM_FOLDER, --custom_folder CUSTOM_FOLDER
                        additional folder to load python codes
  --add_user ADD_USER   yaml file for added user
  --add_client ADD_CLIENT
                        yaml file for added client

I would appreciate any advice on how to modify the config file so that it works with newer versions of NVFlare. Thanks!
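One way to confirm that the installed NVFlare release simply no longer ships `nvflare.lighter.impl.auth_policy`, and to spot any other stale entries in an older project.yml, is to ask Python's import system directly from the same environment the `nvflare` CLI runs in. This is a minimal diagnostic sketch, not part of the demo or of NVFlare: the helper names are hypothetical, and it assumes builder entries in project.yml are written as `path: package.module.ClassName` lines.

```python
import importlib.util
import re


def module_available(dotted_name: str) -> bool:
    """Return True if the import system can locate `dotted_name`.

    For dotted names, find_spec imports the parent packages first, so a
    missing parent raises an ImportError subclass instead of returning None;
    both cases are treated as "not available".
    """
    try:
        return importlib.util.find_spec(dotted_name) is not None
    except (ImportError, ValueError):
        return False


# Assumption: builders in project.yml are listed like
#   - path: package.module.ClassName
# (optionally quoted), so the module is everything before the last dot.
PATH_LINE = re.compile(
    r"""^[ \t]*-?[ \t]*path:[ \t]*["']?([\w.]+)["']?[ \t]*$""", re.MULTILINE
)


def find_missing_builder_modules(project_yml_text: str) -> list[str]:
    """Return builder paths whose module cannot be imported here."""
    missing = []
    for class_path in PATH_LINE.findall(project_yml_text):
        module_name = class_path.rsplit(".", 1)[0]
        if not module_available(module_name):
            missing.append(class_path)
    return missing
```

For example, `module_available("nvflare.lighter.impl.auth_policy")` returning False confirms the import error above, and passing the contents of the demo's project.yml to `find_missing_builder_modules` would list every builder the installed release cannot load, rather than surfacing them one error at a time.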

Relevant information:

hwpang commented 3 months ago

I got this working by replacing it with the default project file generated by the latest nvflare. However, I've hit another problem. While following step 4 of the federated learning demo to prepare the clients (https://github.com/hwpang/medical-imaging/blob/main/federated-learning/README.md#4-prepare-clients), I run into the following error. I'd appreciate any advice on how to resolve it.

PYTHONPATH is /local/custom:
start fl because of no pid.fl
new pid 7883
Waiting for SP....
2024-08-20 18:10:01,351 - CoreCell - INFO - FL-Asia-Hospital: created backbone external connector to grpc://server1:8002
2024-08-20 18:10:01,354 - ConnectorManager - INFO - 7883: Try start_listener Listener resources: {'secure': False, 'host': 'localhost'}
2024-08-20 18:10:01,377 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00002 PASSIVE tcp://0:55919] is starting
2024-08-20 18:10:01,706 - Communicator - INFO - Waiting for the client cell to be created.
2024-08-20 18:10:01,886 - CoreCell - INFO - FL-Asia-Hospital: created backbone internal listener for tcp://localhost:55919
2024-08-20 18:10:01,893 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connector [CH00001 ACTIVE grpc://server1:8002] is starting
2024-08-20 18:10:01,899 - FederatedClient - INFO - Wait for engine to be created.
2024-08-20 18:10:01,905 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - created secure channel at server1:8002
2024-08-20 18:10:01,912 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00002 N/A => server1:8002] is created: PID: 7883
2024-08-20 18:10:01,972 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00002 Not Connected] is closed PID: 7883
2024-08-20 18:10:01,979 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - CLIENT: finished connection [CN00002 Not Connected]
2024-08-20 18:10:03,018 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - created secure channel at server1:8002
2024-08-20 18:10:03,024 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00003 N/A => server1:8002] is created: PID: 7883
2024-08-20 18:10:03,081 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00003 Not Connected] is closed PID: 7883
2024-08-20 18:10:03,088 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - CLIENT: finished connection [CN00003 Not Connected]
2024-08-20 18:10:05,128 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - created secure channel at server1:8002
2024-08-20 18:10:05,135 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00004 N/A => server1:8002] is created: PID: 7883
2024-08-20 18:10:05,196 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00004 Not Connected] is closed PID: 7883
2024-08-20 18:10:05,202 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - CLIENT: finished connection [CN00004 Not Connected]
2024-08-20 18:10:09,267 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - created secure channel at server1:8002
2024-08-20 18:10:09,273 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00005 N/A => server1:8002] is created: PID: 7883
2024-08-20 18:10:09,286 - nvflare.fuel.f3.sfm.conn_manager - INFO - Connection [CN00005 Not Connected] is closed PID: 7883
2024-08-20 18:10:09,292 - nvflare.fuel.f3.drivers.grpc_driver.GrpcDriver - INFO - CLIENT: finished connection [CN00005 Not Connected]
2024-08-20 18:10:09,298 - nvflare.fuel.f3.sfm.conn_manager - INFO - Retrying [CH00001 ACTIVE grpc://server1:8002] in 8 seconds
Exception in thread Thread-1 (_rnq_worker):
Traceback (most recent call last):
  File "/anaconda/envs/nvflare_env/lib/python3.10/threading.py", line 1016, in _bootstrap_inner
    self.run()
  File "/anaconda/envs/nvflare_env/lib/python3.10/threading.py", line 953, in run
    self._target(*self._args, **self._kwargs)
  File "/home/azureuser/NVFlare/nvflare/ha/dummy_overseer_agent.py", line 112, in _rnq_worker
    self._do_callback()
  File "/home/azureuser/NVFlare/nvflare/ha/dummy_overseer_agent.py", line 106, in _do_callback
    self._update_callback(self)
  File "/home/azureuser/NVFlare/nvflare/private/fed/client/fed_client_base.py", line 147, in overseer_callback
    self.set_primary_sp(sp)
  File "/home/azureuser/NVFlare/nvflare/private/fed/client/fed_client_base.py", line 362, in set_primary_sp
    return self.set_sp(self._get_project_name(), sp)
  File "/home/azureuser/NVFlare/nvflare/private/fed/client/fed_client_base.py", line 162, in set_sp
    self._create_cell(location, scheme)
  File "/home/azureuser/NVFlare/nvflare/private/fed/client/fed_client_base.py", line 220, in _create_cell
    raise RuntimeError(f"Failed to get engine after {time.time()-start} seconds")
RuntimeError: Failed to get engine after 15.000278234481812 seconds
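In the log above, the client opens a channel to grpc://server1:8002 four times, each connection is closed as "Not Connected", and `_create_cell` gives up after its 15-second wait for the engine, so the client never reaches the server. A first step is to separate plain networking problems from TLS/gRPC ones. The sketch below is a hypothetical helper set, not an NVFlare API: `tcp_reachable` only checks that `server1` resolves and that something accepts TCP connections on port 8002 (it does not perform the TLS or gRPC handshake), and `attempt_gaps` measures the spacing between connection attempts, assuming log lines start with a timestamp such as `2024-08-20 18:10:01,905`.

```python
import socket
from datetime import datetime


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a plain TCP connection to host:port can be opened.

    Only checks name resolution and that the port accepts connections;
    it does not perform the TLS/gRPC handshake NVFlare uses.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # DNS failures (socket.gaierror), refused connections and
        # timeouts are all OSError subclasses.
        return False


def attempt_gaps(log_text: str, marker: str = "created secure channel") -> list[float]:
    """Seconds between consecutive log lines that contain `marker`."""
    times = []
    for line in log_text.splitlines():
        if marker in line:
            # Timestamp is everything before the first " - " separator.
            stamp = line.split(" - ", 1)[0].strip()
            times.append(datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S,%f"))
    return [(later - earlier).total_seconds() for earlier, later in zip(times, times[1:])]
```

Applied to the posted log, `attempt_gaps` gives roughly 1.1, 2.1 and 4.1 seconds between attempts, followed by "Retrying ... in 8 seconds": the client keeps backing off because every attempt fails. Running `tcp_reachable("server1", 8002)` on the client VM and getting False would point at name resolution, the server not running, or a firewall rule. A True result while the client still logs "Not Connected" would instead suggest looking at the TLS/gRPC layer.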