Open PaulKMandal opened 4 months ago
Hi @PaulKMandal, are the server and (at least) two clients running?
I'd strongly recommend upgrading to the latest Flower release (1.7 as of today). I had to check the release log to be sure: Flower 0.17.0 was released in 2021.
This guide helps with upgrading from pre-1.0 to 1.0+ releases: https://flower.ai/docs/framework/how-to-upgrade-to-flower-1.0.html
I was only testing with one client running. I have tested it with two clients and I now get the following error:
TypeError: ObjectDetectionClient.get_parameters() takes 1 positional argument but 2 were given
DEBUG flwr 2024-02-16 12:07:29,243 | connection.py:220 | gRPC channel closed
Traceback (most recent call last):
File "/home/paul/Research/flower_cv/client.py", line 31, in <module>
fl.client.start_client(server_address="[::]:8080", client=ObjectDetectionClient())
File "/home/paul/Research/flower_cv/venv/lib/python3.11/site-packages/flwr/client/app.py", line 248, in start_client
_start_client_internal(
File "/home/paul/Research/flower_cv/venv/lib/python3.11/site-packages/flwr/client/app.py", line 361, in _start_client_internal
message = receive()
^^^^^^^^^
File "/home/paul/Research/flower_cv/venv/lib/python3.11/site-packages/flwr/client/grpc_client/connection.py", line 132, in receive
proto = next(server_message_iterator)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/paul/Research/flower_cv/venv/lib/python3.11/site-packages/grpc/_channel.py", line 540, in __next__
return self._next()
^^^^^^^^^^^^
File "/home/paul/Research/flower_cv/venv/lib/python3.11/site-packages/grpc/_channel.py", line 966, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-02-16T12:07:29.037668125-06:00", grpc_status:14, grpc_message:"Socket closed"}"
I will try upgrading later, but I don't want to rewrite my entire implementation yet.
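For anyone hitting the same TypeError: the message suggests the framework is calling get_parameters with one extra argument. In Flower 1.0+ that argument is a config dictionary, so a client written against the pre-1.0 signature fails as soon as the server asks for parameters. The sketch below reproduces the mismatch without importing Flower; the class names and the framework_call helper are hypothetical stand-ins, not Flower's actual internals.

```python
# Stand-in sketch: no Flower import, so it runs anywhere.
# Assumption: the caller passes a config dict, as Flower 1.0+ does
# for get_parameters, fit, and evaluate.


class OldStyleClient:
    """Pre-1.0 style signature: no config argument."""

    def get_parameters(self):
        return [[0.0, 1.0]]


class NewStyleClient:
    """1.0+ style signature: accepts the config dict sent by the server."""

    def get_parameters(self, config):
        return [[0.0, 1.0]]


def framework_call(client):
    """Hypothetical stand-in for how the framework invokes the method."""
    return client.get_parameters({})


def describe(client):
    """Return 'ok' if the call succeeds, else the TypeError text."""
    try:
        framework_call(client)
        return "ok"
    except TypeError as exc:
        return f"TypeError: {exc}"


if __name__ == "__main__":
    print(describe(OldStyleClient()))
    print(describe(NewStyleClient()))
```

If this is the cause, adding the config parameter to get_parameters in the real client should be a much smaller change than rewriting the whole implementation. The upgrade guide linked above lists the other signature changes.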
I'm using version 1.7.0 of flwr, but I'm still encountering this error. It works fine locally, but when the server is hosted on an AWS EC2 cluster, I get this error on each client running on the same machine. I've opened ports 8080, 9091, 9092, and 9093 on EC2. Clients connect and train successfully, but this error occurs at the end of training.
Traceback (most recent call last):
File "client_cyclegan.py", line 131, in <module>
fl.client.start_client(server_address="<EC2 Public IP>:8080", client=FlwrClient(opt).to_client())
File "/home/ramindu/miniconda3/envs/FedCycleGAN/lib/python3.8/site-packages/flwr/client/app.py", line 248, in start_client
_start_client_internal(
File "/home/ramindu/miniconda3/envs/FedCycleGAN/lib/python3.8/site-packages/flwr/client/app.py", line 361, in _start_client_internal
message = receive()
File "/home/ramindu/miniconda3/envs/FedCycleGAN/lib/python3.8/site-packages/flwr/client/grpc_client/connection.py", line 132, in receive
proto = next(server_message_iterator)
File "/home/ramindu/miniconda3/envs/FedCycleGAN/lib/python3.8/site-packages/grpc/_channel.py", line 542, in __next__
return self._next()
File "/home/ramindu/miniconda3/envs/FedCycleGAN/lib/python3.8/site-packages/grpc/_channel.py", line 968, in _next
raise self
grpc._channel._MultiThreadedRendezvous: <_MultiThreadedRendezvous of RPC that terminated with:
status = StatusCode.UNAVAILABLE
details = "Socket closed"
debug_error_string = "UNKNOWN:Error received from peer {created_time:"2024-03-19T08:17:21.62228519+05:30", grpc_status:14, grpc_message:"Socket closed"}"
>
I believe I had the same problem. For some reason, the server only creates an IPv6 socket.
For me the solution was to completely disable IPv6 system-wide on the server machine.
You can easily check if this is your problem by running something like netstat -tulpn on the server and looking at which address family the listening socket on the server port uses.
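A complementary check can be run from the client machine: try to open a plain TCP connection to the server port over IPv4 and over IPv6 separately. This is only a rough sketch using the standard socket module. The can_connect helper is a made-up name, and a successful TCP connect says nothing about whether gRPC itself is healthy.

```python
import socket


def can_connect(host, port, family, timeout=2.0):
    """Return True if a TCP connection to (host, port) succeeds
    for the given address family (socket.AF_INET or socket.AF_INET6)."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        # The host has no address in this family.
        return False
    for fam, socktype, proto, _canon, sockaddr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(sockaddr)
                return True
        except OSError:
            # Refused, timed out, or family unsupported: try the next address.
            continue
    return False
```

For example, calling can_connect(server_host, 8080, socket.AF_INET) and can_connect(server_host, 8080, socket.AF_INET6), where server_host is a placeholder for the server's address, shows which family is reachable. If only the IPv6 call succeeds, that would be consistent with the server listening on an IPv6-only socket.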
Describe the bug
Client gets a gRPC error when trying to connect on flwr version 0.17.0
Steps/Code to Reproduce
My code is available here: https://github.com/PaulKMandal/flower_cv/tree/main
Expected Results
The model should begin training
Actual Results
I get the following error: