FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Apache License 2.0

MLOps Beehive stuck in server_runner.py #636

Open WeijieTanggg opened 2 years ago

WeijieTanggg commented 2 years ago

I'm trying to run the fedml_demo on my M2 MacBook and one Huawei Pad. I've already built the server package successfully and uploaded it to the platform, but the process always gets stuck when running the server.

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:29] [INFO] [server_runner.py:743:setup_client_mqtt_mgr] client agent config: mqtt.fedml.ai,1883

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:29] [INFO] [server_runner.py:486:send_training_request_to_edges] Edge ids: [11823]

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:29] [INFO] [server_runner.py:489:send_training_request_to_edges] start_train: send topic flserver_agent/11823/start_train to client...

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:29] [INFO] [mlops_metrics.py:141:report_server_training_status] report_server_training_status. msg = {'run_id': 3400, 'edge_id': 11848, 'status': 'STARTING', 'role': 'normal'}

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:30] [INFO] [server_runner.py:283:build_dynamic_args] Bootstrap scripts are being executed...

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:30] [INFO] [server_runner.py:294:build_dynamic_args] [FedML]Bootstrap Finished

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:30] [INFO]

[FedML-Server(0) @device-id-11848] [Sun, 06 Nov 2022 09:29:30] [INFO] [server_runner.py:301:build_dynamic_args]
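
The log does not show why the run stalls after `build_dynamic_args`. One basic check is whether each device can open a TCP connection to the broker address printed in the log (`mqtt.fedml.ai`, port `1883`). The sketch below is an assumption about a useful first diagnostic, not part of FedML; the helper name `can_reach` is hypothetical and it uses only the Python standard library. It tests TCP reachability only and does not perform an MQTT handshake.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to (host, port) opens within `timeout` seconds."""
    try:
        # create_connection resolves the host name and attempts a TCP connect.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Calling `can_reach("mqtt.fedml.ai", 1883)` on both the server machine and the edge device would show whether either one is blocked from the broker. A `False` result points to a network or firewall problem, while `True` only rules that out.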

fedml-dimitris commented 1 year ago

@WeijieTanggg Can you try reproducing this using the latest FedML version?
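
To report which version the issue was reproduced with, the installed package version can be read from Python. This is a minimal sketch, assuming the library was installed from the `fedml` distribution on PyPI; the helper name `installed_version` is hypothetical.

```python
from importlib import metadata


def installed_version(package: str = "fedml"):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None
```

Including the value returned by `installed_version()` in the reply, along with the logs from the new run, makes it clear whether the run used the latest release.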