FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Apache License 2.0

Running the example gets stuck after "using_mlops=true" #1175

Open yaokunxu opened 1 year ago

yaokunxu commented 1 year ago

Environment:
OS: Windows 10 Pro 22H2 19045.3208
PyTorch: 2.0.1 cu118
FedML: 0.8.6

Description: After installing fedml with pip, I ran the code and got the following output. It gets stuck at the "Network Connection Checking" phase.

Any advice would be appreciated.

yaokunxu commented 1 year ago

Issue solved. The error was: ImportError: DLL load failed while importing MPI: The specified module could not be found. To fix it, download and install msmpisetup.exe from https://www.microsoft.com/en-us/download/confirmation.aspx?id=57467

yaokunxu commented 1 year ago

Then change the line "fedml.run_simulation()" to "fedml.run_simulation(backend="MPI")".
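For anyone hitting the same hang, here is a minimal sketch of checking the MPI import before starting the simulation. It assumes the failing import is `mpi4py.MPI` (the traceback only says "importing MPI", so the module name is my guess); `check_mpi_import` and `run_simulation_with_mpi` are hypothetical helper names, not part of FedML.

```python
import importlib


def check_mpi_import(module_name: str = "mpi4py.MPI"):
    """Try to import the MPI bindings and return (ok, message).

    Assumption: the "DLL load failed while importing MPI" error comes
    from mpi4py, which needs the MS-MPI runtime on Windows.
    """
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        # A missing package and a DLL load failure both raise ImportError.
        return False, f"{module_name} failed to import: {exc}"
    return True, f"{module_name} imported successfully"


def run_simulation_with_mpi():
    """Run the FedML simulation with the MPI backend if MPI imports cleanly."""
    ok, message = check_mpi_import()
    print(message)
    if not ok:
        print("Install MS-MPI (msmpisetup.exe) and try again.")
        return False
    import fedml  # imported lazily so the check above works without fedml

    # Same call as in the comment above, with the backend set explicitly.
    fedml.run_simulation(backend="MPI")
    return True
```

Call `run_simulation_with_mpi()` from your script in place of the bare `fedml.run_simulation()` call. If the import check fails, it prints the underlying error instead of leaving the run stuck at the connection check.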

xuedingebuaimao commented 1 year ago

Nice, it runs after following your instructions, but at the end it prints one warning line: [FedML-Server @device-id-0] [Mon, 31 Jul 2023 12:27:18] [WARNING] [com_manager.py:133:_notify_connection_ready] Cannot handle connection ready. Then it gets stuck there.

yaokunxu commented 1 year ago

An employee told me that it is normal to get such messages, but I strongly recommend that you ask them yourself.