FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Apache License 2.0
4.11k stars, 772 forks

fix bug in set_device_map for trpc #2113

Open bene-ges opened 2 months ago

bene-ges commented 2 months ago

`device_list` is a dictionary, so `set_device_map` needs to map over its values, not its keys. Related issue: https://github.com/FedML-AI/FedML/issues/2002
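
To illustrate the distinction, here is a minimal sketch in plain Python. The contents of `device_list` and both helper names are hypothetical; the actual FedML code touched by this PR is not reproduced here, so this only shows the general keys-versus-values pitfall.

```python
# Hypothetical example data: the real structure of device_list in FedML
# is an assumption here, not taken from the PR diff.
device_list = {0: "cuda:0", 1: "cuda:1"}


def devices_from_keys(devices):
    """Buggy pattern: iterating a dict directly yields its keys."""
    return [d for d in devices]


def devices_from_values(devices):
    """Fixed pattern: iterate the values explicitly."""
    return [d for d in devices.values()]


print(devices_from_keys(device_list))
print(devices_from_values(device_list))
```

With this sample data, the first helper returns the integer keys, while the second returns the device strings that a device map would normally be built from.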

bene-ges commented 3 weeks ago

@chaoyanghe, would you merge this PR? It's a simple bug fix.