FedML-AI / FedML

FEDML - The unified and scalable ML library for large-scale distributed training, model serving, and federated learning. FEDML Launch, a cross-cloud scheduler, further enables running any AI jobs on any GPU cloud or on-premise cluster. Built on this library, TensorOpera AI (https://TensorOpera.ai) is your generative AI platform at scale.
Apache License 2.0
4.19k stars, 786 forks

[CoreEngine] Set the CUDA visible ID in the Docker container when training #2226

Closed: fedml-alex closed this 2 months ago.

fedml-alex commented 2 months ago
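The PR title describes pinning a training job inside a Docker container to specific GPUs. This page does not show the PR's actual diff, so the following is only a minimal sketch of one common way to do it, not FedML's implementation. It assumes the NVIDIA Container Toolkit is installed. The helper `build_docker_run_args` and the image name `my-train-image:latest` are hypothetical. The sketch exposes all host GPUs with `--gpus all`, so GPU ordinals inside the container should match the host's, and then sets `CUDA_VISIBLE_DEVICES` so that CUDA in the container only uses the selected ordinals.

```python
import shlex
from typing import List, Sequence


def build_docker_run_args(image: str, gpu_ids: Sequence[int], command: Sequence[str]) -> List[str]:
    """Hypothetical helper: build a `docker run` argv that pins a job to gpu_ids."""
    if not gpu_ids:
        raise ValueError("gpu_ids must not be empty")
    # Comma-separated ordinals, the format CUDA_VISIBLE_DEVICES expects.
    visible = ",".join(str(i) for i in gpu_ids)
    return [
        "docker", "run", "--rm",
        # Expose every host GPU so ordinals inside the container match the host's.
        "--gpus", "all",
        # Restrict CUDA inside the container to the selected ordinals.
        "-e", f"CUDA_VISIBLE_DEVICES={visible}",
        image,
        *command,
    ]


if __name__ == "__main__":
    args = build_docker_run_args("my-train-image:latest", [0, 2], ["python", "train.py"])
    # Print a shell-safe command line for inspection.
    print(shlex.join(args))
```

A design note: if specific devices are exposed instead (for example `--gpus device=2`), the container typically renumbers the visible GPUs from 0. In that case, host ordinals should not be passed through unchanged in `CUDA_VISIBLE_DEVICES`.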