JonathanChiang / eDash

Open Source Clinical Interface for Medical Imaging
MIT License
2 stars 0 forks

Leverage Ray from Berkeley for Distributed Training #5

Open JonathanChiang opened 5 years ago

JonathanChiang commented 5 years ago

DEMOCRATIZING PRODUCTION-SCALE DISTRIBUTED DEEP LEARNING

https://arxiv.org/pdf/1811.00143.pdf

To address the above challenges, we discuss a system we built at Apple known as Alchemist. Alchemist adopts a cloud-native architecture and is portable among private and public clouds. It supports multiple training frameworks like Tensorflow or PyTorch and multiple distributed training paradigms. The compute cluster is managed by, but not limited to, Kubernetes. We chose a containerized workflow to ensure uniformity and repeatability of the software environment. In the following sections, we refer to engineers, researchers, and data scientists using Alchemist as users.

JonathanChiang commented 5 years ago

https://github.com/ray-project/ray/issues/1945