Oneflow-Inc / oneflow

OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
http://www.oneflow.org
Apache License 2.0

[Question]: Does oneflow have any plan to integrate with k8s? #10309

Open yiliu30 opened 11 months ago

yiliu30 commented 11 months ago

Description

I am curious to know whether there are any ongoing or planned efforts to make OneFlow compatible with Kubernetes for better orchestration and management of machine learning workloads. If such integration is already underway, could you provide some details on the progress? Thanks.

Alternatives

No response

jackalcooper commented 11 months ago

We don't have a concrete roadmap for that, but suggestions are welcome. Are there any particular Kubernetes-related features you would like to see in OneFlow?

yuanms2 commented 11 months ago

Using OneFlow with k8s should be exactly the same as using PyTorch with k8s.

You can use OneFlow with pytorch-operator: https://github.com/kubeflow/pytorch-operator
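
To make the suggestion above concrete: a PyTorch-style operator injects the rendezvous variables `MASTER_ADDR`, `MASTER_PORT`, `WORLD_SIZE` and `RANK` into each pod. The sketch below only reads and validates them at the top of a training script. It assumes OneFlow picks up the same variables, as its PyTorch-style launcher suggests, and it does not import OneFlow itself. The names `Rendezvous` and `read_rendezvous` are hypothetical helpers, not part of any library.

```python
import os
from dataclasses import dataclass
from typing import Mapping

# Variables a PyTorch-style operator is expected to set in every pod.
REQUIRED = ("MASTER_ADDR", "MASTER_PORT", "WORLD_SIZE", "RANK")


@dataclass(frozen=True)
class Rendezvous:
    master_addr: str
    master_port: int
    world_size: int
    rank: int


def read_rendezvous(environ: Mapping[str, str] = os.environ) -> Rendezvous:
    """Collect the rendezvous settings from the pod environment (hypothetical helper)."""
    missing = [key for key in REQUIRED if key not in environ]
    if missing:
        raise RuntimeError("missing rendezvous variables: " + ", ".join(missing))
    return Rendezvous(
        master_addr=environ["MASTER_ADDR"],
        master_port=int(environ["MASTER_PORT"]),
        world_size=int(environ["WORLD_SIZE"]),
        rank=int(environ["RANK"]),
    )
```

Failing early with a clear message when a variable is missing makes a misconfigured job easier to diagnose than a hang during distributed initialization.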

yiliu30 commented 11 months ago

> We don't have a concrete roadmap for that but any suggestions are welcomed. Do you have any particular features in mind you would like to see in OneFlow regarding Kubernetes?

Thank you for your reply. Is it possible to port the supported distributed strategies to Kubernetes? For example, can we perform model parallelism across multiple pods?
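
As a rough illustration of what model parallelism across pods would involve: if each pod runs one rank, splitting a weight along one axis gives every pod a contiguous slice of that axis. The sketch below is framework-independent and does not call OneFlow. `shard_bounds` is a hypothetical helper, and OneFlow's own split may balance uneven sizes differently.

```python
from typing import Tuple


def shard_bounds(dim_size: int, world_size: int, rank: int) -> Tuple[int, int]:
    """Return the half-open [start, stop) slice of one axis owned by `rank`.

    Hypothetical helper: the first `dim_size % world_size` ranks get one extra element.
    """
    if not 0 <= rank < world_size:
        raise ValueError("rank must be in [0, world_size)")
    base, extra = divmod(dim_size, world_size)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


# Example: an axis of 10 columns shared by 4 pods (one rank per pod).
slices = [shard_bounds(10, 4, rank) for rank in range(4)]
```

The slices cover the axis without gaps or overlap, so each pod can hold and compute on its own part of the weight while the ranks exchange activations over the network.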

yiliu30 commented 11 months ago

> The usage of oneflow with k8s should completely be the same as pytorch with k8s.
>
> you can use oneflow with pytorch-operator https://github.com/kubeflow/pytorch-operator

Thanks for your guidance. That repo has been archived, and its functionality has been merged into the training-operator. It seems that these APIs are designed specifically for frameworks such as PyTorch and PaddlePaddle...