apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.
https://uniffle.apache.org/
Apache License 2.0
371 stars 143 forks

[Umbrella] Better K8S operator support #462

Open advancedxy opened 1 year ago

advancedxy commented 1 year ago

Describe the proposal

To support native, smooth deployment on K8S, we may need to add the following:

  1. expose more fields in the operator's CRD, such as RuntimeClassName, Tolerations, Annotations, and Affinity, so that the shuffle server can be deployed more flexibly.
  2. LogHostPath and HostPathMounts may be refactored to be supplied by the container runtime. Since shuffle servers may be deployed on mixed nodes, the HostPathMounts can differ from host to host.
  3. Add a CLI binary to hide the details of RSS operations: rolling upgrade, restart, full upgrade, gray version, etc.
  4. vpc template support
  5. service and network refinement:
    • shuffle server is a network-traffic-heavy application, so it's not wise to use a Service to proxy external clients' read/write requests to the shuffle server
    • the coordinators' deployment may need some refinement: in the current architecture, the coordinator can only have 1 replica. Otherwise, there would be a split-brain problem.
  6. various bug fixes, such as init-container resource requests/limits.
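
As a rough illustration of item 1, here is a minimal Go sketch of how extra CRD fields might be copied onto the shuffle server's pod template. Everything in it is an assumption made for illustration: `SchedulingOptions` and `ApplySchedulingOptions` are hypothetical names, not the operator's actual API, and `Toleration` and `PodTemplate` are simplified local stand-ins for the corresponding `k8s.io/api/core/v1` types so the snippet compiles without dependencies. Affinity is left out for brevity.

```go
package main

import "fmt"

// Toleration is a simplified local stand-in for corev1.Toleration.
type Toleration struct {
	Key      string
	Operator string
	Value    string
	Effect   string
}

// PodTemplate is a simplified local stand-in for corev1.PodTemplateSpec.
type PodTemplate struct {
	Annotations      map[string]string
	RuntimeClassName *string
	Tolerations      []Toleration
}

// SchedulingOptions is a hypothetical group of fields that the CRD could expose.
type SchedulingOptions struct {
	RuntimeClassName *string
	Tolerations      []Toleration
	Annotations      map[string]string
}

// ApplySchedulingOptions copies user-supplied options onto the pod template.
// Fields the user left unset do not overwrite what the template already has.
func ApplySchedulingOptions(opts SchedulingOptions, tpl *PodTemplate) {
	if opts.RuntimeClassName != nil {
		name := *opts.RuntimeClassName // copy so the template does not alias the CR
		tpl.RuntimeClassName = &name
	}
	tpl.Tolerations = append(tpl.Tolerations, opts.Tolerations...)
	if len(opts.Annotations) > 0 && tpl.Annotations == nil {
		tpl.Annotations = make(map[string]string, len(opts.Annotations))
	}
	for k, v := range opts.Annotations {
		tpl.Annotations[k] = v
	}
}

func main() {
	rc := "kata" // example runtime class name, chosen arbitrarily
	opts := SchedulingOptions{
		RuntimeClassName: &rc,
		Tolerations: []Toleration{
			{Key: "dedicated", Operator: "Equal", Value: "rss", Effect: "NoSchedule"},
		},
		Annotations: map[string]string{"example.com/owner": "data-platform"},
	}
	tpl := &PodTemplate{Annotations: map[string]string{"app": "rss-shuffle-server"}}
	ApplySchedulingOptions(opts, tpl)
	fmt.Println(*tpl.RuntimeClassName, len(tpl.Tolerations), len(tpl.Annotations))
}
```

Keeping these as optional fields means existing CRs that do not set them would keep their current behavior.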

advancedxy commented 1 year ago

cc @wangao1236