Alluxio / alluxio

Alluxio, data orchestration for analytics and machine learning in the cloud
https://www.alluxio.io
Apache License 2.0

Move transferleader logic from master to client-side #14068

Closed maobaolong closed 2 years ago

maobaolong commented 3 years ago

Is your feature request related to a problem? Please describe.

Currently, the Alluxio master acts as a remote proxy for the client-side elect shell: it sends the transferleader RPC to the Ratis server asynchronously, so the elect shell receives no message even when the transferleader RPC call times out or encounters an exception.

If we move the transfer-leadership logic from the master to the client side, we can easily track the progress, status, and result.
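
A minimal sketch of the difference, not Alluxio's or Ratis's actual code: `RaftGroup` and `transferLeadership` are hypothetical stand-ins for the Ratis call, and `master-2` is an invented address. The proxy path returns before the outcome is known; the client-side path waits and reports the real cause.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

public class ProxyVsClientSketch {
  /** Hypothetical stand-in for the Ratis transfer-leadership call; not a real Ratis API. */
  interface RaftGroup {
    CompletableFuture<Void> transferLeadership(String newLeader);
  }

  /** Master-as-proxy: start the transfer and acknowledge at once; the outcome is never observed. */
  static String proxyViaMaster(RaftGroup group, String newLeader) {
    group.transferLeadership(newLeader);
    return "request accepted";
  }

  /** Client-side: wait for the transfer to finish and report what actually happened. */
  static String transferFromClient(RaftGroup group, String newLeader) {
    try {
      group.transferLeadership(newLeader).join();
      return "transferred to " + newLeader;
    } catch (CompletionException e) {
      return "failed: " + e.getCause().getMessage();
    }
  }

  public static void main(String[] args) {
    // A group whose transfer always fails, to show what each path reports.
    RaftGroup failing = newLeader -> {
      CompletableFuture<Void> f = new CompletableFuture<>();
      f.completeExceptionally(new IllegalStateException(newLeader + " is not a voting member"));
      return f;
    };
    System.out.println(proxyViaMaster(failing, "master-2"));     // request accepted
    System.out.println(transferFromClient(failing, "master-2")); // failed: master-2 is not a voting member
  }
}
```

In the proxy path the failure is silently dropped, which matches the behavior described above: the shell sees nothing until its own timeout fires.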

Describe the solution you'd like

[image: not preserved]

Describe alternatives you've considered

Let the admin shell trigger the leader transfer.

Urgency

Normal

Additional context

No

LuQQiu commented 3 years ago

According to @maobaolong, the benefits of this change include:

  1. Support for job master leadership transfer.
  2. The client learns about transfer-leadership errors and exceptions. Currently, the leading master sends the transfer-leadership request to the Ratis server asynchronously, so the client never sees errors or exceptions raised during the transfer and can only detect that the transfer timed out. With the logic on the client side, the client completes the transfer in its own process and knows whether the request failed or timed out, so electCommand can show the user the error message and exit earlier.
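
Both benefits can be sketched together, under stated assumptions: the `Service` enum, the `elect` helper, the exit codes, and the addresses are all hypothetical, and the `transfer` function stands in for whatever call actually reaches the Ratis server. One client-side routine serves either service and tells a failure apart from a timeout.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Function;

public class ElectCommandSketch {
  /** The two HA services named in this thread. */
  enum Service { MASTER, JOB_MASTER }

  /** Runs one transfer and maps the outcome to an exit code (assumed: 0 ok, 1 error, 2 timeout). */
  static int elect(Service service, String newLeader,
      Function<String, CompletableFuture<Void>> transfer, long timeoutMs, StringBuilder out) {
    try {
      transfer.apply(newLeader).get(timeoutMs, TimeUnit.MILLISECONDS);
      out.append(service).append(": leadership transferred to ").append(newLeader);
      return 0;
    } catch (TimeoutException e) {
      out.append(service).append(": timed out after ").append(timeoutMs).append(" ms");
      return 2;
    } catch (ExecutionException e) {
      // The underlying cause is available, so the user sees it without waiting for a timeout.
      out.append(service).append(": transfer failed: ").append(e.getCause().getMessage());
      return 1;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      out.append(service).append(": interrupted");
      return 1;
    }
  }

  /** Builds a future that has already failed with the given message. */
  static CompletableFuture<Void> failed(String message) {
    CompletableFuture<Void> f = new CompletableFuture<>();
    f.completeExceptionally(new IllegalStateException(message));
    return f;
  }

  static void run(Service service, String newLeader,
      Function<String, CompletableFuture<Void>> transfer) {
    StringBuilder out = new StringBuilder();
    int code = elect(service, newLeader, transfer, 50, out);
    System.out.println(code + " " + out);
  }

  public static void main(String[] args) {
    // 0 MASTER: leadership transferred to master-2
    run(Service.MASTER, "master-2", addr -> CompletableFuture.completedFuture(null));
    // 1 JOB_MASTER: transfer failed: job-master-2 is not in the quorum
    run(Service.JOB_MASTER, "job-master-2", addr -> failed(addr + " is not in the quorum"));
    // 2 MASTER: timed out after 50 ms
    run(Service.MASTER, "master-3", addr -> new CompletableFuture<Void>());
  }
}
```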

@jenoudet @ggezer do you think moving this to the client side is beneficial? What concerns do you have about moving it from the master to the client side? @apc999 FYI

maobaolong commented 3 years ago

@LuQQiu Thanks for the summary and reply.

ggezer commented 3 years ago

https://alluxio-community.slack.com/archives/CEXGGUBDK/p1631818139229200?thread_ts=1631538692.178700&cid=CEXGGUBDK

@apc999 FYI

maobaolong commented 3 years ago

https://issues.apache.org/jira/browse/HDDS-5686

I'm glad to share that Apache Ozone is porting Alluxio's transferleader feature to the Ozone project. Ozone has two HA services, OM and SCM, much like our master and job master, so rather than implement the same logic on both servers, the Ozone developers will take the client-side approach.

ggezer commented 3 years ago

> https://issues.apache.org/jira/browse/HDDS-5686
>
> I'm glad to share that Apache Ozone is porting Alluxio's transferleader feature to the Ozone project. Ozone has two HA services, OM and SCM, much like our master and job master, so rather than implement the same logic on both servers, the Ozone developers will take the client-side approach.

We're handling this at the journal layer, which is shared between the Alluxio master and the job master. Ozone's requirements are not binding on us.

maobaolong commented 2 years ago

@ggezer @LuQQiu Thanks for the earlier replies and discussion. I'm closing this issue, as I will use a different way to trigger the Ratis server.