apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.
https://uniffle.apache.org/
Apache License 2.0
387 stars 149 forks source link

[FEATURE] Support hybrid shuffle manager #1020

Open zuston opened 1 year ago

zuston commented 1 year ago

Code of Conduct

Search before asking

Describe the feature

Delegation shuffle manager has been supported in uniffle, which is useful for avoiding instability of uniffle cluster.

But this is not enough, we hope the hybrid shuffle manager could be supported, that will support using ESS or Uniffle on different stages in one App

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

zuston commented 1 year ago

cc @jerqi @advancedxy

jerqi commented 1 year ago

The biggest problem is exchange reuse and stage recompute. How to handle them?

zuston commented 1 year ago

The biggest problem is exchange reuse and stage recompute. How to handle them?

Sorry, I don't get your point. BTW, I tested the exchange reuse and it works well when using HybridShuffleManager.

From my prospective, for one shuffleId, it only will use one type of ess/rss, so it always will OK. Right? If I'm wrong, please point out. Thanks

jerqi commented 1 year ago

The biggest problem is exchange reuse and stage recompute. How to handle them?

Sorry, I don't get your point. BTW, I tested the exchange reuse and it works well when using HybridShuffleManager.

From my prospective, for one shuffleId, it only will use one type of ess/rss, so it always will OK. Right? If I'm wrong, please point out. Thanks

If one shuffle is bind to ess or rss, it's ok. I'm worried that one shuffle change from rss to ess.

advancedxy commented 1 year ago

cc @jerqi @advancedxy

I think it's nice to have this feature. It might also be related with stage resubmission.

pegasas commented 1 year ago

Hi, community,

I would like to try this issue. Would you like to assign this task to me?

It seems I should add HybridShuffleManager with some tests and integration tests like DelegationRssShuffleManager/RssShuffleManager, which is an opportunity for me to go deeper into uniffle.

pegasas commented 1 year ago

Hi, @zuston , @jerqi ,

I have a naive question. How to decide on the shuffle strategy (ESS/RSS/Uniffle) of each shuffleId for spark driver/executor? When to change RSS to ESS is our best practise?

zuston commented 1 year ago

Hi, @zuston , @jerqi ,

I have a naive question. How to decide on the shuffle strategy (ESS/RSS/Uniffle) of each shuffleId for spark driver/executor? When to change RSS to ESS is our best practise?

You can see the DelegationShuffleManager design, the pass or not depends on the load or other access checker of coordinator determines