[FEATURE][Spark] Support partial sort and combine for reducing shuffle data size

apache / incubator-uniffle

Uniffle is a high performance, general purpose Remote Shuffle Service.

https://uniffle.apache.org/

Apache License 2.0

370 stars 143 forks source link

[FEATURE][Spark] Support partial sort and combine for reducing shuffle data size #446

Open zuston opened 1 year ago

zuston commented 1 year ago

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Search before asking

[X] I have searched in the issues and found no similar issues.

Describe the feature

In spark client, currently uniffle don't support merge for some ops supporting combine to reduce data size. Due to this, in some cases, it will cause unnecessary network and performance regression.

Motivation

No response

Describe the solution

No response

Additional context

No response

Are you willing to submit PR?

[ ] Yes I am willing to submit a PR!

zuston commented 1 year ago

I encounter this problem in our online jobs, but I don't dig into it. Anyway, from my side, this feature is an improvement.

cc @xianjingfeng @advancedxy @jerqi

advancedxy commented 1 year ago

You mean map side combine?

Well, last time I checked, uniffle don't do any combine, just create a combined value for each record.

If we are going to support this feature, maybe a lot of client side code should be rewrote...😇

zuston commented 1 year ago

You mean map side combine?

Yes

If we are going to support this feature, maybe a lot of client side code should be rewrote...😇

Yes. 🙆

KnightChess commented 7 months ago

we also have this demand. I met in hudi table which will use countByKey to evaluation work load.

KnightChess commented 7 months ago

any plan to support it?

zuston commented 7 months ago

any plan to support it?

haven't. If you want this, feel free to discuss more

KnightChess commented 7 months ago

how about use ExternalAppendOnlyMap to archive it, but it's may spill to disk, we can add a param to controll it. And I think if map side can be combine, it will reduce shuffle data and reduce data skew problems

zhengchenyu commented 7 months ago

ExternalAppendOnlyMap is used to handle huge data, I think it is not necessary. We can sort and combine in memory, then serialize the records. I did it in #1239 (developing) by this way. But I bind only the use of the remote merge feature.

I afraid that the proportion of being combined may be not high, since we use pipelined shuffle in rss, only partial record will be combined.

zuston commented 7 months ago

The client and server side merge are all needed.

KnightChess commented 7 months ago

I think the merging should work on the client side, not on the RSS server. It maximizes the utilization of executor resources, rather than placing the computation burden on the RSS server, whose resources are shared and need to ensure stability and the normal operation of core services.

KnightChess commented 7 months ago

And furthermore, I think the RSS server is best left to handle data transmission efficiently without any content processing. It should return whatever it receives from user tasks. All data content processing is preferably performed by the executors, allowing the RSS server to focus solely on efficient shuffle transmission. What do you think?

zhengchenyu commented 7 months ago

@KnightChess

If you think RSS sever should not handle any content, you can save the shuffle date to 3rd remote filesystem, and read the shuffle data from this filesystem.

What is your purpose of using remote shuffle service?

For me, the main purpose for remote merge in shuffle side is that application will not use any local file as shuffle storage, then we can realize the separation of storage and computing, then support hybrid deployment.

Then Remote merge is optional function. And If we merge on server side, we can reduce the resource usage of executor, then add the resource to shuffle server. That's not a problem.

zuston commented 7 months ago

I think the merging should work on the client side, not on the RSS server.

I think the zheng's design motivation is not same with you @KnightChess , if you want this, client merge design is OK for me, that is not conflict with remote merge. Right @zhengchenyu

zhengchenyu commented 7 months ago

Well, yes. I'm just offering a suggestion, sort in memory. There is no conflict!