apache / celeborn

Apache Celeborn is an elastic and high-performance service for shuffle and spilled data.
https://celeborn.apache.org/
Apache License 2.0

[CELEBORN-1490][CIP-6] Extends message to support hybrid shuffle #2714

Closed: reswqa closed this pull request 2 weeks ago

reswqa commented 2 weeks ago

What changes were proposed in this pull request?

This is the first PR to support Hybrid Shuffle.

It extends the existing messages to support hybrid shuffle.

Why are the changes needed?

Hybrid shuffle is a tiered storage architecture that introduces the concept of a segment. Data is split into segments, and each segment selects one tier to be sent to, so the data as a whole is spread across multiple tiers.
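
To make the segment idea concrete, here is a minimal sketch of splitting a stream of buffers into segments and choosing one tier per segment. It is not the code in this PR: the tier names (`MEMORY`, `DISK`, `REMOTE`), the fixed segment size, and the "first tier with free capacity" selection rule are all assumptions made for illustration, as are the class and method names.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names, tiers, and the selection rule are assumptions.
final class SegmentTierSketch {
    // Assumed tier names; the PR description does not list the tiers.
    enum Tier { MEMORY, DISK, REMOTE }

    // A segment is a contiguous run of buffers that is sent to exactly one tier.
    static final class Segment {
        final int segmentId;
        final int firstBuffer;
        final int bufferCount;
        final Tier tier;

        Segment(int segmentId, int firstBuffer, int bufferCount, Tier tier) {
            this.segmentId = segmentId;
            this.firstBuffer = firstBuffer;
            this.bufferCount = bufferCount;
            this.tier = tier;
        }

        @Override
        public String toString() {
            return "Segment{id=" + segmentId + ", first=" + firstBuffer
                + ", count=" + bufferCount + ", tier=" + tier + "}";
        }
    }

    // Splits totalBuffers into segments of at most segmentSize buffers and
    // assigns each segment to a tier. freeSegmentsPerTier[i] is the number of
    // segments the tier with ordinal i can still accept.
    static List<Segment> assign(int totalBuffers, int segmentSize, int[] freeSegmentsPerTier) {
        if (segmentSize <= 0) {
            throw new IllegalArgumentException("segmentSize must be positive");
        }
        int[] remaining = freeSegmentsPerTier.clone(); // do not mutate the caller's array
        List<Segment> segments = new ArrayList<>();
        int segmentId = 0;
        for (int start = 0; start < totalBuffers; start += segmentSize) {
            int count = Math.min(segmentSize, totalBuffers - start);
            segments.add(new Segment(segmentId++, start, count, selectTier(remaining)));
        }
        return segments;
    }

    // Picks the first tier that still has capacity and consumes one slot.
    // REMOTE is treated as an unbounded fallback (an assumption of this sketch).
    private static Tier selectTier(int[] remaining) {
        for (Tier tier : Tier.values()) {
            int i = tier.ordinal();
            if (i < remaining.length && remaining[i] > 0) {
                remaining[i]--;
                return tier;
            }
        }
        return Tier.REMOTE;
    }

    public static void main(String[] args) {
        // 10 buffers, 4 buffers per segment, one free slot each in MEMORY and DISK.
        for (Segment s : assign(10, 4, new int[] {1, 1, 0})) {
            System.out.println(s);
        }
    }
}
```

The point of the sketch is only that the tier decision is made once per segment rather than once per buffer, which is why the protocol needs messages that mark segment boundaries.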

This PR introduces the segment-related messages. In addition, when consuming data, hybrid shuffle needs to know which subpartition the data comes from, so a SubpartitionId field is added to ReadData (through a new class, introduced for compatibility).
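
The compatibility point can be illustrated with a small sketch: the existing message keeps its layout, and a separate class carries the extra subpartition ID. Everything here is hypothetical, including the class names `ReadDataSketch` and `SubpartitionReadDataSketch` and the wire layout; it does not reproduce Celeborn's actual message encoding.

```java
import java.nio.ByteBuffer;

// Hypothetical legacy message; assumed layout: [streamId:long][length:int][body].
final class ReadDataSketch {
    final long streamId;
    final byte[] body;

    ReadDataSketch(long streamId, byte[] body) {
        this.streamId = streamId;
        this.body = body;
    }

    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + Integer.BYTES + body.length);
        buf.putLong(streamId).putInt(body.length).put(body);
        buf.flip(); // switch from writing to reading
        return buf;
    }

    static ReadDataSketch decode(ByteBuffer buf) {
        long streamId = buf.getLong();
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return new ReadDataSketch(streamId, body);
    }
}

// Hypothetical new message; assumed layout:
// [streamId:long][subpartitionId:int][length:int][body].
// Keeping it a separate class leaves the legacy layout above untouched,
// so readers that only know the old message are not affected.
final class SubpartitionReadDataSketch {
    final long streamId;
    final int subpartitionId;
    final byte[] body;

    SubpartitionReadDataSketch(long streamId, int subpartitionId, byte[] body) {
        this.streamId = streamId;
        this.subpartitionId = subpartitionId;
        this.body = body;
    }

    ByteBuffer encode() {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES + 2 * Integer.BYTES + body.length);
        buf.putLong(streamId).putInt(subpartitionId).putInt(body.length).put(body);
        buf.flip();
        return buf;
    }

    static SubpartitionReadDataSketch decode(ByteBuffer buf) {
        long streamId = buf.getLong();
        int subpartitionId = buf.getInt();
        byte[] body = new byte[buf.getInt()];
        buf.get(body);
        return new SubpartitionReadDataSketch(streamId, subpartitionId, body);
    }

    public static void main(String[] args) {
        SubpartitionReadDataSketch msg =
            decode(new SubpartitionReadDataSketch(7L, 3, new byte[] {1, 2, 3}).encode());
        System.out.println("stream=" + msg.streamId + " subpartition=" + msg.subpartitionId);
    }
}
```

With the subpartition ID in the message itself, a consumer that reads several subpartitions over one stream can route each payload without tracking extra state on the side.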

Does this PR introduce any user-facing change?

No.

How was this patch tested?

No tests are needed.