Mellanox / SparkRDMA

This is an archive of the SparkRDMA project. The new repository with RDMA shuffle acceleration for Apache Spark is here: https://github.com/Nvidia/sparkucx
Apache License 2.0

Fail to re-produce the speed-up of TeraSort with SparkRDMA #28

Closed by tobegit3hub 5 years ago

tobegit3hub commented 5 years ago

We have compared the performance of TeraSort using the SparkRDMA shuffle manager and using pure TCP. The speed-up is 30% at most, not the 2.63x stated in README.md.
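To make the two figures comparable, it helps to be explicit about what "30%" means. The following is a minimal sketch, assuming speed-up is defined as baseline runtime divided by accelerated runtime. The function names and the runtimes are hypothetical and are not measurements from this issue.

```python
def speedup(t_baseline: float, t_accelerated: float) -> float:
    """Speed-up factor: how many times faster the accelerated run is."""
    if t_baseline <= 0 or t_accelerated <= 0:
        raise ValueError("runtimes must be positive")
    return t_baseline / t_accelerated


def runtime_reduction(t_baseline: float, t_accelerated: float) -> float:
    """Fraction of the baseline runtime saved (0.3 means 30% less time)."""
    if t_baseline <= 0 or t_accelerated <= 0:
        raise ValueError("runtimes must be positive")
    return 1.0 - t_accelerated / t_baseline


# Hypothetical runtimes in seconds, for illustration only.
t_tcp = 100.0
t_rdma_30pct_less_time = 70.0   # a run that takes 30% less time than TCP
t_rdma_readme = t_tcp / 2.63    # runtime implied by a 2.63x speed-up

print(round(speedup(t_tcp, t_rdma_30pct_less_time), 2))   # 1.43
print(round(runtime_reduction(t_tcp, t_rdma_readme), 2))  # 0.62
```

Under this definition, a 30% shorter runtime corresponds to roughly a 1.43x speed-up, while 2.63x corresponds to roughly 62% less runtime, so the gap remains large whichever way the 30% is read.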

Can the testers of this project provide more specific parameters to reproduce this performance? For example, how many partitions should we use, and should we restrict the executor memory or the number of executors?
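For a like-for-like comparison, the parameters asked about here can be pinned to the same values in both runs, with only the shuffle manager changing. The sketch below builds `spark-submit` `--conf` arguments for that purpose. The helper names and all values (8 executors, 16g, 512 partitions) are hypothetical, and the shuffle manager class name is assumed to be the one given in the project README, so verify it against your build. The classpath settings for the SparkRDMA jar are omitted.

```python
from typing import Dict, List, Optional

# Assumed class name (as given in the SparkRDMA README); verify before use.
RDMA_SHUFFLE_MANAGER = "org.apache.spark.shuffle.rdma.RdmaShuffleManager"


def shuffle_conf(executors: int, executor_memory: str, partitions: int,
                 shuffle_manager: Optional[str] = None) -> Dict[str, str]:
    """Build the Spark properties for one benchmark run."""
    conf = {
        "spark.executor.instances": str(executors),
        "spark.executor.memory": executor_memory,
        "spark.default.parallelism": str(partitions),
    }
    # Leave spark.shuffle.manager unset for the TCP baseline (Spark default).
    if shuffle_manager is not None:
        conf["spark.shuffle.manager"] = shuffle_manager
    return conf


def to_submit_args(conf: Dict[str, str]) -> List[str]:
    """Turn properties into '--conf key=value' pairs, sorted by key."""
    args: List[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


# Hypothetical values; both runs share everything except the shuffle manager.
baseline = shuffle_conf(8, "16g", 512)
rdma = shuffle_conf(8, "16g", 512, RDMA_SHUFFLE_MANAGER)

print(" ".join(to_submit_args(baseline)))
print(" ".join(to_submit_args(rdma)))
```

Keeping both configurations in one place makes it easy to report exactly which settings produced a given speed-up.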

petro-rudenko commented 5 years ago

Hi, the speed-up depends on many factors: the number of CPUs per machine, the disks (HDD vs SSD vs NVMe), and the NIC type (ConnectX-3 vs ConnectX-4 vs ConnectX-5, InfiniBand vs RoCE). Some results are published on the wiki page, including the configuration and hardware type.

tobegit3hub commented 5 years ago

Thanks @petro-rudenko, that is the detail we need.