DeepRec-AI / HybridBackend

A high-performance framework for training wide-and-deep recommender systems on heterogeneous clusters
Apache License 2.0
156 stars, 30 forks

[DATA] Introduce new RebatchDataset to replace rebatch and rectify #110

Closed by 2sin18 1 year ago

2sin18 commented 1 year ago

This patch fixes index alignment issues in rebatch and rectify. Closes #97 and #107.

Rebatching Benchmark Results:

| Dataset | unbatch+batch | rebatch | speedup |
| --- | --- | --- | --- |
| taobao | 8080.07 samples/sec | 126529.92 samples/sec | 15.66x |
| criteo | 23376.31 samples/sec | 2827035.27 samples/sec | 120.94x |
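
To illustrate what the two columns compare, here is a minimal pure-Python sketch of the two strategies. It is not HybridBackend's implementation: the function names `unbatch_then_batch` and `rebatch`, and the use of plain lists in place of columnar tensors, are assumptions made only for illustration. Both functions produce the same fixed-size output batches; the first handles rows one at a time, while the second moves whole slices of the incoming batches.

```python
from typing import Iterable, Iterator, List, Sequence


def unbatch_then_batch(
    batches: Iterable[Sequence[int]],
    batch_size: int,
    drop_remainder: bool = False,
) -> Iterator[List[int]]:
    """Baseline (hypothetical): split every batch into rows, then regroup."""
    out: List[int] = []
    for batch in batches:
        for row in batch:  # one step per row
            out.append(row)
            if len(out) == batch_size:
                yield out
                out = []
    if out and not drop_remainder:
        yield out


def rebatch(
    batches: Iterable[Sequence[int]],
    batch_size: int,
    drop_remainder: bool = False,
) -> Iterator[List[int]]:
    """Slice-based variant (hypothetical): re-cut batches without a per-row loop."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    buffer: List[int] = []
    for batch in batches:
        buffer.extend(batch)  # append the whole incoming batch at once
        while len(buffer) >= batch_size:
            yield buffer[:batch_size]  # emit one full output batch
            buffer = buffer[batch_size:]
    if buffer and not drop_remainder:
        yield buffer


# Input batches of uneven size, regrouped into batches of 4 rows.
source = [[1, 2, 3], [4, 5], [6, 7, 8, 9]]
baseline = list(unbatch_then_batch(source, 4))
rebatched = list(rebatch(source, 4))
```

On this toy input both calls return the same batches, with a final partial batch that is dropped when `drop_remainder=True`. The sketch shows only the difference in granularity and says nothing about where the measured speedups come from.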

Row-wise Shuffling Benchmark Results (preparation time for unbatch+shuffle+batch is not included):

| Dataset | unbatch+shuffle+batch | shuffle_rebatch | speedup |
| --- | --- | --- | --- |
| taobao | 5928.07 samples/sec | 25841.40 samples/sec | 4.36x |
| criteo | 11643.74 samples/sec | 32840.11 samples/sec | 2.82x |

github-actions[bot] commented 1 year ago

Test Results

75 files, 75 suites, 5m 25s
108 tests: 108 passed, 0 skipped, 0 failed
324 runs: 234 passed, 90 skipped, 0 failed

Results for commit 3a41fbe5.