DeepRec-AI / HybridBackend

A high-performance framework for training wide-and-deep recommender systems on heterogeneous cluster
Apache License 2.0
156 stars 30 forks source link

[DATA] Refine data deduplication and add example. #135

Closed francktcheng closed 1 year ago

francktcheng commented 1 year ago
  1. Refine to fix cornercases in data deduplication.
  2. Add a script to prepare deduplication data files.
  3. Test data deduplication on Dcnv2 model with taobao dataset.

End-to-end performance on a single A100 GPU with one day of Taobao dataset and a batch size of 64000 obtains an improvement of training throughput around 1.3x.