datawhalechina / torch-rechub

A Lighting Pytorch Framework for Recommendation Models, Easy-to-use and Easy-to-extend.
MIT License
392 stars, 73 forks

Add ali-ccp dataset, change embedding default initializer, and fix bugs #16

Closed: yinpu closed this 2 years ago

yinpu commented 2 years ago
  1. Add the ali-ccp dataset, including data preprocessing and two examples: one for multi-task learning and one for CTR ranking.
  2. Change the default embedding initializer to RandomNormal(0, 0.0001), following the initialization used in deepctr. Tested on the matching examples ml-dssm and ml-facebook-dssm: hit@100 increased from 14% to 20% and from 2.x% to 10.x%, respectively.
  3. Fix the census bug by adding fillna(0).
  4. Change the dataloader shuffle settings: train_dataloader should set shuffle to True; val_dataloader and test_dataloader should set shuffle to False.
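
For reference, items 2 and 4 can be sketched in plain PyTorch roughly as follows. This is a minimal illustration, not the code from this PR: it assumes the second argument of RandomNormal(0, 0.0001) is the standard deviation, and the vocabulary size, embedding dimension, toy tensors, and batch size are all made up for the example.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)

# Item 2: initialize an embedding table from a normal distribution.
# Assumption: mean 0 and standard deviation 0.0001.
# The sizes (1000 ids, 16 dimensions) are hypothetical.
embedding = nn.Embedding(num_embeddings=1000, embedding_dim=16)
nn.init.normal_(embedding.weight, mean=0.0, std=1e-4)

# Item 4: toy data split into train / val / test (illustrative only).
features = torch.arange(20, dtype=torch.float32).unsqueeze(1)
labels = (features.squeeze(1) % 2).long()
train_set = TensorDataset(features[:12], labels[:12])
val_set = TensorDataset(features[12:16], labels[12:16])
test_set = TensorDataset(features[16:], labels[16:])

# Shuffle only the training data; keep evaluation order fixed so that
# predictions line up with the original sample order.
train_dataloader = DataLoader(train_set, batch_size=4, shuffle=True)
val_dataloader = DataLoader(val_set, batch_size=4, shuffle=False)
test_dataloader = DataLoader(test_set, batch_size=4, shuffle=False)

# With shuffle=False the test samples come back in their original order.
test_order = torch.cat([x for x, _ in test_dataloader]).squeeze(1).tolist()
```

Shuffling the training set varies batch composition between epochs, while a fixed order for validation and test keeps evaluation reproducible.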