[🐛BUG] Are the train, valid, and test datasets not shuffled when benchmark_filename is used?

RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library

https://recbole.io/

MIT License


Open iridescentttt opened 3 weeks ago

iridescentttt commented 3 weeks ago

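The shuffle operation here is only performed when benchmark_filename_list is not used and ordering_args == "RO". So after I used benchmark_filename_list to define a custom split, the data was not shuffled, which led to a drop in performance.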
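To make the condition concrete, here is a minimal sketch of the behavior as I understand it. It is not RecBole's code: `maybe_shuffle` and its parameters are hypothetical names made up for this illustration, and it only uses the Python standard library.

```python
import random


def maybe_shuffle(records, benchmark_filename_list=None, ordering="RO", seed=0):
    """Hypothetical helper (not a RecBole function).

    Shuffles only when no benchmark files are given AND the ordering
    is "RO"; otherwise the original order is returned unchanged.
    """
    records = list(records)  # copy so the caller's data is left untouched
    if benchmark_filename_list is None and ordering == "RO":
        random.Random(seed).shuffle(records)
    return records


data = list(range(10))

# Custom split supplied: the order is preserved.
kept = maybe_shuffle(data, benchmark_filename_list=["train", "valid", "test"])

# No custom split and "RO" ordering: the records are permuted.
shuffled = maybe_shuffle(data)
```

With a custom split, the second branch is never reached, which is the situation I am asking about.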

zhengbw0324 commented 2 weeks ago

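@iridescentttt Hello! When benchmark_filename is used, we do not shuffle the data in the dataset, so that the boundaries of the data split are not broken. During training, however, we use the PyTorch dataloader, which shuffles the training data. https://github.com/RUCAIBox/RecBole/blob/2b6e209372a1a666fe7207e6c2a96c7c3d49b427/recbole/data/utils.py#L174-L176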
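As a rough illustration of the idea (a library-free sketch, not the code behind the link): the split itself stays fixed, and only the order in which training rows are visited changes from epoch to epoch. `iter_batches` is a hypothetical helper that stands in for the role a PyTorch `DataLoader` with `shuffle=True` plays; the toy lists stand in for the rows of the split files.

```python
import random


def iter_batches(records, batch_size, shuffle=True, rng=None):
    """Hypothetical stand-in for a shuffling data loader.

    Visits every record exactly once per call; when shuffle is True the
    visiting order is re-drawn on each call (i.e. each epoch).
    """
    rng = rng or random.Random()
    indices = list(range(len(records)))
    if shuffle:
        rng.shuffle(indices)  # permute indices only, never the split itself
    for start in range(0, len(indices), batch_size):
        yield [records[i] for i in indices[start:start + batch_size]]


train = list(range(8))      # toy rows of the train split
valid = list(range(8, 10))  # toy rows of the valid split

rng = random.Random(42)
for epoch in range(2):
    # Training batches are drawn in a fresh random order each epoch.
    epoch_batches = list(iter_batches(train, batch_size=3, rng=rng))

# Evaluation data is read in its stored order.
valid_batches = list(iter_batches(valid, batch_size=3, shuffle=False))
```

Because only indices are permuted, rows never move between train, valid, and test, so the split boundaries stay intact.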