RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License
3.44k stars 615 forks

benchmarking performance comparisons #1627

Open liyangliu opened 1 year ago

liyangliu commented 1 year ago

Thank you for your great work. Do you plan to provide benchmark performance comparisons on common datasets (cf. FuxiCTR), such as the performance of DIN on the Alimama dataset? With such comparisons, I think RecBole would be more helpful for academic and industrial research.

By the way, how do you convert the raw Alimama dataset into atomic files? I couldn't find the script for the AliEC dataset. After downloading the converted dataset from Google Cloud, I noticed that session features (i.e., historical clicked items) are absent, so the current version of the converted AliEC data is not compatible with DIN's attention unit.

Paitesanshi commented 1 year ago

Hello @liyangliu, thank you for your attention and suggestions on RecBole. We may consider providing benchmark performance in the future. You can use the conversion tool we provide to convert the raw dataset into atomic files. You can find it here.

liyangliu commented 1 year ago

@Paitesanshi Thank you for the response. I couldn't find how you process the AliEC dataset in https://github.com/RUCAIBox/RecSysDatasets/blob/master/conversion_tools/src/extended_dataset.py. Is it in another file?

Sherry-XLL commented 1 year ago

@liyangliu Hello, we haven't made the script for processing the AliEC dataset public for the time being. In the pre-processing stage, we kept the user-item interactions with click=1 from the original dataset to form the .inter file, so the item_id:token values are the historical clicked items of the corresponding user_id:token.

Thanks for your attention to RecBole!
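
For readers who want to reproduce the filtering step described above, here is a minimal sketch. It is not the maintainers' unpublished script. The raw column names (`user`, `adgroup_id`, `time_stamp`, `clk`), the helper names, and the sample rows are assumptions made for illustration. Only the click=1 filter and the `user_id:token` / `item_id:token` header fields come from the thread. The `timestamp:float` column is an extra assumption, added so each user's clicks can be ordered.

```python
import csv
import io
from collections import defaultdict

# Assumed raw column names; adjust them to match the actual raw file.
USER_COL, ITEM_COL, TIME_COL, CLICK_COL = "user", "adgroup_id", "time_stamp", "clk"


def clicks_to_inter(raw_csv_text):
    """Keep rows with click == 1 and return tab-separated .inter file text."""
    reader = csv.DictReader(io.StringIO(raw_csv_text))
    rows = [
        (r[USER_COL], r[ITEM_COL], int(r[TIME_COL]))
        for r in reader
        if r[CLICK_COL] == "1"
    ]
    # Order by user, then by time, so each user's clicks are chronological.
    rows.sort(key=lambda r: (r[0], r[2]))
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["user_id:token", "item_id:token", "timestamp:float"])
    writer.writerows(rows)
    return out.getvalue()


def click_history(inter_text):
    """Group the item_id:token values by user_id:token, preserving file order."""
    reader = csv.DictReader(io.StringIO(inter_text), delimiter="\t")
    history = defaultdict(list)
    for r in reader:
        history[r["user_id:token"]].append(r["item_id:token"])
    return dict(history)


# Tiny made-up sample in the assumed raw layout.
RAW = """user,time_stamp,adgroup_id,pid,nonclk,clk
u1,30,a3,p,0,1
u1,10,a1,p,0,1
u1,20,a2,p,1,0
u2,5,a9,p,0,1
"""

inter_text = clicks_to_inter(RAW)
history = click_history(inter_text)
```

Writing `inter_text` to a file such as `AliEC.inter` (a hypothetical file name) gives an interaction file in the atomic format. `history` maps each user to that user's clicked items in chronological order, which is the kind of sequence DIN's attention unit consumes.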