Open liyangliu opened 1 year ago
Hello @liyangliu , thank you for your attention and suggestions on RecBole. We may consider providing benchmark performance in the future. You can use the conversion tool we provide to convert the raw dataset into atomic files. You can find it here.
@Paitesanshi Thank you for your response. I couldn't find how you process the AliEC dataset in https://github.com/RUCAIBox/RecSysDatasets/blob/master/conversion_tools/src/extended_dataset.py. Is it in another file?
@liyangliu Hello, we have not made the script for processing the AliEC dataset public for the time being. In the pre-processing stage, we kept the user-item interactions in the original dataset with click=1 to form an .inter file, so item_id:token holds the historically clicked items of the corresponding user_id:token.
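The filtering step described above can be sketched roughly as follows. This is not the unpublished RecBole script. The raw column names (`user`, `adgroup_id`, `clk`, `time_stamp`) are assumptions about the raw Alimama sample file, and `clicks_to_inter` is a hypothetical helper. Only the tab-separated `field:type` header follows the atomic file convention.

```python
import csv
import io


def clicks_to_inter(raw_csv, out, user_col="user", item_col="adgroup_id",
                    click_col="clk", time_col="time_stamp"):
    """Keep rows with click == 1 and write them as an .inter-style file.

    All column names are assumptions about the raw data, not confirmed
    by the RecBole team; adjust them to match the actual raw file.
    """
    reader = csv.DictReader(raw_csv)
    # Atomic files are tab-separated with a "field:type" header row.
    out.write("user_id:token\titem_id:token\ttimestamp:float\n")
    kept = 0
    for row in reader:
        if row[click_col] != "1":
            continue  # drop impressions that were not clicked
        out.write(f"{row[user_col]}\t{row[item_col]}\t{row[time_col]}\n")
        kept += 1
    return kept


# Tiny in-memory example in place of the real raw file.
raw = io.StringIO(
    "user,time_stamp,adgroup_id,pid,nonclk,clk\n"
    "u1,100,a1,p,1,0\n"
    "u1,105,a2,p,0,1\n"
    "u2,110,a1,p,0,1\n"
)
inter = io.StringIO()
n_kept = clicks_to_inter(raw, inter)
inter_text = inter.getvalue()
print(inter_text)
```

With real data, the two `io.StringIO` objects would be replaced by file handles opened on the raw CSV and on the target .inter file.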
Thanks for your attention to RecBole!
Thank you for your great work. Do you plan to provide benchmark performance comparisons on common datasets (cf. FuxiCTR), such as the performance of DIN on the Alimama dataset? I think such comparisons would be very helpful for both academic and industrial research.
By the way, how did you convert the raw Alimama dataset into atomic files? I couldn't find the script for the AliEC dataset. After downloading the converted dataset from Google cloud, I noticed that session features (i.e. historical clicked items) are absent, so the current version of the converted AliEC data is not compatible with DIN's attention unit.
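For illustration, the missing session feature could in principle be derived from a click-only .inter file by grouping clicks per user and ordering them by time. The sketch below is only one possible approach, not how RecBole or DIN builds its inputs. `click_histories` and `max_len` are hypothetical names, and the field names assume an .inter header of `user_id:token`, `item_id:token`, `timestamp:float`.

```python
import csv
import io
from collections import defaultdict


def click_histories(inter_file, max_len=50):
    """Return (user, history, target) samples from a click-only .inter file.

    For each user, every click after the first becomes a target item, and
    the (at most max_len) clicks before it form the history sequence.
    """
    reader = csv.reader(inter_file, delimiter="\t")
    # Strip the ":type" suffix from header fields such as "user_id:token".
    header = [field.split(":")[0] for field in next(reader)]
    u = header.index("user_id")
    i = header.index("item_id")
    t = header.index("timestamp")

    events = defaultdict(list)
    for row in reader:
        events[row[u]].append((float(row[t]), row[i]))

    samples = []
    for user, evs in events.items():
        evs.sort()  # chronological order per user
        items = [item for _, item in evs]
        for pos in range(1, len(items)):
            history = items[max(0, pos - max_len):pos]
            samples.append((user, history, items[pos]))
    return samples


# Small in-memory example; rows are deliberately out of time order.
inter = io.StringIO(
    "user_id:token\titem_id:token\ttimestamp:float\n"
    "u1\ta2\t105\n"
    "u1\ta5\t120\n"
    "u1\ta3\t110\n"
    "u2\ta1\t110\n"
)
samples = click_histories(inter)
for sample in samples:
    print(sample)
```

Users with a single click (such as `u2` here) produce no sample, since they have no earlier click to serve as history.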