RUCAIBox / RecBole-GNN

An efficient and extensible GNN-enhanced recommender library based on RecBole.
MIT License

[💡SUG] Multi behavior dataset usage #42

Closed: Tokkiu closed this issue 2 years ago.

Tokkiu commented 2 years ago

Hi, thanks for your hard work! The repo is very useful and inspiring. Since there are many GNN-based multi-behavior recommendation models, I wonder whether there is any plan to support loading multi-behavior datasets in this repo, and how one would use it. Thanks in advance for your reply.

hyp1231 commented 2 years ago

Hi, thanks for your interest! Could you give some examples of typical models and datasets? I may not be very familiar with research in this field. Thanks!

Tokkiu commented 2 years ago

Thanks for your quick reply! Existing GNN-based models that use multi-behavior datasets include TGT (https://arxiv.org/pdf/2206.02687.pdf) and GNMR (https://arxiv.org/abs/2201.02307).

Details of the datasets, e.g., the Taobao dataset, are given in the papers.

hyp1231 commented 2 years ago

Thanks! These works are really interesting!

I believe these models can be implemented by creating a new Dataset class based on the existing SessionGraphDataset class. I'll consider implementing them when I have some spare time : )

Tokkiu commented 2 years ago

Thank you for your kind reply! I'd also be glad to contribute if you don't have time; we can discuss the details later. Looking forward to the new features of RecBole-GNN.

hyp1231 commented 2 years ago

Thanks for contributing! Please comment if there's anything I can help with. RecBole-GNN is still under active development, and new PRs will be reviewed and merged promptly.

Tokkiu commented 2 years ago

Sure, I'm glad to hear that! I have a few questions about implementing the multi-behavior dataset:

  1. How should multi-behavior data be stored: in the .inter file, or in separate files? For example, with three behaviors such as click/add/buy, should we have three files named xx.inter, xx.add and xx.buy?
  2. How should multiple behaviors be loaded into the Interaction object, given that the sequences for different behaviors have different lengths?
  3. How should the multi-behavior sequences be defined and provided in forward()? We can't assume in advance how many behaviors there are or what types they are.
hyp1231 commented 2 years ago

A1: I would suggest storing all three behaviors in the same .inter file, with a column named something like behavior:token to denote the behavior type. In addition, the config needs to load that column:

```yaml
load_col:
  # a YAML list; a bare comma-separated line would be parsed as one string
  inter: [user_id, item_id, timestamp, behavior]
```
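For reference, here is a minimal sketch of producing such a file from raw logs, assuming RecBole's atomic-file layout (tab-separated values with a name:type header line). The sample records and the write_inter helper are hypothetical; in practice you would write to a file such as <dataset>/<dataset>.inter rather than to an in-memory buffer.

```python
import csv
import io

# Hypothetical raw log records: (user, item, timestamp, behavior)
raw_logs = [
    ("u1", "i10", 1650000100, "buy"),
    ("u1", "i10", 1650000000, "click"),
    ("u2", "i42", 1650000200, "click"),
    ("u1", "i10", 1650000050, "add"),
]

def write_inter(records, fp):
    """Write all behaviors into one atomic .inter file.

    The header uses RecBole's `name:type` convention; `behavior` is a token
    field, so click/add/buy are stored as plain strings.
    """
    writer = csv.writer(fp, delimiter="\t", lineterminator="\n")
    writer.writerow(["user_id:token", "item_id:token", "timestamp:float", "behavior:token"])
    # Sort by user, then time, so each user's mixed-behavior history is chronological.
    for user, item, ts, behavior in sorted(records, key=lambda r: (r[0], r[2])):
        writer.writerow([user, item, ts, behavior])

buf = io.StringIO()
write_inter(raw_logs, buf)
print(buf.getvalue())
```

All behaviors live in one file, so no separate xx.add or xx.buy files are needed.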

A2 & A3: Once the multi-behavior interactions are stored in .inter, models that inherit SequentialRecommender will automatically get the behavior feature loaded into the Interaction as a sequence aligned with the item sequence. Because all behaviors share that one sequence, their differing lengths are not a problem. We can then use APIs like torch.where() to extract the sub-sequence for a specific behavior.
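A minimal sketch of the torch.where() idea, assuming the item and behavior sequences arrive as right-padded [batch, length] integer tensors with 0 as padding. The tensor values, the behavior ids (1 = click, 2 = add, 3 = buy) and the behavior_subsequence helper are hypothetical; the real ids come from RecBole's token remapping of the behavior field.

```python
import torch

# Hypothetical batch of 2 sequences, right-padded with 0.
item_seq = torch.tensor([[5, 7, 7, 9, 0],
                         [3, 3, 8, 0, 0]])
# Hypothetical behavior ids aligned with item_seq: 1 = click, 2 = add, 3 = buy.
behavior_seq = torch.tensor([[1, 1, 2, 3, 0],
                             [1, 3, 1, 0, 0]])

def behavior_subsequence(item_seq, behavior_seq, behavior_id, pad=0):
    """Return the items with the given behavior, left-aligned and right-padded,
    plus the length of each sub-sequence."""
    mask = behavior_seq == behavior_id  # [B, L] bool
    # Zero out other behaviors; items keep their original positions here.
    masked = torch.where(mask, item_seq, torch.full_like(item_seq, pad))
    # Stable sort on "not selected" moves selected positions to the front
    # while preserving their chronological order.
    _, order = torch.sort((~mask).to(torch.int64), dim=1, stable=True)
    sub = torch.gather(masked, 1, order)
    lengths = mask.sum(dim=1)
    return sub, lengths

clicks, click_len = behavior_subsequence(item_seq, behavior_seq, behavior_id=1)
print(clicks.tolist(), click_len.tolist())
```

The intermediate torch.where() result already isolates one behavior at its original positions, which may be enough if a model needs to stay aligned with the full sequence; the sort/gather step only compacts it into a standalone per-behavior sequence.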

Tokkiu commented 2 years ago

@hyp1231 I created a simple PR for this feature in #43. As you can see, I provide a multi-behavior sequence that specifies the behavior type of each interaction, along with an overall graph matrix as before. I wonder whether this is general enough for the framework and whether anything could be improved. Thank you for your help.
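One possible way to keep behavior information next to the usual graph, sketched in plain Python and assuming an SR-GNN-style session graph (unique items as nodes, directed edges between consecutive interactions). This is not the code in #43: build_session_graph and the choice to label each edge with the behavior of its destination interaction are hypothetical design choices.

```python
from typing import Dict, List, Tuple

def build_session_graph(items: List[int], behaviors: List[int]):
    """Build one graph from the full (all-behavior) sequence.

    Returns:
      nodes:      unique items in first-occurrence order
      alias:      position -> node index, to map per-position features onto nodes
      edges:      directed (src, dst) node-index pairs between consecutive items
      edge_types: behavior id of the destination interaction for each edge
    """
    nodes: List[int] = []
    index: Dict[int, int] = {}
    for item in items:
        if item not in index:
            index[item] = len(nodes)
            nodes.append(item)
    alias = [index[item] for item in items]

    edges: List[Tuple[int, int]] = []
    edge_types: List[int] = []
    for pos in range(1, len(items)):
        edges.append((alias[pos - 1], alias[pos]))
        edge_types.append(behaviors[pos])
    return nodes, alias, edges, edge_types

# Hypothetical session: click 5, click 7, add 7, buy 9 (1 = click, 2 = add, 3 = buy)
nodes, alias, edges, edge_types = build_session_graph([5, 7, 7, 9], [1, 1, 2, 3])
print(nodes, alias, edges, edge_types)
```

Unlike a deduplicated SR-GNN graph, transitions are not merged here, so the same item pair (including a self-loop such as click 7 followed by add 7) can appear once per behavior.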

hyp1231 commented 2 years ago

Thanks so much! LGTM!

I also left some comments about field names and typos. Feel free to apply them or not.