RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

How to handle weighted and repeating interactions? #1892

Closed sdinanozer closed 1 year ago

sdinanozer commented 1 year ago

First of all, thanks for an easy-to-use library with such a wide selection of models. I'm using general recommendation models, such as BPR and LightGCN. I want to use user_id, item_id, item_price, and timestamp values in my .inter file. Right now I'm not using any other item or user attributes. I have two questions:

  1. Is there any way to give weights to interactions in the .inter file, like assigning weights to links in a bipartite graph? I saw RATING_FIELD in the docs, but it implies explicit feedback, whereas I want to use item prices as weights. Users can't control the price, and a higher price shouldn't be treated as a higher rating, so I can't use this field for price. I thought of adding price to the item attributes, but the same item appears with different prices in different transactions (due to changes in time or region), so the price is attached to the interaction rather than the item.
  2. How do the models treat repeated interactions and time-wise duplicate rows? If a user-item pair appears more than once with different timestamps, I assume this won't cause any issues and the models can work out the temporal relations, although I'm not sure. By time-wise duplicates, I mean different rows with exactly the same user_id, item_id, and timestamp values. Some of the users in the data are middlemen represented as a single user, so those duplicate rows are valid data and I want the models to use them. Do the models discard these duplicates? If they don't, how do they treat that information? (Assume rm_dup_inter is None, because I want to use these rows.)

Thanks in advance!
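The distinction drawn in question 2, between a repeated user-item pair and an exact time-wise duplicate, can be checked on the raw rows before they are written to the .inter file. The following is only a minimal sketch in plain Python; the sample rows and the helper name `split_repeats` are hypothetical and not part of RecBole.

```python
from collections import Counter

# Hypothetical interaction rows: (user_id, item_id, price, timestamp)
rows = [
    ("u1", "i1", 9.99, 1000),
    ("u1", "i1", 12.50, 2000),  # repeated pair with a later timestamp
    ("u2", "i3", 5.00, 1500),
    ("u2", "i3", 5.00, 1500),   # time-wise duplicate (same user, item, timestamp)
    ("u2", "i3", 5.50, 1500),   # same key again, but with a different price
]

def split_repeats(rows):
    """Count repeated (user, item) pairs and exact (user, item, timestamp) duplicates."""
    pair_counts = Counter((u, i) for u, i, _, _ in rows)
    exact_counts = Counter((u, i, t) for u, i, _, t in rows)
    repeated_pairs = {k: n for k, n in pair_counts.items() if n > 1}
    timewise_dups = {k: n for k, n in exact_counts.items() if n > 1}
    return repeated_pairs, timewise_dups

repeated_pairs, timewise_dups = split_repeats(rows)
print(repeated_pairs)
print(timewise_dups)
```

Knowing how many rows fall into each group makes it easier to decide whether to keep them as-is or to aggregate them before training.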

BoXiaohe commented 1 year ago

Thanks for your attention to RecBole! For the first question, you can define which column of the data is loaded by modifying the LABEL_FIELD (str) parameter; after that, you are free to decide how to use the loaded data. For the second question, if you set rm_dup_inter (str) = None, the duplicate rows will not be discarded, and you can process them yourself. You can refer to https://recbole.io/docs/user_guide/config/data_settings.html for more details on the parameters. Hope this helps!
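As a rough illustration of the settings mentioned above, the sketch below writes interaction rows with a per-interaction price column in RecBole's tab-separated atomic file format and builds a matching configuration dictionary. It assumes the `load_col` and `rm_dup_inter` options described on the linked data settings page; the column name `price` and the helper `to_inter_file` are hypothetical. Loading the column only makes it available in the dataset; nothing here implies that BPR or LightGCN use it as a weight without custom code.

```python
import csv
import io

def to_inter_file(rows):
    """Serialize (user_id, item_id, price, timestamp) rows as atomic-file text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    # Atomic files declare each column as name:type in the header row.
    writer.writerow(["user_id:token", "item_id:token", "price:float", "timestamp:float"])
    writer.writerows(rows)
    return buf.getvalue()

# Assumed settings; check them against the data settings documentation.
config_dict = {
    "USER_ID_FIELD": "user_id",
    "ITEM_ID_FIELD": "item_id",
    "TIME_FIELD": "timestamp",
    # Load the extra per-interaction column alongside the usual fields.
    "load_col": {"inter": ["user_id", "item_id", "price", "timestamp"]},
    # Keep duplicate (user, item) rows instead of dropping them.
    "rm_dup_inter": None,
}

print(to_inter_file([("u1", "i1", 9.99, 1000), ("u1", "i1", 12.5, 2000)]))
```

A dictionary like this could then be passed as `config_dict` to `recbole.quick_start.run_recbole`, with the generated text saved as the dataset's .inter file.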

sdinanozer commented 1 year ago

I see, thanks for the answer. Just to clarify on the first question: the docs say "your label column should only be 0 or 1", and setting a threshold seems to end up with binary data too. So I guess what I'm supposed to do is put the price information into LABEL_FIELD and use negative sampling. Am I correct?