Thank you very much for your attention to our work. Our implementation differs from the standard SASRec in two ways that make it suitable for the rating prediction task; both are also mentioned in the experimental section of our paper:
(1) We incorporate the user's feedback from the interaction history into the item embedding. Specifically, for each item in the history, we append the feedback the user gave to that item (0 or 1) to the item's embedding. The resulting embedding is then used as the input to SASRec.
(2) In the final layer of the network, we use the Sigmoid activation function rather than Softmax. We are essentially performing binary classification for each item (label 0 or 1), and Sigmoid is the activation commonly used in CTR tasks. For instance, given a user's interaction sequence, to predict whether the user will click on the i-th item, we first obtain the logit for the i-th item using SASRec. We then pass the logit through the Sigmoid function, which yields the probability that the user clicks on the i-th item. During training, we optimize the Mean Squared Error (MSE) loss so that this probability is as close as possible to the corresponding label (0 or 1).
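The two modifications can be illustrated with a minimal, framework-free sketch. It is not the actual implementation: the function names, the toy embeddings, and the hard-coded logits standing in for SASRec outputs are all illustrative assumptions.

```python
import math


def append_feedback(item_embedding, feedback):
    """Modification (1): append the 0/1 feedback as one extra embedding dimension."""
    if feedback not in (0, 1):
        raise ValueError("feedback must be 0 or 1")
    return list(item_embedding) + [float(feedback)]


def sigmoid(logit):
    """Modification (2): map a logit to a click probability (numerically stable)."""
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def mse_loss(probs, labels):
    """Mean squared error between predicted probabilities and 0/1 labels."""
    return sum((p - y) ** 2 for p, y in zip(probs, labels)) / len(probs)


# Hypothetical interaction history: (item embedding, feedback) pairs.
history = [([0.1, -0.3], 1), ([0.5, 0.2], 0)]

# Each sequence element gains one dimension carrying the feedback bit.
inputs = [append_feedback(emb, fb) for emb, fb in history]
# inputs == [[0.1, -0.3, 1.0], [0.5, 0.2, 0.0]]

# Stand-ins for the logits SASRec would produce for two target items.
logits = [2.0, -1.0]
labels = [1, 0]

probs = [sigmoid(z) for z in logits]
loss = mse_loss(probs, labels)
```

In this sketch the embedding dimension grows by one, so whatever consumes `inputs` would need to accept the enlarged dimension.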