关于 4.4 Model Robustness against Data Sparsity

HKUDS / XRec

[EMNLP'2024] "XRec: Large Language Models for Explainable Recommendation"

http://arxiv.org/abs/2406.02377

Apache License 2.0

102 stars 9 forks source link

关于 4.4 Model Robustness against Data Sparsity #6

Open sugarandgugu opened 2 months ago

sugarandgugu commented 2 months ago

恭喜你们的工作被EMNLP接受，想请问你们4.4节中这个数据集是怎么划分的，有具体代码或者数据集吗？

Martin-qyma commented 2 months ago

Thank you for your interest in XRec! The dataset in Section 4.4 is organized based on the frequency of user appearances in the training data. Here are detailed steps to separate the data:

Ranking: Users are first ranked according to the number of their appearances in the training dataset.
Grouping: Users are then divided into five equal groups, with each group containing the same number of users, each labeled tst1 through tst5.

Notice that users who appear only in the test or validation datasets and not during the training process are considered zero-shot users. We hope this helps clarify your concerns.

sugarandgugu commented 2 months ago

感谢您的回复，想请问如果使用PEPLER所提供的TripAdvisor数据集，应该按照什么步骤处理成你们论文所用的数据格式呢？

Martin-qyma commented 2 months ago

TripAdvisor lacks item descriptions, which sets it apart from our datasets. However, you can create descriptions yourself using a similar process to how we construct user descriptions. The approach involves feeding a LLM (e.g. gpt-3.5-turbo) with selected reviews that the item has received. The LLM then summarizes these interactions to determine the nature of the item and generates a concise sentence description. This method is equally applicable to generating user descriptions. We hope this would help.