HKUDS / XRec

[EMNLP'2024] "XRec: Large Language Models for Explainable Recommendation"
http://arxiv.org/abs/2406.02377
Apache License 2.0

About Section 4.4, Model Robustness against Data Sparsity #6

Open sugarandgugu opened 4 days ago

sugarandgugu commented 4 days ago

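Congratulations on your work being accepted to EMNLP! How was the dataset in Section 4.4 split? Are the specific code or the resulting datasets available?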

Martin-qyma commented 4 days ago

Thank you for your interest in XRec! The dataset in Section 4.4 is organized based on the frequency of user appearances in the training data. Here are detailed steps to separate the data:

Note that users who appear only in the test or validation sets, and never in the training data, are treated as zero-shot users. We hope this clarifies your question.
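The thread does not include the authors' actual steps or bucket boundaries, so the following is only a minimal sketch of the idea described above, under stated assumptions. It counts how often each user appears in the training interactions and groups the test/validation users by that count, with a count of 0 marking a zero-shot user. The `(user_id, item_id)` pair format, the bucket names, and the thresholds in `BUCKETS` are all hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

# HYPOTHETICAL bucket upper bounds (inclusive) on training appearances.
# A user with 0 training appearances is zero-shot; anything above the last
# bound goes to the "dense" group. These are not XRec's real thresholds.
BUCKETS: List[Tuple[str, int]] = [
    ("zero-shot", 0),
    ("sparse", 3),
    ("medium", 10),
]
DENSE = "dense"


def count_train_appearances(train: Iterable[Tuple[str, str]]) -> Counter:
    """Count how many (user_id, item_id) training pairs each user appears in."""
    return Counter(user for user, _ in train)


def bucket_users(
    train: Iterable[Tuple[str, str]], eval_users: Iterable[str]
) -> Dict[str, List[str]]:
    """Group test/validation users by how often they appear in training data."""
    counts = count_train_appearances(train)
    groups: Dict[str, List[str]] = {name: [] for name, _ in BUCKETS}
    groups[DENSE] = []
    for user in sorted(set(eval_users)):
        k = counts[user]  # Counter returns 0 for users never seen in training
        for name, upper in BUCKETS:
            if k <= upper:
                groups[name].append(user)
                break
        else:
            groups[DENSE].append(user)
    return groups
```

For example, with `u1` appearing twice in training, `u2` once, `u3` twelve times, and `u4` never, `bucket_users` places `u4` in the zero-shot group, `u1` and `u2` in the sparse group, and `u3` in the dense group. Evaluating the model separately on each group then shows how performance changes with data sparsity.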

sugarandgugu commented 1 day ago

Thank you for your reply. If I use the TripAdvisor dataset provided by PEPLER, what steps should I follow to convert it into the data format used in your paper?

Martin-qyma commented 1 day ago

TripAdvisor lacks item descriptions, which sets it apart from our datasets. However, you can generate descriptions yourself using a process similar to the one we use to construct user descriptions: feed an LLM (e.g. gpt-3.5-turbo) a selection of the reviews the item has received, and have it summarize them to determine what kind of item it is and produce a concise one-sentence description. The same method applies equally to generating user descriptions. We hope this helps.
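A minimal sketch of the description-generation step described above follows. It is not XRec's actual prompt or pipeline. The prompt wording, the `max_reviews` cutoff, and the choice to keep simply the first reviews are all hypothetical, since the thread does not specify how reviews are selected. The `llm` argument is a stand-in for whatever client you use to call a model such as gpt-3.5-turbo. It receives a prompt string and returns the model's text.

```python
from typing import Callable, List


def build_item_prompt(item_name: str, reviews: List[str], max_reviews: int = 5) -> str:
    """Build a HYPOTHETICAL summarization prompt from an item's reviews.

    Review selection here is simply "take the first max_reviews"; the
    selection strategy used in XRec is not specified in this thread.
    """
    selected = reviews[:max_reviews]
    review_lines = "\n".join(f"- {r.strip()}" for r in selected)
    return (
        f"Here are reviews that the item '{item_name}' has received:\n"
        f"{review_lines}\n"
        "Based on these reviews, summarize what kind of item this is "
        "in one concise sentence."
    )


def describe_item(
    item_name: str,
    reviews: List[str],
    llm: Callable[[str], str],
    max_reviews: int = 5,
) -> str:
    """Send the prompt to a caller-supplied LLM function and return its summary."""
    return llm(build_item_prompt(item_name, reviews, max_reviews)).strip()
```

Passing the LLM in as a plain callable keeps the sketch independent of any particular API client and makes it easy to test with a fake function. As the reply notes, the same approach works for user descriptions if you pass in the reviews a user has written instead of the reviews an item has received.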