Hi 👋
Thanks for you interests on RLMRec! The operation of splitting into training/validation/testing sets is performed on an interaction basis.
After completing the filtering and k-core operations, suppose we have 100,000 remaining user-item interaction records. For a 3:1:1 training/validation/testing ratio, we randomly sample 60,000 interactions to form the training set, ensuring that every user and item appears in at least one training interaction (otherwise we resample). The remaining 40,000 interactions are randomly divided in half to create the validation and testing sets.
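The procedure above can be sketched as follows. This is a minimal illustration, not RLMRec's actual preprocessing code; the function name, the retry cap, and the hard-coded 3:1:1 ratio are assumptions for the example:

```python
import random

def split_interactions(interactions, seed=0, max_tries=100):
    """Split (user, item) interaction records 3:1:1 into train/validation/test.

    The training set is resampled until every user and every item
    appears in it at least once, mirroring the coverage guarantee
    described above.
    """
    rng = random.Random(seed)
    n_train = int(len(interactions) * 0.6)  # 3 parts of 3:1:1
    users = {u for u, _ in interactions}
    items = {i for _, i in interactions}
    for _ in range(max_tries):
        shuffled = interactions[:]
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        # Coverage check: resample if any user or item is missing from train.
        if {u for u, _ in train} == users and {i for _, i in train} == items:
            rest = shuffled[n_train:]
            half = len(rest) // 2  # split the remainder 1:1
            return train, rest[:half], rest[half:]
    raise RuntimeError("could not cover all users/items; adjust k-core or ratio")
```

If the resampling loop fails repeatedly, it usually means some user or item has too few interactions, which a stricter k-core filter would prevent.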
I hope this answers your question :)
Best regards, Xubin
Thx! It works for me.
Sorry to bother you again. The way profiles are generated confuses me now.
Do we need to send the system prompt every time before obtaining a user/item profile, or only every few calls? The system prompt is quite token-consuming.
Hi 👋
Yeah, when generating profiles, we need to send the system prompt each time. This is because, when calling the API, each query is treated as a new conversation, requiring us to resend the system prompt. The system prompt will take around 400 tokens.
However, given the improved understanding capabilities of current language models, I believe we can now try trimming unnecessary descriptions from the system prompt and check whether the output still meets the desired requirements :)
Best regards, Xubin
Got it! Thanks for your thorough response. Happy New Year, by the way!
Hi, thanks for this impressive work! How can I split the Amazon review dataset into training/validation/test sets? Sadly, I didn't see related code in the repo; is there any guideline?