HKUDS / RLMRec

[WWW'2024] "RLMRec: Representation Learning with Large Language Models for Recommendation"
https://arxiv.org/abs/2310.15950
Apache License 2.0

About data split #4

Closed Snowilv closed 10 months ago

Snowilv commented 10 months ago

Hi, thanks for this impressive work! How can I split the Amazon review dataset into training/validation/test sets? Sadly I didn't see related codes in the repo, is there any guideline?

Re-bin commented 10 months ago

Hi 👋

Thanks for your interest in RLMRec! The split into training/validation/testing sets is performed at the interaction level.

After completing the filtering and k-core operations, let's assume we have 100,000 remaining user-item interaction records. If we require a 3:1:1 ratio for training/validation/testing, we randomly sample 60,000 interactions to construct the training set, ensuring that each user and each item has at least one training interaction (otherwise, we resample). The remaining 40,000 interactions are randomly divided in half to create the validation and testing sets.
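The procedure above can be sketched in Python. This is a minimal illustration, not the repository's actual preprocessing script; the function name and resampling strategy (shuffle and retry until coverage holds) are assumptions.

```python
import random

def split_interactions(interactions, seed=0, max_tries=100):
    """Randomly split (user, item) interaction records 3:1:1 into
    train/valid/test, resampling the training portion until every
    user and every item appears in it at least once."""
    rng = random.Random(seed)
    users = {u for u, _ in interactions}
    items = {i for _, i in interactions}
    n_train = len(interactions) * 3 // 5  # 3:1:1 ratio

    for _ in range(max_tries):
        shuffled = interactions[:]
        rng.shuffle(shuffled)
        train = shuffled[:n_train]
        # Coverage check: each user and item needs >= 1 training record.
        if ({u for u, _ in train} == users
                and {i for _, i in train} == items):
            rest = shuffled[n_train:]
            half = len(rest) // 2
            return train, rest[:half], rest[half:]
    raise RuntimeError("coverage not satisfied; try another seed")
```

With dense data (few users and items, many interactions each) the coverage condition is almost always met on the first try; it only forces resampling for users or items with very few records.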

I hope the provided answers can address your questions :)

Best regards, Xubin

Snowilv commented 10 months ago

Thx! It works for me.

Snowilv commented 10 months ago

Sorry to bother you again. The profile-generation step confuses me now.

Do we need to send the system prompt every time before we obtain a user/item profile, or only every few queries? The system prompt is really token-consuming.

Re-bin commented 10 months ago

Hi 👋

Yeah, when generating profiles, we need to send the system prompt each time. This is because, when calling the API, each query is treated as a new conversation, requiring us to resend the system prompt. The system prompt takes around 400 tokens.
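In other words, every request payload carries the full system prompt. A hypothetical sketch (the prompt text, function name, and model name below are illustrative, not the actual RLMRec code):

```python
# Placeholder text, not the actual RLMRec system prompt (~400 tokens).
SYSTEM_PROMPT = (
    "You are an assistant that writes a short profile for a user or "
    "item based on the interaction records provided."
)

def build_request(entity_description: str) -> dict:
    """Assemble one stateless chat-completion request. Because no
    conversation history persists between API calls, the system
    prompt is repeated here for every entity we profile."""
    return {
        "model": "gpt-3.5-turbo",  # any chat model would do
        "messages": [
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": entity_description},
        ],
    }
```

So the per-query token cost is roughly the system prompt length plus the user/item description length, which is why trimming the system prompt helps.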

However, given the improved understanding capabilities of current language models, I believe we can now try trimming unnecessary descriptions from the system prompt and check whether the output still meets the desired requirements :)

Best regards, Xubin

Snowilv commented 10 months ago

Get it! Thanks for your thorough response. Happy New Year by the way!