RUCAIBox / RecBole-CDR

This is a library built upon RecBole for cross-domain recommendation algorithms
MIT License

Splitting target domain dataset to be consistent, while using different source domains #50

Closed: ajaykv1 closed this issue 1 year ago

ajaykv1 commented 1 year ago

I want the target domain dataset to be split in order, without shuffling. When I run an algorithm such as CoNet with different source domains but the same target domain, I want the train, valid, and test sets for the target domain to be identical across runs. For example, say I have three datasets: dataset 1 is the target domain, and dataset 2 and dataset 3 are the source domains. When I run CoNet on the domain pair (dataset 2, dataset 1), I want the train, valid, and test sets for dataset 1 to be the same as when I run CoNet on the domain pair (dataset 3, dataset 1). How can I achieve this?

My current YAML file is below. Is this the correct way to do it, or do I need to add anything else?

source_domain:

seed: 44
gpu_id: "0"
dataset: '../source/data'
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
TIME_FIELD: timestamp
RATING_FIELD: rating

load_col:
    inter: [user_id, item_id, rating, timestamp]

embedding_size: 64
user_inter_num_interval: "[0,inf)"
item_inter_num_interval: "[0,inf)"
val_interval:
    rating: "[0,inf)"

target_domain:

seed: 44
gpu_id: "0"
dataset: '../target/data'
USER_ID_FIELD: user_id
ITEM_ID_FIELD: item_id
RATING_FIELD: rating
TIME_FIELD: timestamp

eval_args:
    group_by: user
    order: TO
    split: {'RS': [0.7,0.2,0.1]}
    mode: full

load_col:
    inter: [user_id, item_id, rating, timestamp]

embedding_size: 64
user_inter_num_interval: "[0,inf)"
item_inter_num_interval: "[0,inf)"
val_interval:
    rating: "[0,inf)"

Wicknight commented 1 year ago

@ajaykv1 Hello! I recommend preprocessing the target domain dataset, that is, splitting it once ahead of time, and passing the pre-split files in with the parameter benchmark_filename. This lets you use exactly the same target domain split in every run, regardless of which source domain it is paired with. For details about this parameter, please refer to our official documentation.
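
The preprocessing step suggested above could look roughly like the following. This is a minimal sketch, not the library's own splitting code. The helper names split_by_user_time and write_inter_file are hypothetical, the 0.7/0.2/0.1 ratios are taken from the config in the question, and the rounding rule (truncation, with the remainder going to the test set) is an assumption that may differ from RecBole's internal split. The file naming and the column header follow RecBole's atomic file convention as far as I know; please check both against the official documentation.

```python
import csv
from collections import defaultdict


def split_by_user_time(rows, ratios=(0.7, 0.2, 0.1)):
    """Split interactions per user in timestamp order, with no shuffling.

    rows: iterable of (user_id, item_id, rating, timestamp) tuples.
    Returns three lists: (train, valid, test).
    Rounding is an assumption: counts are truncated, the rest goes to test.
    """
    by_user = defaultdict(list)
    for row in rows:
        by_user[row[0]].append(row)

    train, valid, test = [], [], []
    for user in sorted(by_user):
        # Sorting by timestamp makes the result independent of input order.
        inters = sorted(by_user[user], key=lambda r: r[3])
        n_train = int(len(inters) * ratios[0])
        n_valid = int(len(inters) * ratios[1])
        train.extend(inters[:n_train])
        valid.extend(inters[n_train:n_train + n_valid])
        test.extend(inters[n_train + n_valid:])
    return train, valid, test


def write_inter_file(path, rows):
    """Write rows as a tab-separated .inter file with typed column headers."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f, delimiter="\t")
        writer.writerow(
            ["user_id:token", "item_id:token", "rating:float", "timestamp:float"]
        )
        writer.writerows(rows)


# Hypothetical usage, assuming the target dataset is named "data":
#   train, valid, test = split_by_user_time(all_rows)
#   write_inter_file("data.train.inter", train)
#   write_inter_file("data.valid.inter", valid)
#   write_inter_file("data.test.inter", test)
# and then, in the target domain config (assumed placement):
#   benchmark_filename: ['train', 'valid', 'test']
```

Because the files are written once and reused, the target domain split no longer depends on the seed or on which source domain is loaded alongside it.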