RUCAIBox / RecBole-CDR

This is a library built upon RecBole for cross-domain recommendation algorithms
MIT License

Embedding Mismatch #41

Closed by ajaykv1 1 year ago

ajaykv1 commented 1 year ago

Hi, I am running the CoNet algorithm on two datasets. On some dataset pairs the algorithm runs fine and outputs results, but on others I get this error:

  File "/home/akrish/test-env/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1604, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for CoNet:
    size mismatch for source_user_embedding.weight: copying a param with shape torch.Size([7572, 64]) from checkpoint, the shape in current model is torch.Size([8649, 64]).
    size mismatch for target_user_embedding.weight: copying a param with shape torch.Size([7572, 64]) from checkpoint, the shape in current model is torch.Size([8649, 64]).
    size mismatch for source_item_embedding.weight: copying a param with shape torch.Size([6843, 64]) from checkpoint, the shape in current model is torch.Size([4222, 64]).
    size mismatch for target_item_embedding.weight: copying a param with shape torch.Size([6843, 64]) from checkpoint, the shape in current model is torch.Size([4222, 64]).

I get the same error when running the DTCDR, CMF, and CLMF algorithms. I am using a GPU, and I don't know whether that could be causing the issue.

My YAML file looks like this, where dataset points to the .inter files for each domain:

source_domain:
    seed: 44
    gpu_id: "0"
    dataset: '/home/akrish/fall_2022/dataframes/action_data/action'
    USER_ID_FIELD: user_id
    ITEM_ID_FIELD: item_id
    RATING_FIELD: rating
    load_col:
        inter: [user_id, item_id, rating]
    user_inter_num_interval: "[0,inf)"
    item_inter_num_interval: "[0,inf)"
    val_interval:
        rating: "[0,inf)"

target_domain:
    seed: 44
    gpu_id: "0"
    dataset: '/home/akrish/fall_2022/dataframes/adventure_data/adventure'
    USER_ID_FIELD: user_id
    ITEM_ID_FIELD: item_id
    RATING_FIELD: rating
    load_col:
        inter: [user_id, item_id, rating]
    user_inter_num_interval: "[0,inf)"
    item_inter_num_interval: "[0,inf)"
    val_interval:
        rating: "[0,inf)"

I would appreciate any assistance on this issue.

Wicknight commented 1 year ago

@ajaykv1 Hello, thanks for your interest in RecBole-CDR! Could you please provide more details, for example the number of users and items in the two datasets? We did not get similar errors when running on the GPU. In addition, please check whether the dataset used in pretraining is the same as the dataset used when loading the checkpoint.
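
One way to gather the user and item counts asked for above is to count distinct IDs directly in each .inter file. The following is a minimal sketch, not part of RecBole-CDR: the helper name count_users_items is hypothetical, and it assumes the usual RecBole atomic-file layout (tab-separated, with a header such as user_id:token). The counts will not necessarily equal the embedding row counts in the error message, since the library may add padding entries or merge IDs across domains.

```python
import csv
import io


def count_users_items(inter_file, user_field="user_id", item_field="item_id"):
    """Count distinct users and items in a RecBole-style .inter file object.

    Hypothetical helper for illustration; assumes a tab-separated file whose
    header columns look like "user_id:token".
    """
    reader = csv.reader(inter_file, delimiter="\t")
    # Strip the ":type" suffix from each header column.
    header = [col.split(":")[0] for col in next(reader)]
    u_idx, i_idx = header.index(user_field), header.index(item_field)
    users, items = set(), set()
    for row in reader:
        if not row:  # skip blank lines
            continue
        users.add(row[u_idx])
        items.add(row[i_idx])
    return len(users), len(items)


# Tiny in-memory stand-in for a real file; with a file on disk one would use
# open(path, newline="", encoding="utf-8") instead.
sample = io.StringIO(
    "user_id:token\titem_id:token\trating:float\n"
    "u1\ti1\t5.0\n"
    "u1\ti2\t3.0\n"
    "u2\ti1\t4.0\n"
)
n_users, n_items = count_users_items(sample)
print(n_users, n_items)
```

Running this once on the source-domain file and once on the target-domain file gives numbers that can be compared between the pretraining run and the run that loads the checkpoint.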

ajaykv1 commented 1 year ago

Hi, I realized that I was specifying the wrong file path. I was able to resolve the issue.
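
For anyone hitting the same error: a size mismatch like the one in the traceback means the checkpoint was saved from a model built on a different number of users or items than the current one, which is what happens when a path points at the wrong dataset. The sketch below shows one way to list such mismatches before calling load_state_dict. The function find_shape_mismatches is a hypothetical helper, not a RecBole-CDR or PyTorch API, and the shapes are the ones quoted in the error message.

```python
def find_shape_mismatches(checkpoint_shapes, model_shapes):
    """Return {param_name: (checkpoint_shape, model_shape)} for differing shapes.

    Hypothetical helper for illustration. Both arguments map parameter names
    to shape tuples. With PyTorch, such a mapping can be built from a state
    dict as {k: tuple(v.shape) for k, v in state_dict.items()}.
    """
    mismatches = {}
    for name, ckpt_shape in checkpoint_shapes.items():
        model_shape = model_shapes.get(name)
        # Only report parameters present in both with different shapes.
        if model_shape is not None and tuple(model_shape) != tuple(ckpt_shape):
            mismatches[name] = (tuple(ckpt_shape), tuple(model_shape))
    return mismatches


# Shapes copied from the error message in this issue.
checkpoint_shapes = {
    "source_user_embedding.weight": (7572, 64),
    "source_item_embedding.weight": (6843, 64),
}
model_shapes = {
    "source_user_embedding.weight": (8649, 64),
    "source_item_embedding.weight": (4222, 64),
}

mismatches = find_shape_mismatches(checkpoint_shapes, model_shapes)
for name, (ckpt, current) in mismatches.items():
    print(f"{name}: checkpoint {ckpt} vs current model {current}")
```

If the first dimension differs, as it does here, the user or item vocabulary changed between saving and loading, so the dataset paths in the config are the first thing to check.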