RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

[🐛BUG] [field_name] == FeatureType.TOKEN: KeyError: 'class' #693

Closed mayaKaplansky closed 3 years ago

mayaKaplansky commented 3 years ago

Hi, I have been trying to run the GRU4RecF model but encountered an error. Let me know what additional information I can provide to help diagnose and resolve the issue. Thanks!


(base) C:\Users\Administrator\RecBole>python run_recbole.py --model=GRU4RecF --dataset=OfekMetaData --config_files=OfekMetaData.yaml
20 Jan 20:30    INFO General Hyper Parameters:
gpu_id=0
use_gpu=True
seed=2020
state=INFO
reproducibility=True
data_path=dataset/OfekMetaData\OfekMetaData

Training Hyper Parameters:
checkpoint_dir=saved
epochs=300
train_batch_size=2048
learner=adam
learning_rate=0.001
training_neg_sample_num=1
eval_step=1
stopping_step=10

Evaluation Hyper Parameters:
eval_setting=TO_LS,full
group_by_user=True
split_ratio=[0.8, 0.1, 0.1]
leave_one_num=2
real_time_process=True
metrics=['Recall', 'MRR', 'NDCG', 'Hit', 'Precision']
topk=[5]
valid_metric=MRR@5
eval_batch_size=4096

Dataset Hyper Parameters:
field_separator=
seq_separator=
USER_ID_FIELD=session_id
ITEM_ID_FIELD=item_id
RATING_FIELD=rating
LABEL_FIELD=label
threshold=None
NEG_PREFIX=neg_
load_col={'inter': ['session_id', 'item_id', 'timestamp'], 'user': ['session_id', 'PatientLocationID', 'GenderID', 'AgeGroup', 'JobGroup']}
unload_col=None
additional_feat_suffix=None
max_user_inter_num=None
min_user_inter_num=0
max_item_inter_num=None
min_item_inter_num=0
lowest_val=None
highest_val=None
equal_val=None
not_equal_val=None
drop_filter_field=True
fields_in_same_space=None
fill_nan=True
preload_weight=None
drop_preload_weight=True
normalize_field=None
normalize_all=True
ITEM_LIST_LENGTH_FIELD=item_length
LIST_SUFFIX=_list
MAX_ITEM_LIST_LENGTH=50
POSITION_FIELD=position_id
HEAD_ENTITY_ID_FIELD=head_id
TAIL_ENTITY_ID_FIELD=tail_id
RELATION_ID_FIELD=relation_id
ENTITY_ID_FIELD=entity_id

20 Jan 20:31    INFO OfekMetaData
The number of users: 2145929
Average actions of users: 3.7118747693305645
The number of items: 46
Average actions of items: 177009.24444444446
The number of inters: 7965416
The sparsity of the dataset: 91.93071078347398%
Remain Fields: ['item_id', 'timestamp', 'session_id', 'JobGroup', 'PatientLocationID', 'GenderID', 'AgeGroup']
20 Jan 20:31    INFO Build [ModelType.SEQUENTIAL] DataLoader for [train] with format [InputType.POINTWISE]
20 Jan 20:31    INFO Evaluation Setting:
        Group by session_id
        Ordering: {'strategy': 'by', 'field': ['timestamp'], 'ascending': True}
        Splitting: {'strategy': 'loo', 'leave_one_num': 2}
        Negative Sampling: {'strategy': 'by', 'distribution': 'uniform', 'by': 1}
20 Jan 20:31    INFO batch_size = [[2048]], shuffle = [True]

20 Jan 20:31    INFO Build [ModelType.SEQUENTIAL] DataLoader for [evaluation] with format [InputType.POINTWISE]
20 Jan 20:31    INFO Evaluation Setting:
        Group by session_id
        Ordering: {'strategy': 'by', 'field': ['timestamp'], 'ascending': True}
        Splitting: {'strategy': 'loo', 'leave_one_num': 2}
        Negative Sampling: {'strategy': 'full', 'distribution': 'uniform'}
20 Jan 20:31    INFO batch_size = [[4096, 4096]], shuffle = [False]

Traceback (most recent call last):
  File "run_recbole.py", line 25, in <module>
    run_recbole(model=args.model, dataset=args.dataset, config_file_list=config_file_list)
  File "C:\Users\Administrator\RecBole\recbole\quick_start\quick_start.py", line 45, in run_recbole
    model = get_model(config['model'])(config, train_data).to(config['device'])
  File "C:\Users\Administrator\RecBole\recbole\model\sequential_recommender\gru4recf.py", line 60, in __init__
    self.pooling_mode, self.device)
  File "C:\Users\Administrator\RecBole\recbole\model\layers.py", line 810, in __init__
    self.get_fields_name_dim()
  File "C:\Users\Administrator\RecBole\recbole\model\layers.py", line 569, in get_fields_name_dim
    if self.dataset.field2type[field_name] == FeatureType.TOKEN:
KeyError: 'class'

(base) C:\Users\Administrator\RecBole>

deklanw commented 3 years ago

The default config for that model tries to embed the class field from ml-100k (i.e., the genre(s)). Sidenote: I don't know why a default config file is specific to ml-100k.

See: https://github.com/RUCAIBox/RecBole/blob/master/recbole/properties/model/GRU4RecF.yaml#L5

Try setting selected_features to something else in your config. As far as I know, it only supports item features. Based on this line of your log:

load_col={'inter': ['session_id', 'item_id', 'timestamp'], 'user': ['session_id', 'PatientLocationID', 'GenderID', 'AgeGroup', 'JobGroup']}

you don't have any item features loaded yet, so you need to load some first.
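
To make the mismatch concrete, here is a minimal sketch of a pre-flight check, not RecBole code. The helper `missing_item_features` and the item column `category` are hypothetical names used for illustration; only `load_col`, `selected_features`, and the `class` default come from this thread.

```python
# Hypothetical config overrides. "category" is a placeholder item feature,
# not a column from the original dataset.
config_dict = {
    "load_col": {
        "inter": ["session_id", "item_id", "timestamp"],
        "item": ["item_id", "category"],
    },
    "selected_features": ["category"],
}


def missing_item_features(config):
    """Return the selected features that are not loaded as item columns."""
    item_cols = set(config.get("load_col", {}).get("item", []))
    return [f for f in config.get("selected_features", []) if f not in item_cols]


# The failing run from the log: selected_features falls back to the default
# ['class'], while only 'inter' and 'user' columns are loaded.
failing = {
    "load_col": {
        "inter": ["session_id", "item_id", "timestamp"],
        "user": ["session_id", "PatientLocationID", "GenderID", "AgeGroup", "JobGroup"],
    },
    "selected_features": ["class"],
}

print(missing_item_features(failing))      # ['class']
print(missing_item_features(config_dict))  # []
```

A non-empty result means the model will look up a field the dataset never loaded, which is the KeyError in the traceback. The same keys can equally be written in the YAML file passed via --config_files.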

mayaKaplansky commented 3 years ago

Hi, thank you. My data contains only user features (which are actually session features) and no item features. Does that mean I can use neither GRU4Rec nor GRU4RecF? I couldn't figure out from the documentation whether RecBole has other sequential models that accept user features.

deklanw commented 3 years ago

@mayaKaplansky GRU4Rec (no F) doesn't use any item features, so you could use that. As far as I'm aware, GRU4RecF doesn't model the user at all; it only models sequences of (perhaps attributed) item interactions.

Looking through the code, it seems that DIN supports user features: https://github.com/RUCAIBox/RecBole/blob/4c4838beac081e6e454d78cf76fb460b5b689413/recbole/model/layers.py#L777-L800

Not aware of any others in RecBole. I'm sure there are many possible algorithms in the literature that could be added, though! https://github.com/RUCAIBox/RecBole/discussions/611

mayaKaplansky commented 3 years ago

Thank you! I needed to review the paper again, and indeed it uses only item features. Is there a way to create a sequential recommender with DIN?

deklanw commented 3 years ago

Oops, I thought DIN was sequential. My mistake.

Then, I suppose RecBole has no sequential models which support user features :(

FWIW, it might be worth trying some other models that don't model time (such as the context-aware ones) in the temporal evaluation setting. You might be surprised that some can outperform models which do account for time. (At least, I've found this to be the case...)

mayaKaplansky commented 3 years ago

Thanks! How can I figure out from the documentation which models support user features?

batmanfly commented 3 years ago

@mayaKaplansky I suppose you would like to implement session-based recommendation with user features, right?

If so, RecBole currently has no model that uses user features for session-based recommendation.

I think I replied on the issue "Get a prediction" with one possibility: first learn the session representation (as GRU4RecF does, without user features), then combine it (the embedding that encodes the sequence of items in a session) with the user representations, e.g., by sum, concatenation, or another operation. If you have multiple features, you can also design an MLP or a more complicated architecture. If you would like to find models with user features for reference, please refer to the context-aware models, e.g., Wide & Deep (although it does not handle user features explicitly: it accepts general features, which include user features).

For this purpose, you should proceed in two steps:

1) Try to load the user features via the .user file and check that you can explicitly use them in your program (this is what you asked in a previous issue).
2) If this step succeeds, you can design your own architecture for using the user features (this can be implemented in full_sort and associated functions, which were also discussed in a previous issue).
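
The combination step described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not RecBole code: `fuse` and `score_items` are hypothetical helpers, the vectors are made-up placeholders, and in a real model they would be tensors produced by the GRU and by embedding layers.

```python
def fuse(session_vec, user_vec, mode="concat"):
    """Combine a session embedding with a user embedding (lists of floats)."""
    if mode == "concat":
        return list(session_vec) + list(user_vec)
    if mode == "sum":
        if len(session_vec) != len(user_vec):
            raise ValueError("sum fusion needs vectors of equal length")
        return [s + u for s, u in zip(session_vec, user_vec)]
    raise ValueError(f"unknown fusion mode: {mode}")


def score_items(fused_vec, item_vecs):
    """Dot-product score of the fused vector against every item embedding."""
    return [sum(f * i for f, i in zip(fused_vec, item)) for item in item_vecs]


session_vec = [0.5, -1.0]  # placeholder for the encoded item sequence
user_vec = [1.0, 2.0]      # placeholder for an embedded user feature
item_vecs = [[1.0, 0.0], [0.0, 1.0]]  # placeholder item embeddings

summed = fuse(session_vec, user_vec, "sum")
print(summed)                                 # [1.5, 1.0]
print(fuse(session_vec, user_vec, "concat"))  # [0.5, -1.0, 1.0, 2.0]
print(score_items(summed, item_vecs))         # [1.5, 1.0]
```

Sum fusion keeps the dimension unchanged, so the result can be scored against the item embeddings directly. Concatenation doubles the dimension, so it would need a projection (for example, the MLP mentioned above) before scoring against all items.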