tzuhsial closed this issue 2 years ago
@tzuhsial Hello! Thanks for your attention to RecBole! From your description, I guess this happens because you did not provide a configuration file (YAML). When no configuration file is provided, we use the default configuration, which does not necessarily fit your task; for example, it does not read the timestamp field. So you should create a configuration file for the current task and set the parameter 'load_col' to load the timestamp field, as shown below:
load_col:
    inter: ['user_id', 'item_id', 'rating', 'timestamp']
And here is the official document. As for why the ml-100k dataset works normally: we ship special default configuration files for it, in which the parameter 'load_col' is set. You can find these default configuration files in recbole/properties/quick_start_config.
Hi @Wicknight, thanks for the fast response!
After setting the load_col parameters, I received another error.
Here's my run.py
from recbole.quick_start import run_recbole

parameter_dict = {
    'load_col': ['user_id', 'item_id', 'rating', 'timestamp'],
    'eval_args': {'mode': 'uni100', 'distribution': 'uniform', 'order': 'TO'},
    'loss_type': 'BPR'
}
run_recbole(model='GRU4Rec', dataset='ml-1m', config_dict=parameter_dict)
And the error
Traceback (most recent call last):
  File "run.py", line 8, in <module>
    run_recbole(model='GRU4Rec', dataset='ml-1m', config_dict=parameter_dict)
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/quick_start/quick_start.py", line 41, in run_recbole
    dataset = create_dataset(config)
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/utils.py", line 41, in create_dataset
    return SequentialDataset(config)
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/dataset/sequential_dataset.py", line 36, in __init__
    super().__init__(config)
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/dataset/dataset.py", line 96, in __init__
    self._from_scratch()
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/dataset/dataset.py", line 108, in _from_scratch
    self._data_processing()
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/dataset/dataset.py", line 151, in _data_processing
    self._data_filtering()
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/dataset/dataset.py", line 173, in _data_filtering
    self._filter_nan_user_or_item()
  File "/local/home/tzuhsial/hoverboard-workspaces/src/RecBole/recbole/data/dataset/dataset.py", line 634, in _filter_nan_user_or_item
    dropped_inter = self.inter_feat.index[self.inter_feat[field].isnull()]
AttributeError: 'NoneType' object has no attribute 'index'
Encountered the same issue after preparing my own .inter file.
Headers: user_id:token item_id:token timestamp:float
Found the config error: load_col should be specified with inter.
Specifying
'load_col': {'inter': ['user_id', 'item_id', 'rating', 'timestamp']},
worked for me.
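The two shapes of load_col seen in this thread can be contrasted in a small sketch. The helper check_load_col below is hypothetical (it is not part of RecBole); it only illustrates the assumption that the value should map a file suffix such as 'inter' to a list of column names, rather than being a bare list.

```python
def check_load_col(load_col):
    """Hypothetical helper (not a RecBole API): return True when load_col
    maps a file suffix such as 'inter' to a list of column names."""
    if not isinstance(load_col, dict):
        return False
    return all(
        isinstance(suffix, str) and isinstance(cols, (list, tuple))
        for suffix, cols in load_col.items()
    )


# Shape from the failing run.py above: a bare list of columns.
bad = ['user_id', 'item_id', 'rating', 'timestamp']

# Shape reported to work: columns grouped under the 'inter' file suffix.
good = {'inter': ['user_id', 'item_id', 'rating', 'timestamp']}

# Only the key under discussion is shown; other settings are left out here.
parameter_dict = {'load_col': good}

print(check_load_col(bad))
print(check_load_col(parameter_dict['load_col']))
```

A check like this could be run on a config dictionary before handing it to run_recbole, so the mistake surfaces as a clear message instead of the AttributeError in the traceback above.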
It's likely that I just didn't find it in the code or documentation, but I would appreciate it if the authors could provide a full config (either a Python script or a YAML file) that includes all existing parameters.
My main friction so far has been with config errors. FYI @Wicknight
Hello! As in your code, parameters can be configured by passing a dictionary in the code. They can also be set by reading an external configuration file. The setting I mentioned earlier is for the external configuration file. The following is an example of an external configuration file:
# general
gpu_id: 0
use_gpu: True
seed: 2020
state: INFO
reproducibility: True
data_path: 'dataset/'
checkpoint_dir: 'saved'
show_progress: True
save_dataset: False
save_dataloaders: False
# training settings
epochs: 300
train_batch_size: 2048
learner: adam
learning_rate: 0.001
neg_sampling:
    uniform: 1
eval_step: 1
stopping_step: 10
clip_grad_norm: ~
# clip_grad_norm: {'max_norm': 5, 'norm_type': 2}
weight_decay: 0.0
require_pow: False
load_col:
    inter: ['user_id', 'item_id', 'rating', 'timestamp']
# evaluation settings
eval_args:
    split: {'RS':[0.8,0.1,0.1]}
    group_by: user
    order: RO
    mode: full
repeatable: False
metrics: ["Recall","MRR","NDCG","Hit","Precision"]
topk: [10]
valid_metric: MRR@10
valid_metric_bigger: True
eval_batch_size: 4096
loss_decimal_place: 4
metric_decimal_place: 4
Suppose we call it "exam.yaml". Then we can use it with the following code:
from recbole.quick_start import run_recbole

config_file_list = ['exam.yaml']
run_recbole(model='GRU4Rec', dataset='ml-1m', config_file_list=config_file_list)
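For anyone who wants to generate the file from a script, here is a minimal sketch that writes a small exam.yaml and builds the config_file_list for it. The write_config helper and the temporary directory are assumptions made for illustration; only the load_col block is included, on the understanding from the replies above that unspecified parameters fall back to the defaults.

```python
import os
import tempfile

# Only the key discussed in this thread; other parameters are left unset.
CONFIG_TEXT = """\
load_col:
    inter: ['user_id', 'item_id', 'rating', 'timestamp']
"""


def write_config(path, text=CONFIG_TEXT):
    """Hypothetical helper: write the YAML text to path and return the path."""
    with open(path, 'w', encoding='utf-8') as f:
        f.write(text)
    return path


# A temporary directory is used here so the sketch does not touch the project tree.
config_path = write_config(os.path.join(tempfile.mkdtemp(), 'exam.yaml'))
config_file_list = [config_path]

# With RecBole installed, the list is passed as in the reply above:
# from recbole.quick_start import run_recbole
# run_recbole(model='GRU4Rec', dataset='ml-1m', config_file_list=config_file_list)
```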
Thanks, YAML configs are great! Let me try them out :)
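After running with the configuration file you gave above, using config_file_list = ['exam.yaml'] and run_recbole(model='GRU4Rec', dataset='ml-1m', config_file_list=config_file_list), I keep getting the following error:
Even with the configuration from the sequential model quick start guide published on CSDN, I get the same error. What is causing this? Please help.
In addition, I also see the following error message. I am using the ml-100k and ml-1m .inter datasets, and both report this error.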
Describe the bug
Timestamp field not loaded in the ml-1m dataset. The same command (model, loss_type) works for ml-100k, so I expected just changing the dataset to ml-1m to work.
To Reproduce
recbole.__version__ = '1.0.0'
python run_recbole.py --dataset=ml-1m --model=SASRec --loss_type=BPR
Expected behavior
No error, just like
python run_recbole.py --dataset=ml-100k --model=SASRec --loss_type=BPR