Describe the bug
The Yelp dataset fails when 'val_interval': {'rating': "[3, inf)"} is set, and raises
ValueError: Field [rating] not defined in dataset.
I set up the config as follows:
from recbole.config import Config
from recbole.data.dataset import Dataset
from recbole.data import create_dataset
additional_config = {
    'rm_dup_inter': 'first',
    'val_interval': {'rating': "[3, inf)"},
    'user_inter_num_interval': '[15, inf)',
    'item_inter_num_interval': '[15, inf)',
}
config = Config(model='LightGCN', dataset=dataset_name, config_file_list=[])
config['eval_args']['order'] = split_order
config['data_path'] = f'cache_data/{dataset_name}/raw'
for key, value in additional_config.items():
    config[key] = value
dataset = Dataset(config)  # dataset = create_dataset(config) gives the same error
where dataset_name='yelp' or dataset_name='ml100k'.
Traceback (most recent call last):
  File "data.py", line 121, in <module>
    train_data, valid_data, test_data = prepare_recbole_data('yelp')
  File "data.py", line 59, in prepare_recbole_data
    dataset = create_dataset(config)
  File "anaconda3/envs/recbull/lib/python3.10/site-packages/recbole/data/utils.py", line 72, in create_dataset
    dataset = dataset_class(config)
  File "anaconda3/envs/recbull/lib/python3.10/site-packages/recbole/data/dataset/dataset.py", line 108, in __init__
    self._from_scratch()
  File "anaconda3/envs/recbull/lib/python3.10/site-packages/recbole/data/dataset/dataset.py", line 120, in _from_scratch
    self._data_processing()
  File "anaconda3/envs/recbull/lib/python3.10/site-packages/recbole/data/dataset/dataset.py", line 162, in _data_processing
    self._data_filtering()
  File "anaconda3/envs/recbull/lib/python3.10/site-packages/recbole/data/dataset/dataset.py", line 187, in _data_filtering
    self._filter_by_field_value()
  File "anaconda3/envs/recbull/lib/python3.10/site-packages/recbole/data/dataset/dataset.py", line 1041, in _filter_by_field_value
    raise ValueError(f"Field [{field}] not defined in dataset.")
ValueError: Field [rating] not defined in dataset.
Printing self.field2type at line 1041 shows the following for dataset = 'yelp':
{'user_id': <FeatureType.TOKEN: 'token'>, 'item_id': <FeatureType.TOKEN: 'token'>}
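To make the failure easier to reason about without installing anything, here is a minimal standalone sketch of the check that appears to fire, judging only from the traceback and the printed field2type. It is not RecBole's code: the function name check_val_interval_fields and the local FeatureType enum are hypothetical stand-ins, and only the error message is copied from the traceback.

```python
from enum import Enum


class FeatureType(Enum):
    # Local stand-in for the library's enum; only used for illustration.
    TOKEN = "token"
    FLOAT = "float"


def check_val_interval_fields(field2type, val_interval):
    """Raise if a val_interval key is not among the loaded fields.

    Hypothetical stand-in for the check at dataset.py line 1041.
    """
    for field in val_interval:
        if field not in field2type:
            raise ValueError(f"Field [{field}] not defined in dataset.")


# Fields printed for 'yelp' in this report: only the two ID columns.
yelp_fields = {"user_id": FeatureType.TOKEN, "item_id": FeatureType.TOKEN}
val_interval = {"rating": "[3, inf)"}

try:
    check_val_interval_fields(yelp_fields, val_interval)
    outcome = "ok"
except ValueError as err:
    outcome = str(err)

print(outcome)
```

With only user_id and item_id loaded, the sketch reproduces the same message, which suggests the rating column is never loaded for 'yelp' rather than the interval string being parsed incorrectly.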
Desktop (please complete the following information):
OS: Linux
RecBole Version 1.2.0
Python Version 3.10.11
PyTorch Version 3.10.11
cudatoolkit Version 11.8
Side Note
I also get
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
from torch.distributed.barrier() in recbole/data/dataset/dataset.py, line 252, in _download, when I run the code for the first time and the datasets are downloaded. What is the minimum fix to prevent this error?
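The kind of minimal change I have in mind is a guard around the barrier call, sketched below. This is an assumption about what would be acceptable, not a patch taken from RecBole: safe_barrier is a hypothetical helper, and it takes the distributed module as an argument so it can be tried without a GPU or a process group. The three calls it uses (is_available, is_initialized, barrier) exist in torch.distributed.

```python
def safe_barrier(dist):
    """Call dist.barrier() only when a default process group exists.

    `dist` is expected to be the torch.distributed module (or an object
    with the same three functions). Returns True if the barrier ran.
    """
    if dist.is_available() and dist.is_initialized():
        dist.barrier()
        return True
    # Single-process run: nothing to synchronize, so skip the barrier.
    return False


# Intended use in a single-process script (requires PyTorch):
#   import torch.distributed as dist
#   safe_barrier(dist)
```

In a single-process run this skips the barrier instead of raising, while a properly initialized distributed run still synchronizes as before.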