RUCAIBox / RecBole

A unified, comprehensive and efficient recommendation library
https://recbole.io/
MIT License

[🐛BUG] Error reading the steam dataset #2017

Open yin214 opened 3 months ago

yin214 commented 3 months ago

Describe the bug: I want to run the FDSA model on the steam dataset, but reading the steam.item file fails.

To reproduce: steps to reproduce the bug:

  1. The extra yaml file you used:

    # Basic Information
    USER_ID_FIELD: user_id            # (str) Field name of user ID feature.
    ITEM_ID_FIELD: product_id         # (str) Field name of item ID feature.
    user_inter_num_interval: "[5,inf)"
    item_inter_num_interval: "[5,inf)"
    TIME_FIELD: timestamp             # (str) Field name of timestamp feature.
    seq_len: ~                        # (dict) Field name of sequence feature: maximum length of each sequence.
    threshold: ~                      # (dict) 0/1 labels will be generated according to the pairs.
    NEG_PREFIX: neg_                  # (str) Negative sampling prefix for pair-wise dataLoaders.

    # Sequential Model Needed
    LIST_SUFFIX: _list                # (str) Suffix of field names which are generated as sequences.
    MAX_ITEM_LIST_LENGTH: 50          # (int) Maximum length of each generated sequence.
    POSITION_FIELD: position_id       # (str) Field name of the generated position sequence.

    load_col:                         # (dict) The suffix of atomic files: (list) field names to be loaded.
      inter: [user_id, product_id, timestamp]
      item: [product_id, genres]
    selected_features: [genres]



**Error output**
Traceback (most recent call last):
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 795, in _next_iter_line
    line = next(self.data)
_csv.Error: '   ' expected after '"'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "run_recbole.py", line 46, in <module>
    run(
  File "/home/yxy/RecBole-master/recbole/quick_start/quick_start.py", line 52, in run
    res = run_recbole(
  File "/home/yxy/RecBole-master/recbole/quick_start/quick_start.py", line 129, in run_recbole
    dataset = create_dataset(config)
  File "/home/yxy/RecBole-master/recbole/data/utils.py", line 72, in create_dataset
    dataset = dataset_class(config)
  File "/home/yxy/RecBole-master/recbole/data/dataset/sequential_dataset.py", line 36, in __init__
    super().__init__(config)
  File "/home/yxy/RecBole-master/recbole/data/dataset/dataset.py", line 109, in __init__
    self._from_scratch()
  File "/home/yxy/RecBole-master/recbole/data/dataset/dataset.py", line 119, in _from_scratch
    self._load_data(self.dataset_name, self.dataset_path)
  File "/home/yxy/RecBole-master/recbole/data/dataset/dataset.py", line 273, in _load_data
    self.item_feat = self._load_user_or_item_feat(
  File "/home/yxy/RecBole-master/recbole/data/dataset/dataset.py", line 341, in _load_user_or_item_feat
    feat = self._load_feat(feat_path, source)
  File "/home/yxy/RecBole-master/recbole/data/dataset/dataset.py", line 487, in _load_feat
    df = pd.read_csv(
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 912, in read_csv
    return _read(filepath_or_buffer, kwds)
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 583, in _read
    return parser.read(nrows)
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/readers.py", line 1704, in read
    ) = self._engine.read(  # type: ignore[attr-defined]
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 251, in read
    content = self._get_lines(rows)
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 1130, in _get_lines
    new_row = self._next_iter_line(row_num=self.pos + rows + 1)
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 824, in _next_iter_line
    self._alert_malformed(msg, row_num)
  File "/home/yxy/miniconda3/envs/yxy/lib/python3.8/site-packages/pandas/io/parsers/python_parser.py", line 774, in _alert_malformed
    raise ParserError(msg)
pandas.errors.ParserError: '    ' expected after '"'
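The traceback shows RecBole's `_load_feat` calling `pd.read_csv` through pandas' python parser, which rejects a line with `' ' expected after '"'` (the blank is most likely the tab field separator). A minimal sketch of that failure mode, assuming the offending steam.item field begins with a double quote; the sample row below is hypothetical, not the real line 75. A strict csv reader treats a leading `"` as opening a quoted field and refuses any text between the closing quote and the next tab. Passing `quoting=csv.QUOTE_NONE`, a standard `pandas.read_csv` option, makes `"` an ordinary character. Whether RecBole itself exposes this option is not stated in the issue, so treat it as a workaround to try locally.

```python
import csv
import io

import pandas as pd

# Hypothetical tab-separated sample; the "name" field starts with a double
# quote followed by more text, the pattern assumed to break steam.item.
SAMPLE = 'item_id\tname\tprice\n1\t"Foo" Bar\t19.99\n'


def load(text: str, **kwargs) -> pd.DataFrame:
    """Parse tab-separated text with the python engine, as in the traceback."""
    return pd.read_csv(io.StringIO(text), delimiter="\t", engine="python", **kwargs)


def reproduce_error() -> str:
    """Return the ParserError message for SAMPLE, or "" if parsing succeeds."""
    try:
        load(SAMPLE)
    except pd.errors.ParserError as err:
        return str(err)
    return ""


if __name__ == "__main__":
    print(reproduce_error())
    # With QUOTE_NONE the quote is kept as literal text and the row parses.
    print(load(SAMPLE, quoting=csv.QUOTE_NONE))
```

The other option is to edit or remove the stray quote in the data file itself, which keeps the loader's parsing settings unchanged.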
yin214 commented 3 months ago

The cause is line 75 of the steam.item file:

        False               19.99
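To find other lines like this before loading, one heuristic is to flag every line in which some tab-separated field begins with a double quote, since a strict csv reader treats such a field as quoted. `find_quote_lines` below is a hypothetical helper, not part of RecBole, and it is only a sketch: it may flag harmless lines (a correctly quoted field) and does not detect every kind of malformed row.

```python
from typing import Iterable, List


def find_quote_lines(lines: Iterable[str], delimiter: str = "\t") -> List[int]:
    """Return 1-based line numbers where some field begins with a double quote.

    Hypothetical helper: line numbers count the header line as line 1.
    """
    hits = []
    for lineno, line in enumerate(lines, start=1):
        # Drop the line terminator, then split on the field separator.
        fields = line.rstrip("\r\n").split(delimiter)
        if any(field.startswith('"') for field in fields):
            hits.append(lineno)
    return hits
```

For example, passing an open file handle for steam.item (the path depends on where the dataset was downloaded) returns the candidate line numbers to inspect by hand.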