RUCAIBox / RecSysDatasets

This is a repository of public data sources for Recommender Systems (RS).
https://recbole.io/

Codec error when converting movie lens dataset #94

Open guedes-joaofelipe opened 3 years ago

guedes-joaofelipe commented 3 years ago

I followed the instructions in Readme.md to download and convert the MovieLens dataset, but I got the following error:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe9 in position 2892: invalid continuation byte

I changed the pd.read_csv call in conversion_tools/src/extended_dataset.py (line 52) to include an encoding argument, which fixed the problem:

pd.read_csv(self.item_file, delimiter=self.item_sep, header=None, engine='python', encoding = "ISO-8859-1")
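For anyone wondering why the encoding argument helps, here is a minimal sketch using only the standard library. The byte string is a made-up sample title, not an actual line from the dataset; it only assumes that the item file contains Latin-1 encoded accented characters such as "é" (byte 0xe9), which is what the error message suggests.

```python
# Hypothetical sample: a title with "é" stored as the single Latin-1 byte 0xe9.
raw = b"Les Mis\xe9rables (1995)"

# In UTF-8, 0xe9 announces a three-byte sequence, but the next byte ("r")
# is not a continuation byte, so strict UTF-8 decoding fails.
try:
    raw.decode("utf-8")
    error = None
except UnicodeDecodeError as exc:
    error = str(exc)

# ISO-8859-1 maps every byte to exactly one character, so decoding succeeds.
title = raw.decode("ISO-8859-1")

print(error)
print(title)
```

Because ISO-8859-1 never raises on any byte, it will also silently accept files that are actually in another encoding, so it is worth checking a few decoded titles by eye.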

EliverQ commented 3 years ago

Hi, @guedes-joaofelipe! Thank you for your issue, but we can't reproduce the problem here. So could you please check your dataset and your environment again?

ZZZZZZZZeng commented 1 year ago

I had the same problem.

ZZZZZZZZeng commented 1 year ago

@EliverQ I had the same problem when converting the Yelp dataset on Windows.

Traceback (most recent call last):
  File "run.py", line 40, in <module>
    datasets.convert_inter()
  File "D:\学业\研究生\数据集\数据集转换程序\RecSysDatasets-master\conversion_tools\src\extended_dataset.py", line 4581, in convert_inter
    for in fin:
UnicodeDecodeError: 'gbk' codec can't decode byte 0x8b in position 1909: illegal multibyte sequence
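The 'gbk' in this traceback suggests the file was opened in text mode without an explicit encoding, so Python fell back to the Windows locale default. Below is a minimal sketch of the usual workaround, passing encoding="utf-8" to open(). It assumes the input file is UTF-8 encoded; the sample line and file name are hypothetical and are not taken from the conversion tool.

```python
import locale
import os
import tempfile

# Hypothetical sample line standing in for one record of the input file.
sample = '{"name": "Caf\u00e9 Zo\u00eb"}\n'

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "sample.json")  # hypothetical file name

    # Write the sample as UTF-8, which is the assumed encoding of the real data.
    with open(path, "w", encoding="utf-8") as fout:
        fout.write(sample)

    # The locale's preferred encoding, which open() typically uses when no
    # encoding is passed (for example a GBK code page on Chinese Windows).
    default_encoding = locale.getpreferredencoding(False)

    # Passing the encoding explicitly makes the read independent of the locale.
    with open(path, "r", encoding="utf-8") as fin:
        lines = [line for line in fin]

print(default_encoding)
print(lines)
```

On Python 3.7 and later, running the script with `python -X utf8 run.py` is another way to make UTF-8 the default without editing the source.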