StegSchreck / RatS

Movie Ratings Synchronization with Python
GNU Affero General Public License v3.0

Issue parsing non-ascii characters from CSV #46

Open rhysr opened 6 years ago

rhysr commented 6 years ago

A non-ASCII character in the MovieLens CSV causes a crash, e.g. the ⅓ in this title:

The Naked Gun 33⅓: The Final Insult (1994)

System

Stacktrace

Traceback (most recent call last):
  File "transfer_ratings.py", line 158, in <module>
    main()
  File "transfer_ratings.py", line 65, in main
    execute(args)
  File "transfer_ratings.py", line 118, in execute
    movies = parse_data_from_source(parser)
  File "transfer_ratings.py", line 128, in parse_data_from_source
    movies = parser.parse()
  File "/RatS/RatS/base/base_ratings_parser.py", line 30, in parse
    self._parse_ratings()
  File "/RatS/RatS/movielens/movielens_ratings_parser.py", line 17, in _parse_ratings
    self.movies = self._parse_movies_from_csv(os.path.join(self.exports_folder, self.csv_filename))
  File "/RatS/RatS/base/base_ratings_downloader.py", line 76, in _parse_movies_from_csv
    return [self._convert_csv_row_to_movie(row) for row in reader]
  File "/RatS/RatS/base/base_ratings_downloader.py", line 76, in <listcomp>
    return [self._convert_csv_row_to_movie(row) for row in reader]
  File "/RatS/RatS/movielens/movielens_ratings_parser.py", line 30, in _convert_csv_row_to_movie
    sys.stdout.write(r + '\r\n')
UnicodeEncodeError: 'ascii' codec can't encode character '\u2153' in position 16: ordinal not in range(128)

StegSchreck commented 6 years ago

Hey @rhysr, thank you for submitting this issue. The error only appears inside the Docker container, and only in combination with verbose mode (-v). I will have a look at this. In the meantime, you can keep using RatS by either omitting verbose mode inside the Docker container or running it without Docker.
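
The traceback suggests that sys.stdout inside the container is using the ASCII codec, which typically happens when no UTF-8 locale is configured. The following is a minimal sketch of that failure and of two possible ways around it, not the fix that was applied in RatS. The helpers `make_stream` and `try_write` and the in-memory streams standing in for sys.stdout are hypothetical and exist only for illustration.

```python
import io

TITLE = "The Naked Gun 33⅓: The Final Insult (1994)"


def make_stream(encoding, errors="strict"):
    """Build an in-memory text stream standing in for sys.stdout (hypothetical helper)."""
    # newline="" disables newline translation, so "\r\n" is written unchanged.
    return io.TextIOWrapper(io.BytesIO(), encoding=encoding, errors=errors, newline="")


def try_write(stream, text):
    """Write text and return the bytes produced, or the UnicodeEncodeError raised."""
    try:
        stream.write(text + "\r\n")
        stream.flush()
    except UnicodeEncodeError as error:
        return error
    return stream.buffer.getvalue()


# An ASCII stream cannot represent U+2153 (⅓) and fails like the traceback above.
ascii_result = try_write(make_stream("ascii"), TITLE)

# A UTF-8 stream encodes the title without problems.
utf8_result = try_write(make_stream("utf-8"), TITLE)

# An ASCII stream with a lenient error handler escapes the character instead of crashing.
escaped_result = try_write(make_stream("ascii", errors="backslashreplace"), TITLE)

print(repr(ascii_result))
print(utf8_result)
print(escaped_result)
```

On a real run, comparable effects could be had by setting the environment variable PYTHONIOENCODING=utf-8 for the container, or by calling sys.stdout.reconfigure(encoding="utf-8") on Python 3.7 or later. Whether either suits the RatS Docker image is an assumption that would need checking.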