byroot / pysrt

Python parser for SubRip (srt) files
GNU General Public License v3.0

UnicodeDecodeError #75

Open paranoidi opened 6 years ago

paranoidi commented 6 years ago

Almost 40% of my subtitle files fail to parse with a UnicodeDecodeError:

Traceback (most recent call last):
  File "/home/username/bin/nocc", line 11, in <module>
    load_entry_point('nocc', 'console_scripts', 'nocc')()
  File "/home/username/projects/nocc/nocc/nocc.py", line 155, in main
    nocc(fn)
  File "/home/username/projects/nocc/nocc/nocc.py", line 47, in nocc
    subs = pysrt.open(filename)
  File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 153, in open
    new_file.read(source_file, error_handling=error_handling)
  File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 180, in read
    self.eol = self._guess_eol(source_file)
  File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 257, in _guess_eol
    first_line = cls._get_first_line(string_iterable)
  File "/home/username/.local/venvs/nocc/lib/python3.5/site-packages/pysrt/srtfile.py", line 269, in _get_first_line
    first_line = next(iter(string_iterable))
  File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 711, in __next__
    return next(self.reader)
  File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 642, in __next__
    line = self.readline()
  File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 555, in readline
    data = self.read(readsize, firstline=True)
  File "/home/username/.local/venvs/nocc/lib/python3.5/codecs.py", line 501, in read
    newchars, decodedbytes = self.decode(data, self.errors)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa3 in position 4: invalid start byte

Please pass errors="ignore" to the underlying open() call so that undecodable bytes are skipped instead of raising.
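Until something like that is available, a caller-side workaround might look like the sketch below. It is not part of pysrt. The helper name `decode_subtitle_bytes` and the choice of cp1252 as the fallback encoding are assumptions for illustration. Byte 0xa3 is "£" in cp1252 and Latin-1, so files that fail this way are often in a legacy single-byte encoding rather than UTF-8.

```python
def decode_subtitle_bytes(data, encodings=("utf-8", "cp1252")):
    """Decode raw subtitle bytes, trying each candidate encoding in order.

    Hypothetical helper, not a pysrt API. Returns (text, label), where
    label names the encoding that worked or marks the lossy fallback.
    """
    for encoding in encodings:
        try:
            return data.decode(encoding), encoding
        except UnicodeDecodeError:
            continue
    # Last resort: drop undecodable bytes, which is what errors="ignore" does.
    return data.decode("utf-8", errors="ignore"), "utf-8 (lossy)"


# A byte string that is invalid UTF-8 but valid cp1252, like the one in the traceback.
raw = b"1\n00:00:01,000 --> 00:00:02,000\nPrice: \xa35\n"
text, used = decode_subtitle_bytes(raw)
print(used)
```

The decoded text could then be handed to `pysrt.from_string(text)`. If the encoding of a file is known in advance, passing it as the `encoding` argument of `pysrt.open` should avoid the error without losing any characters. Silently ignoring bytes removes them from the subtitle text, so it is best kept as a fallback.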