Closed sontek closed 5 years ago
With the first 2 lines:
Traceback (most recent call last):
  File "parser/subtitles_parser.py", line 20, in <module>
    parse_file(sub_fname)
  File "parser/subtitles_parser.py", line 16, in parse_file
    parsed = list(srt.parse(subtitle_file))
  File "/Users/sontek/code/LangMonster/venv/lib/python3.6/site-packages/srt.py", line 341, in parse
    _raise_if_not_contiguous(srt, expected_start, actual_start)
  File "/Users/sontek/code/LangMonster/venv/lib/python3.6/site-packages/srt.py", line 383, in _raise_if_not_contiguous
    raise SRTParseError(expected_start, actual_start, unmatched_content)
srt.SRTParseError: Expected contiguous start of match or end of input at char 0, but started at char 109 (unmatched content: '\ufeff00:00:00,000 --> 00:00:04,162\n<font color="#fffa00">Translated By The Community Of WWW.MY-SUBS.COM</font>\n\n\ufeff')
If you remove the first couple of lines so it starts with 1:
Traceback (most recent call last):
  File "parser/subtitles_parser.py", line 20, in <module>
    parse_file(sub_fname)
  File "parser/subtitles_parser.py", line 16, in parse_file
    parsed = list(srt.parse(subtitle_file))
  File "/Users/sontek/code/LangMonster/venv/lib/python3.6/site-packages/srt.py", line 341, in parse
    _raise_if_not_contiguous(srt, expected_start, actual_start)
  File "/Users/sontek/code/LangMonster/venv/lib/python3.6/site-packages/srt.py", line 383, in _raise_if_not_contiguous
    raise SRTParseError(expected_start, actual_start, unmatched_content)
srt.SRTParseError: Expected contiguous start of match or end of input at char 0, but started at char 1 (unmatched content: '\ufeff')
Hi!
You're using the wrong encoding for the file -- this isn't a crash, this is an error in the way you've opened your subtitles. You've opened them with (presumably) utf-8 encoding, but they have a BOM signature indicating endianness (in this case \ufeff). You need to open with the right encoding, probably utf-8-sig. :-)
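For anyone landing here with the same error, here is a minimal sketch of the difference between the two codecs, using only the standard library. The sample cue and the temporary file are made up for illustration; with a real file, the text read via utf-8-sig is what you would hand to srt.parse.

```python
import codecs
import os
import tempfile

# Hypothetical one-cue subtitle, written to disk with a UTF-8 BOM in front.
sample = "1\n00:00:00,000 --> 00:00:04,162\nHello\n\n"

with tempfile.NamedTemporaryFile("wb", suffix=".srt", delete=False) as f:
    f.write(codecs.BOM_UTF8 + sample.encode("utf-8"))
    path = f.name

try:
    # Plain utf-8 decodes the BOM bytes into a literal U+FEFF character.
    with open(path, encoding="utf-8") as fh:
        plain = fh.read()
    # utf-8-sig drops a leading BOM if there is one, and is a no-op otherwise.
    with open(path, encoding="utf-8-sig") as fh:
        stripped = fh.read()
finally:
    os.remove(path)

print(repr(plain[:2]))     # begins with the BOM character
print(repr(stripped[:2]))  # begins with the cue index
```

Since utf-8-sig also reads BOM-less UTF-8 files unchanged, it is a reasonable default when you do not know whether a file carries the signature.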
Thanks, I'll try that! I have an app using pysrt and it reads them just fine, so I was confused when I tried to switch and it crashed. I guess they detect the encodings with chardet and do it all magically.
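If you want something lighter than statistical detection, a BOM can be sniffed directly from the first bytes. This is only a sketch of that idea, not a description of what pysrt or chardet actually do; sniff_encoding is a hypothetical helper and the utf-8 fallback is an assumption.

```python
import codecs

# UTF-32 is checked before UTF-16 because the UTF-32 LE BOM
# begins with the same two bytes as the UTF-16 LE BOM.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
]


def sniff_encoding(raw: bytes, default: str = "utf-8") -> str:
    """Return a codec name that consumes the BOM found at the start of raw."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    return default  # no BOM: fall back to the caller's guess


raw = codecs.BOM_UTF8 + "1\n".encode("utf-8")
text = raw.decode(sniff_encoding(raw))
```

The generic utf-16 and utf-32 codecs read the BOM to pick the byte order and then discard it, so the decoded text never contains U+FEFF.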
I'm going to release a new version with the workaround in now; it will also permit loading Unicode files that have a BOM even when no BOM-aware encoding is specified (since it's harmless).
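If you are stuck on an older version, or the text is already decoded by the time you get it, the same effect can be had on the caller's side. A minimal sketch; strip_leading_bom is a hypothetical helper and is not how the library implements its workaround.

```python
BOM = "\ufeff"


def strip_leading_bom(text: str) -> str:
    """Drop a single U+FEFF from the start of already-decoded text."""
    return text[1:] if text.startswith(BOM) else text


# Hypothetical decoded content, as it looks after reading with plain utf-8.
decoded = "\ufeff1\n00:00:00,000 --> 00:00:04,162\nHello\n\n"
cleaned = strip_leading_bom(decoded)
```

This only handles a BOM at the very start of the text; a U+FEFF further into the file would still be present.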
2.1.0 allows BOM without -sig encoding.