fsufitch / movie-meme-generator

MIT License

Don't die horribly on non-ASCII characters #2

Open creffett opened 5 years ago

creffett commented 5 years ago

An SRT file containing non-ASCII characters will often trigger an exception when it's chosen for a meme, as seen in the following backtrace:

2019-03-18 21:20:08,731 [INFO] Processing request using workdir f<TemporaryDirectory '/tmp/tmpqaadyizo'>
[2019-03-18 21:20:08,733] ERROR in app: Exception on /meme [GET]
Traceback (most recent call last):
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/flask/app.py", line 2292, in wsgi_app
    response = self.full_dispatch_request()
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/flask/app.py", line 1815, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/flask/app.py", line 1718, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/flask/_compat.py", line 35, in reraise
    raise value
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "./webapp.py", line 39, in meme
    timestamp = pick_timestamp(context)
  File "./moviememes/timestamp.py", line 25, in pick_timestamp
    srt_data = f.read()
  File "/root/.local/share/virtualenvs/app-4PlAip0Q/lib/python3.7/codecs.py", line 322, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 9189: invalid continuation byte
[pid: 1|app: 0|req: 1/1] 192.168.0.146 () {28 vars in 346 bytes} [Mon Mar 18 21:20:08 2019] GET /meme => generated 291 bytes in 384 msecs (HTTP/1.1 500) 2 headers in 84 bytes (1 switches on core 0)

fsufitch commented 5 years ago

This isn't a matter of non-ASCII characters; it's a matter of the file not being valid UTF-8:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe0 in position 9189: invalid continuation byte
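
To make the distinction concrete, here is a minimal sketch that reproduces the same class of error outside the app. The sample string and the `try_utf8` helper are made up for illustration and are not part of this repo. The same non-ASCII text decodes fine when the bytes are UTF-8 and fails only when they are Windows-1252 bytes read as UTF-8.

```python
# Windows-1252 bytes for "voilà !": 0xe0 is "à" in that code page.
cp1252_bytes = "voilà !".encode("cp1252")


def try_utf8(raw: bytes) -> str:
    """Return the decoded text, or a short description of the failure."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError as err:
        return f"failed at byte {err.start}: {err.reason}"


print(try_utf8("voilà !".encode("utf-8")))  # non-ASCII, but valid UTF-8
print(try_utf8(cp1252_bytes))               # same text, wrong encoding
print(cp1252_bytes.decode("cp1252"))        # decodes once the right codec is named
```

In UTF-8, 0xe0 announces a three-byte sequence, so the decoder rejects it when the next byte is an ordinary character such as a space.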

Wikipedia says:

SubRip's default output encoding is configured as Windows-1252. However, output options are also given for many Windows code pages as well Unicode encodings, such as UTF-8 and UTF-16, with or without Byte Order Mark (BOM). Therefore, there's no de facto character encoding standard for .srt files, which means that any SubRip file parser must attempt to use Charset detection. Unicode Byte Order Mark (BOM) are typically used to aid detection.
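
A rough sketch of what that detection could look like, using only the standard library. `decode_srt_bytes` and its `fallback` parameter are hypothetical names, not anything that exists in this repo, and the order of guesses (BOM, then strict UTF-8, then Windows-1252) is an assumption based on the quote above rather than a real charset detector.

```python
import codecs


def decode_srt_bytes(raw: bytes, fallback: str = "cp1252") -> str:
    """Best-effort decode of raw .srt bytes (hypothetical helper)."""
    # 1. A byte order mark, when present, names the encoding outright.
    if raw.startswith(codecs.BOM_UTF8):
        return raw.decode("utf-8-sig")  # "utf-8-sig" strips the BOM
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return raw.decode("utf-16")  # "utf-16" reads the BOM to pick endianness

    # 2. No BOM: strict UTF-8 rarely succeeds by accident on legacy text.
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        pass

    # 3. Last resort: assume the fallback code page. A few byte values are
    #    undefined in cp1252, so replace them instead of raising again.
    return raw.decode(fallback, errors="replace")
```

Using this would presumably mean opening the subtitle file in binary mode and passing the result of `f.read()` through the helper. It does not try to tell apart the other Windows code pages, so a file in, say, a Cyrillic code page would decode without error but come out as the wrong characters.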

Which means that as much as I want to close this as "won't fix, git gud and use UTF-8", I shouldn't. I'll look into fixing this at some point, maybe. In the meantime, I suggest just converting your file to UTF-8.
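
For the conversion itself, something along these lines should do. It is a sketch, not a tool shipped with this repo: `convert_to_utf8` is a made-up name, and the `cp1252` default is only a guess based on SubRip's default output encoding, so pass the real source encoding if you know it.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a subtitle file as UTF-8 (hypothetical helper)."""
    # Work on bytes so the file's original line endings are left untouched.
    raw = Path(src).read_bytes()
    text = raw.decode(source_encoding)  # raises if the guess is clearly wrong
    Path(dst).write_bytes(text.encode("utf-8"))


if __name__ == "__main__":
    import sys

    # Usage: python convert_srt.py input.srt output.srt [source_encoding]
    if len(sys.argv) >= 3:
        convert_to_utf8(*sys.argv[1:4])
```

Writing to a separate output path keeps the original around in case the encoding guess turns out to be wrong.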