OpenSlides / openslides-media-service

Media service for OpenSlides 3+
MIT License
0 stars 14 forks source link

Mediafiles: "Internal Server Error" when filename contain special characters #49

Closed alveolus closed 2 years ago

alveolus commented 2 years ago

.../mediafiles/...

Problem Files that contain special characters in the filename, such as an "Ü", cause an "Internal Server Error" when trying to open the file.

Example (exactly the same file, only name changed): Beschlussübersicht.pdf → Internal Server Error Beschlusslage.pdf → file is displayed correctly

Expected behavior Files are displayed correctly regardless of their file name, even if the name contains special characters.

luisa-beerboom commented 2 years ago

What exactly are you doing and what exactly do you see? I tried it out in the demo and it seemed to work just fine. Anyway, if this is throwing an internal server error, where the other one isn't, this is likely a problem with the openslides-media-service, not the client. I'll transfer this to the media service for now.

jsangmeister commented 2 years ago

same for me. @alveolus please provide more details here, best case a step-by-step replication in the demo or in an empty dev instance.

alveolus commented 2 years ago

Just like that. 😉

  1. Upload a file that contains a special character such as ü Ä etc.
  2. Click on the file

→ File is not opened/displayed.

jsangmeister commented 2 years ago

Cannot reproduce this way locally. The demo currently seems completely broken regarding mediafiles - one is always redirected to the start page when trying to access a media file...

jsangmeister commented 2 years ago

Still not reproducible. @alveolus can you provide more details? E.g. the file for which this happens, does this happen for multiple files/instances or only specific ones etc.

alveolus commented 2 years ago

OpenSlides version: 4.0.0-beta-20221005-0e437dd → productive setup

@jsangmeister I tried to describe it as detailed as possible. Can also show you the error via screenshare. Feel free to write me on Jabber. :)

Click by click

peb-adr commented 2 years ago

I can confirm this is still an issue. We get the following traceback from the media service regularly.

media_1             | Traceback (most recent call last):
media_1             |   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 136, in handle
media_1             |     self.handle_request(listener, req, client, addr)
media_1             |   File "/usr/local/lib/python3.8/site-packages/gunicorn/workers/sync.py", line 185, in handle_request
media_1             |     resp.write(item)
media_1             |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 326, in write
media_1             |     self.send_headers()
media_1             |   File "/usr/local/lib/python3.8/site-packages/gunicorn/http/wsgi.py", line 322, in send_headers
media_1             |     util.write(self.sock, util.to_bytestring(header_str, "latin-1"))
media_1             |   File "/usr/local/lib/python3.8/site-packages/gunicorn/util.py", line 565, in to_bytestring
media_1             |     return value.encode(encoding)
media_1             | UnicodeEncodeError: 'latin-1' codec can't encode character '\u2013' in position 199: ordinal not in range(256)

Now I did some digging and I'm pretty sure I found the cause of this problem. It seems to me using a character like 'ü' in a filename is not enough to provoke this behavior but rather use a unicode one that is not part of latin1, in practice I've seen it happen with i.e. '–' or '„' .

When a file is served https://github.com/OpenSlides/openslides-media-service/blob/a95d09b9e98d3c148a9c300bfe4ccca8814970fe/src/mediaserver.py#L68 will attempt to set a HTTP header to the filename. But since will gunicorn will encode HTTP headers in latin1 (also see https://github.com/benoitc/gunicorn/issues/1151) it will fail.

This is also reproducable in the dev setup:

  1. Make a file with a bad name, i.e. echo asd > "bad$(python3 -c 'print("\u2013")')name.txt
  2. upload that text file
  3. try to access it Even flask will throw a comparable traceback running into the same problem.

Now I'm not sure what the 'Content-Disposition' Header is needed for. But I can already confirm, removing that line of code will fix the problem. So I think either we remove this header completely, or we will have to filter out non-latin1 chars before setting the header. @jsangmeister @reiterl thoughts on this?

peb-adr commented 2 years ago

https://github.com/OpenSlides/openslides-media-service/pull/52 fixes this

jsangmeister commented 2 years ago

Yes, good catch! I did not connect the error log to this issue