OdyseeTeam / player-server

Media server for Odysee
28 stars 12 forks source link

Escape filename in Content-Disposition header #41

Closed stefansundin closed 3 years ago

stefansundin commented 3 years ago

The filename directive should be quoted, as described here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Content-Disposition

This should fix https://github.com/lbryio/lbrytv/issues/232.

Chrome currently handles an unquoted filename directive, although Firefox has issues with it and ends the filename where the first space character is.

I think this may also fix https://github.com/lbryio/lbry-desktop/issues/4763, which is a problem that exists even on Chrome. Here the , character puts an end to the filename in Chrome.

content-disposition: attachment; filename=Osprey - New Vanguard 154 - British Battleships 1939-45 Part 1, Queen Elizabeth and Royal Soverign Classes.pdf

I am sure there is a better/proper way to escape filenames. There's a lot of discussion on StackOverflow regarding this topic, but I think this simple change should improve the vast majority of the current issues. https://stackoverflow.com/questions/7967079/special-characters-in-content-disposition-filename

tzarebczan commented 3 years ago

Thanks so much for the PR, much appreciated! Happy holidays ;)

Can we show you some appreciation?

lyoshenka commented 3 years ago

@stefansundin thanks for this PR. if you wouldn't mind, could you also strip out all " characters in the filename itself?

could you also include a list of browsers that you tested this fix in, and what filenames you used on each? looking at this SO question, adding quotes is only a partial solution. A full solution would be a bit more complicated, but greatly appreciated!

However, we will not let "perfect" be the enemy of "good". We'll merge this PR after we've tested it a bit (might be next week after New Years). Thanks so much.

stefansundin commented 3 years ago

@stefansundin thanks for this PR. if you wouldn't mind, could you also strip out all " characters in the filename itself?

That's a great idea.

could you also include a list of browsers that you tested this fix in, and what filenames you used on each?

I have to admit that the only test I actually did was to make sure go build worked. I don't know exactly what I need to do to run enough things to test it end-to-end in a browser. I will try to find time to read up on this.

stefansundin commented 3 years ago

I was able to test this by running the LBRY desktop client and running the compiled binary with:

./lbrytv-player --upstream-reflector="reflector.lbry.com:5568" --hot-cache-size="10GB"

Then simply directing requests to http://localhost:8080 instead of https://cdn.lbryplayer.xyz. That was enough for me to do some simple testing. I tried to get the LBRY desktop client to use my media server by setting SDK_API_URL=http://localhost:8080, but this did not work. Would be a nice feature though.

I tried to find media on LBRY with quotes in the filename, but didn't find any. So I just set a dummy filename in the code instead. Chrome is already replacing " with _, even if you try to use \", and even when using filename*. Firefox just strips out the ". Anyway, I still opted to strip out any double-quotes, as was suggested.

Content-Disposition: attachment; filename="hello \" world.mp4"; filename*=UTF-8''hello%20%22%20world.mp4

The new commit includes both filename and filename*. In my tests, both Chrome and Firefox handles UTF-8 just fine in filename, but this is apparently undefined behavior in the spec. Including both is probably good for other clients. I used url.PathEscape because I couldn't find a more generic percent-encoding function.

I tried to run the tests too, but failed. I think I updated it in a way that they should be passing. May be good to find a file with more special characters though.

Feel free to update this branch in any way you see fit. Maintainers have write access to it.

lyoshenka commented 3 years ago

In the future, you can test it by going to a url like this: http://localhost:8080/api/v3/streams/free/can-honey-work-better-than-antibiotics/7c95f6e98efee78701fb9aead37e753be153de44/76a086. However your way works too.

I'll take a look at this today.

lyoshenka commented 3 years ago

@stefansundin thanks for your help. I ended up sanitizing the filenames a bit differently. I also added some tests. You can see the whole thing here: https://github.com/lbryio/lbrytv-player/commit/67249b95392200ca4a6dc46991b885475e73a4a5

Also take a look at https://lbry.com/faq/appreciation and let me know if you're interested ;-)

stefansundin commented 3 years ago

Also take a look at https://lbry.com/faq/appreciation and let me know if you're interested ;-)

Yes, I am interested. I sent an email 9 days ago with a link to this PR. Let me know if I need to do something else too.

stefansundin commented 3 years ago

By the way, I am wondering if perhaps the change that was committed may be a bit too eager regarding stripping characters from filename*?

For example, the filename "Bitcoin je scam" - informujú média.mp4 should be usable with all the characters intact in filename*, but now the same sanitation will apply to both filename and filename*, right?

I think it should be safe to pass filename*=UTF-8''%22Bitcoin%20je%20scam%22%20-%20informuj%FA%20m%E9dia.mp4.

Edit: I realize that " are not valid in a filename on Windows/NTFS, but it should work on Linux. I think we can rely on the browser to strip out/replace these characters in these cases.

stefansundin commented 3 years ago

@lyoshenka I was curious if you saw my previous comment? I think it would be good to consider it and I wanted to make sure it didn't get lost, so I am @-mentioning you this time. Thanks!

lyoshenka commented 3 years ago

thanks for the followup, i did not get notified the first time.

you're right that leaving the quotes in filename* would work. i decided to be more restrictive because i feel that unusual characters in filenames make it more complicated on the user side. i think its ok to use simpler filenames. as you said, very few have quotes. if the user really wants the full original name, they can add the quotes back in.

this decision is a matter of taste. neither way is strictly better than the other. i prefer the no-quotes approach.