cockpit-project / cockpit-files

A Featureful File Browser for Cockpit (Modernized and tested version of https://github.com/45Drives/cockpit-navigator)
GNU Lesser General Public License v2.1
46 stars 26 forks source link

File Download: Cannot download files that have non-latin1 characters #602

Closed leomoty closed 3 months ago

leomoty commented 3 months ago
Uncaught DOMException: Failed to execute 'btoa' on 'Window': The string to be encoded contains characters outside of the Latin1 range.
    at hO (https://redacted/cockpit/@localhost/files/index.js:23:4879)
    at onClick (https://redacted/cockpit/@localhost/files/index.js:23:6254)
    at H (https://redacted/cockpit/@localhost/files/index.js:17:35572)
    at onClick (https://redacted/cockpit/@localhost/files/index.js:17:36736)
    at Object.fL (https://redacted/cockpit/@localhost/files/index.js:5:9867)
    at dL (https://redacted/cockpit/@localhost/files/index.js:5:10021)
    at mL (https://redacted/cockpit/@localhost/files/index.js:5:10078)
    at rw (https://redacted/cockpit/@localhost/files/index.js:5:31460)
    at T0 (https://redacted/cockpit/@localhost/files/index.js:5:31877)
    at https://redacted/cockpit/@localhost/files/index.js:5:36790

I'll try to figure out a reproducer without actually sharing the file name

martinpitt commented 3 months ago

My guess is that it's crashing here. No need to expose the full file name, but it certainly does have some "funky" characters in it?

leomoty commented 3 months ago

I think it is the difference between and -.

This is enough to reproduce it: window.btoa("–")

leomoty commented 3 months ago

My guess is that it's crashing here. No need to expose the full file name, but it certainly does have some "funky" characters in it?

OH and yep, that is exactly where it crashes, when it tries to build the fsread1 payload

leomoty commented 3 months ago

https://developer.mozilla.org/en-US/docs/Glossary/Base64#the_unicode_problem

https://developer.mozilla.org/en-US/docs/Glossary/Base64#converting_arbitrary_binary_data

MDN has some reference on how to workaround it, but that would imply python bridge is aware of that, right?

leomoty commented 3 months ago

@martinpitt, So I was able to get this working by first encoding the string as utf8, and then using the base64 polyfill wrapper that I noticed by checking @allisonkarlitskaya PR rework.

https://github.com/leomoty/cockpit-files/commit/47f91f9b8f5efd5d8ae911def88adab4e66cbccc

note to self: stop taking a look at cockpit while on vacation

allisonkarlitskaya commented 3 months ago

MDN has some reference on how to workaround it, but that would imply python bridge is aware of that, right?

If I understand the problem, this has nothing to do with the bridge. By the time the request gets to the bridge, it's encoded in a JSON document sent over a channel using UTF-8. there should be no issues there. The problem is between us and cockpit-ws, which decodes the URI for external channel requests and converts it into that JSON document.

leomoty commented 3 months ago

MDN has some reference on how to workaround it, but that would imply python bridge is aware of that, right?

If I understand the problem, this has nothing to do with the bridge. By the time the request gets to the bridge, it's encoded in a JSON document sent over a channel using UTF-8. there should be no issues there. The problem is between us and cockpit-ws, which decodes the URI for external channel requests and converts it into that JSON document.

Yep, I misunderstood, but it doesnt have any issues on cockpit-ws side either.

it works with my linked commit, if you guys are willing to accept the TextEncoder(), just gotta add some int tests.

I tested both with the reproducer, and with some japanese characters

allisonkarlitskaya commented 3 months ago

it works with my linked commit, if you guys are willing to accept the TextEncoder(), just gotta add some int tests.

Yes, please open a PR. Tests are always a plus!

Is it not possible to combine TextEncoder with btoa()? It would sure be nice to avoid the base64 APIs...

leomoty commented 3 months ago

Is it not possible to combine TextEncoder with btoa()? It would sure be nice to avoid the base64 APIs...

Directly it doesn't play nice with the UInt8Array, if we apply per byte, seems to get the effect we want:

window.btoa(new TextEncoder().encode("–"))
'MjI2LDEyOCwxNDc='
cockpit.base64_encode(new TextEncoder().encode("–"))
'4oCT'
window.btoa(String.fromCharCode.apply(null, new TextEncoder().encode("–")))
'4oCT'