OSC / ondemand

Supercomputing. Seamlessly. Open, Interactive HPC Via the Web
https://openondemand.org/
MIT License

File-Editor: Can't open files with multi-byte UTF-8 characters #250

Open nickjer opened 7 years ago

nickjer commented 7 years ago

I am unable to open files that have funky characters in the file name, whether from the FileExplorer or from this internal file explorer:

image


nickjer commented 7 years ago

The viewer in FileExplorer works though. So this only seems to affect opening files in FileEditor.

brianmcmichael commented 7 years ago

This one is turning out to be a little sticky.

The example file name (ノ°Д°）ノ︵ ┻━┻ is actually causing problems on the Rails end.

On the system, ） is actually a three-byte char represented by %EF%BC%89, but the Ruby renderer is writing it out as a standard close paren ), which doesn't get escaped.

Likewise, ︵ is represented by %EF%B8%B5 and it's being rendered as (.

These are valid characters on the filesystem and I can open the files in the File Explorer viewer, but because Rails (or maybe the browser) is making some odd substitutions as or before the path gets rendered out, it's not being encoded properly before getting passed to the API.

I'll need to pinpoint exactly where the substitution is happening and enforce the proper encoding.
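The byte sequences above can be checked directly in Ruby. This is a minimal sketch: `percent_encode_bytes` is a hypothetical helper (not part of OnDemand or Rails), and the use of NFKC normalization is an assumption about one plausible source of the substitution, not a confirmed diagnosis at this point.

```ruby
# Hypothetical helper: percent-encode every byte of a string, uppercase hex.
def percent_encode_bytes(str)
  str.bytes.map { |b| format("%%%02X", b) }.join
end

FULLWIDTH_CLOSE = "\uFF09" # ） FULLWIDTH RIGHT PARENTHESIS
VERTICAL_OPEN   = "\uFE35" # ︵ PRESENTATION FORM FOR VERTICAL LEFT PARENTHESIS

puts percent_encode_bytes(FULLWIDTH_CLOSE) # %EF%BC%89
puts percent_encode_bytes(VERTICAL_OPEN)   # %EF%B8%B5

# Assumption: the substitution behaves like Unicode NFKC (compatibility)
# normalization, which folds both characters to ASCII parentheses.
puts FULLWIDTH_CLOSE.unicode_normalize(:nfkc) # )
puts VERTICAL_OPEN.unicode_normalize(:nfkc)   # (
```

If any layer applies a compatibility normalization like this before escaping, the three-byte characters are already gone by the time the path is percent-encoded.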

brianmcmichael commented 7 years ago

The problem seems to be coming from the OodAppkit files API generator. There may be a setting required for the Addressable gem to handle this appropriately.

irb(main):002:0> p = Pathname.new '(ノ°Д°）ノ︵ ┻━┻'
=> #<Pathname:(ノ°Д°）ノ︵ ┻━┻>
irb(main):003:0> o = OodAppkit.files.api(path: p).to_s
=> "/pun/sys/files/api/v1/fs(%E3%83%8E%C2%B0%D0%94%C2%B0)%E3%83%8E(%20%E2%94%BB%E2%94%81%E2%94%BB"
irb(main):005:0> p.to_s
=> "(ノ°Д°）ノ︵ ┻━┻"

brianmcmichael commented 7 years ago

Appears to affect Addressable 2.5.1

Addressable::URI.parse('http://www.google.com/(╯°□°）╯︵ ┻━┻').normalize
=> #<Addressable::URI:0x23b0478 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0)%E2%95%AF(%20%E2%94%BB%E2%94%81%E2%94%BB>

brianmcmichael commented 7 years ago

The problem with Addressable seems intentional.

It's doing the right thing actually. IRIs (unicode-friendly URIs) use unicode normalization form KC to limit phishing. NFKC tends to do perceptual codepoint conversions, like converting '？' to '?'. The solution here is not to normalize the URI if this is causing a problem, or to instead normalize components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.

https://github.com/sporkmonger/addressable/issues/8#issuecomment-26674048

@nickjer is the .normalize call in OodAppkit necessary?

irb(main):003:0> p = Pathname.new "http://www.google.com/(╯°□°）╯︵ ┻━┻"
=> #<Pathname:http://www.google.com/(╯°□°）╯︵ ┻━┻>
irb(main):006:0> v = URI.encode p.to_s
=> "http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5%20%E2%94%BB%E2%94%81%E2%94%BB"
irb(main):008:0> g = Addressable::URI.parse v
=> #<Addressable::URI:0x24c86f8 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5%20%E2%94%BB%E2%94%81%E2%94%BB>
irb(main):009:0> g.normalize
=> #<Addressable::URI:0x24b92e8 URI:http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0)%E2%95%AF(%20%E2%94%BB%E2%94%81%E2%94%BB>
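
The "do not normalize" option from the quoted advice can be sketched with Ruby's standard library alone. This is a minimal sketch, not OodAppkit's implementation: `encode_path` is a hypothetical helper, and `ERB::Util.url_encode` is swapped in for Addressable so that no NFKC step is applied.

```ruby
require "erb"
require "uri"

# Hypothetical helper, not part of OodAppkit: percent-encode each path
# segment byte for byte, with no Unicode normalization step.
def encode_path(path)
  path.split("/", -1).map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

name    = "(ノ°Д°）ノ︵ ┻━┻"
encoded = encode_path("/fs/#{name}")
puts encoded

# Decoding the percent-escapes recovers the original bytes, so the
# fullwidth and vertical parentheses survive the round trip.
decoded = URI.decode_www_form_component(encoded)
puts decoded == "/fs/#{name}"
```

The encoded path keeps %EF%BC%89 and %EF%B8%B5 instead of collapsing them to ASCII parentheses, which is the behavior the files API needs to locate the file.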

nickjer commented 7 years ago

I say we punt on this and bring up a more URL-friendly encoding for file paths that all of our apps implement for ingest.

One example is Base64 (https://ruby-doc.org/stdlib-2.2.0/libdoc/base64/rdoc/Base64.html), in particular the urlsafe option.
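
A rough sketch of what such an encoding could look like with the standard library's `Base64.urlsafe_encode64`. The helper names `encode_path_token` and `decode_path_token` are hypothetical, and how apps would exchange the token is left open; this only shows the round trip.

```ruby
require "base64"

# Hypothetical helpers: turn a file path into a URL-safe token and back.
def encode_path_token(path)
  Base64.urlsafe_encode64(path)
end

def decode_path_token(token)
  # urlsafe_decode64 returns a binary string, so tag it as UTF-8 again.
  Base64.urlsafe_decode64(token).force_encoding(Encoding::UTF_8)
end

path  = "/home/user/(ノ°Д°）ノ︵ ┻━┻"
token = encode_path_token(path)
puts token
puts decode_path_token(token) == path
```

The token uses only letters, digits, `-`, `_`, and `=` padding, so a URI normalizer has no multi-byte characters left to rewrite.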

matt257 commented 4 months ago

reviewed, similar to #254 but distinct