OSC / ood_core

Open OnDemand core library
https://osc.github.io/ood_core/
MIT License

File-Editor: Can't open files with multi-byte UTF-8 characters #671

Closed sync-by-unito[bot] closed 2 years ago

sync-by-unito[bot] commented 2 years ago

I am unable to open files that have unusual (multi-byte UTF-8) characters in the file name, whether from the FileExplorer or from the FileEditor's internal file explorer.


sync-by-unito[bot] commented 2 years ago

➤ Jeremy Nicklas commented:

The viewer in FileExplorer works though. So this only seems to affect opening files in FileEditor.

sync-by-unito[bot] commented 2 years ago

➤ Brian L. McMichael commented:

This one is turning out to be a little sticky.

The example file name (ノ°Д°)ノ︵ ┻━┻ is actually causing problems on the Rails end.

On the system, ) (U+FF09, a fullwidth right parenthesis) is actually a three-byte character represented by %EF%BC%89, but the Ruby renderer is writing it out as a standard close paren ), which doesn't get escaped.

Likewise, ︵ (U+FE35) is represented by %EF%B8%B5 and it's being rendered as (.
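
To make the byte-level difference concrete, here is a minimal Ruby sketch that prints the percent-encoded UTF-8 bytes of the two characters next to an ordinary ASCII paren. The helper name `percent_bytes` is made up for illustration and is not part of ood_core or OodAppkit.

```ruby
# Hypothetical helper (illustration only): percent-encode every byte of a
# string so its UTF-8 representation is visible.
def percent_bytes(str)
  str.encode("UTF-8").bytes.map { |byte| format("%%%02X", byte) }.join
end

fullwidth_close = "\uFF09" # fullwidth right parenthesis
vertical_open   = "\uFE35" # presentation form for vertical left parenthesis

puts percent_bytes(fullwidth_close) # %EF%BC%89
puts percent_bytes(vertical_open)   # %EF%B8%B5
puts percent_bytes(")")             # %29
puts fullwidth_close.bytesize       # 3
```

If the fullwidth character is swapped for the ASCII one before encoding, the resulting URL no longer names the file that exists on disk.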

These are valid characters on the filesystem and I can open the files in the File Explorer viewer, but because Rails (or maybe the browser) is making some odd substitutions as or before the path gets rendered out, the path isn't being encoded properly before it gets passed to the API.

I'll need to pinpoint exactly where the substitution is happening and enforce the proper encoding.

sync-by-unito[bot] commented 2 years ago

➤ Brian L. McMichael commented:

The problem seems to be coming from the OodAppkit files API generator. There may be a setting required for the Addressable gem to handle this appropriately.

    irb(main):002:0> p = Pathname.new '(ノ°Д°)ノ︵ ┻━┻'
    => #
    irb(main):003:0> o = OodAppkit.files.api(path: p).to_s
    => "/pun/sys/files/api/v1/fs(%E3%83%8E%C2%B0%D0%94%C2%B0)%E3%83%8E( %E2%94%BB%E2%94%81%E2%94%BB"
    irb(main):005:0> p.to_s
    => "(ノ°Д°)ノ︵ ┻━┻"

sync-by-unito[bot] commented 2 years ago

➤ Brian L. McMichael commented:

Appears to affect Addressable 2.5.1:

    Addressable::URI.parse('http://www.google.com/(╯°□°)╯︵ ┻━┻').normalize
    => #

sync-by-unito[bot] commented 2 years ago

➤ Brian L. McMichael commented:

The problem with Addressable seems intentional.

> It's doing the right thing actually. IRIs (unicode-friendly URIs) use unicode normalization form KC to limit phishing. NFKC tends to do perceptual codepoint conversions, like converting '?' to '?'. The solution here is not to normalize the URI if this is causing a problem, or to instead normalize components piecemeal. "http://foo.com/blah%ef%bc%9f" and "http://foo.com/blah%3F" are considered equivalent.

https://github.com/sporkmonger/addressable/issues/8#issuecomment-26674048

nickjer, is the .normalize call in OodAppkit necessary?

    irb(main):003:0> p = Pathname.new "http://www.google.com/(╯°□°)╯︵ ┻━┻"
    => #
    irb(main):006:0> v = URI.encode p.to_s
    => "http://www.google.com/(%E2%95%AF%C2%B0%E2%96%A1%C2%B0%EF%BC%89%E2%95%AF%EF%B8%B5 %E2%94%BB%E2%94%81%E2%94%BB"
    irb(main):008:0> g = Addressable::URI.parse v
    => #
    irb(main):009:0> g.normalize
    => #

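
The NFKC behavior described in the quote can be reproduced with Ruby's built-in `String#unicode_normalize`, without Addressable. The sketch below also shows one possible way to build the path without normalizing, by percent-encoding each segment byte-for-byte with `ERB::Util.url_encode`. The helper `encode_path` and the `/fs/` prefix are assumptions for illustration, not the OodAppkit or Addressable API.

```ruby
require "erb"
require "uri"

# The table-flip example; the two problem characters are written as escapes.
name = "(╯°□°\uFF09╯\uFE35 ┻━┻"

# NFKC folds compatibility characters into ASCII look-alikes, so the
# normalized string no longer names the same file.
puts "\uFF09".unicode_normalize(:nfkc)     # )
puts "\uFE35".unicode_normalize(:nfkc)     # (
puts name.unicode_normalize(:nfkc) == name # false

# Hypothetical helper (not an OodAppkit or Addressable API): percent-encode
# each path segment byte-for-byte and never normalize.
def encode_path(path)
  path.split("/", -1).map { |segment| ERB::Util.url_encode(segment) }.join("/")
end

encoded = encode_path("/fs/#{name}")
puts encoded.include?("%EF%BC%89")                           # true
puts URI.decode_www_form_component(encoded) == "/fs/#{name}" # true
```

Because no normalization step runs, the encoded path decodes back to the exact bytes stored on the filesystem.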
sync-by-unito[bot] commented 2 years ago

➤ Jeremy Nicklas commented:

I say we punt on this and bring up a more URL-friendly encoding for file paths that all of our apps implement for ingest.

An example being: https://ruby-doc.org/stdlib-2.2.0/libdoc/base64/rdoc/Base64.html

in particular the urlsafe option.
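
A rough sketch of what that could look like with the standard `Base64` module follows. The path is a made-up example, and how the apps would agree on and ingest such tokens is left open. On recent Ruby versions `base64` ships as a bundled gem, so it may need to be listed in the Gemfile.

```ruby
require "base64"

# Hypothetical path containing the problem characters (U+FF09 and U+FE35).
path = "/home/user/(ノ°Д°\uFF09ノ\uFE35 ┻━┻"

# URL-safe Base64 uses only A-Z, a-z, 0-9, "-", "_" and "=" padding, so the
# token can travel through URL normalization unchanged.
token = Base64.urlsafe_encode64(path)

# Decoding returns binary data; tag it as UTF-8 to get the path back.
restored = Base64.urlsafe_decode64(token).force_encoding("UTF-8")

puts token.match?(/\A[A-Za-z0-9_=-]+\z/) # true
puts restored == path                    # true
```

The trade-off is that the encoded path is no longer human-readable in the URL.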