fog / fog-google

Fog for Google Cloud Platform
MIT License
99 stars 146 forks source link

Issue with escaped URLs with plus sign #590

Closed ihollander closed 1 year ago

ihollander commented 1 year ago

We ran into an issue after updating to the latest version of fog-google where filenames with a + character in them are escaped differently in URLs than they were prior to updating.

Prior to https://github.com/fog/fog-google/commit/67c34c3934a5cb0b8063d2bd9f0d496ab16d0e1a, when generating a URL using fog-google, given a file with a path like this, we'd end up with a URL like this (we use fog-google via Carrierwave):

model.uploaded_file.path
# => "path/to/test_upload+.pdf"
model.uploaded_file.url
# => "https://storage.googleapis.com/bucket/path/to/test_upload+.pdf?GoogleAccessId=..."

After updating fog-google, the same file has + signs in its path encoded as %2B:

model.uploaded_file.path
# => "path/to/test_upload+.pdf"
model.uploaded_file.url
# => "https://storage.googleapis.com/bucket/path/to/test_upload%2B.pdf?GoogleAccessId=..."

It seems this change causes issues when we hand off this URL to ngnix, which we use to proxy requests for files in our GCP buckets:

def stream_from_cloud_storage(url)
  headers['X-Accel-Redirect'] = url
  # ...
end

When this happens, the behavior I see is that nginx handles these redirect correctly for URLs without %2B in the path, but doesn't handle this correctly when the URL has %2B in the path (other encoded characters work fine - it's really just + -> %2B that causes issues).

It's not totally clear to me what is causing this issue with nginx redirects specifically (I think it may be changing the path in the URI and causing the signature in the query params to not match the path), but the only solution I could find that lets us preserve our current nginx setup is to stop fog-google from escaping + signs: https://github.com/ihollander/fog-google/commit/68b7d699979af6a0510f9abe0ad3aaf60ecc2d3d

I understand the change to remove the old implementation of URI.escape in https://github.com/fog/fog-google/commit/67c34c3934a5cb0b8063d2bd9f0d496ab16d0e1a is necessary because URI.escape is obsolete, but there was another approach to this discussed in https://github.com/fog/fog-google/pull/517 using the Addressable library, which may be more appropriate for handling these URIs than the CGI.escape-inspired approach (and gives results are more similar to the old URI.escape approach). For comparison, here's how certain characters are escaped by each of these approaches:

# adapted from https://gist.github.com/rcarver/66498
require "addressable"
require "cgi"
require "uri"

puts ["i", "i.chr", "URI", "fog", "CGI", "Addressable"].map(&:inspect).join("\t")
(0..255).each do |i|
  chr = i.chr(Encoding::UTF_8)
  uri = URI.escape(chr)
  fog = chr.b.gsub(/([^a-zA-Z0-9_.\-~]+)/) { |m|
    '%' + m.unpack('H2' * m.bytesize).join('%').upcase
  }.gsub("%2F", "/")
  cgi = CGI.escape(chr)
  add = Addressable::URI.encode_component(chr, Addressable::URI::CharacterClasses::PATH)
  puts [i, chr, uri, fog, cgi, add].map(&:inspect).join("\t") if fog != uri || cgi != uri || add != uri
end

# "i"     "i.chr" "URI"   "fog"   "CGI"   "Addressable"
# 32      " "     "%20"   "%20"   "+"     "%20"
# 33      "!"     "!"     "%21"   "%21"   "!"
# 36      "$"     "$"     "%24"   "%24"   "$"
# 38      "&"     "&"     "%26"   "%26"   "&"
# 39      "'"     "'"     "%27"   "%27"   "'"
# 40      "("     "("     "%28"   "%28"   "("
# 41      ")"     ")"     "%29"   "%29"   ")"
# 42      "*"     "*"     "%2A"   "%2A"   "*"
# 43      "+"     "+"     "%2B"   "%2B"   "+"
# 44      ","     ","     "%2C"   "%2C"   ","
# 47      "/"     "/"     "/"     "%2F"   "/"
# 58      ":"     ":"     "%3A"   "%3A"   ":"
# 59      ";"     ";"     "%3B"   "%3B"   ";"
# 61      "="     "="     "%3D"   "%3D"   "="
# 63      "?"     "?"     "%3F"   "%3F"   "%3F"
# 64      "@"     "@"     "%40"   "%40"   "@"
# 91      "["     "["     "%5B"   "%5B"   "%5B"
# 93      "]"     "]"     "%5D"   "%5D"   "%5D"

Is there any particular reason https://github.com/fog/fog-google/pull/517 wasn't merged and the CGI.escape approach was used instead?

mobilutz commented 1 year ago

Thanks @ihollander

I just stumbled over the same issue just now. I will see to use your fork, but would also like to see this changed in the original repo.

@stanhu I can say that the issue was not resolved in https://github.com/fog/fog-google/commit/67c34c3934a5cb0b8063d2bd9f0d496ab16d0e1a because I still see the issue with fog-google in the current master "version"

mobilutz commented 1 year ago

FYI: There is also an error with Umlaute like ä etc. when they are part of the filename.

I will investigate, and try to provide a PullRequest for this as well.

stanhu commented 1 year ago

@ihollander @mobilutz Yeah, I think I ran into something similar when trying to use Google CDN. I think by default NGINX will unescape characters.

I think we should revert https://github.com/fog/fog-google/commit/67c34c3934a5cb0b8063d2bd9f0d496ab16d0e1a and use https://github.com/fog/fog-google/pull/517.

geemus commented 1 year ago

WRT #517, I think it was just oversight mostly. I think I felt like I needed to fix the issue and didn't know there was something already in-progress, so I just ported what had seemed good-enough elsewhere.

mobilutz commented 1 year ago

@stanhu @geemus I unfortunately have to report, that even with fog-google v1.21.0 I have problems with uploading files where the filename contains a +.

I am still trying to mitigate the problem in my application, and once I am done with that I will try to create a replicatable application.

geemus commented 1 year ago

@mobilutz sorry to hear it, thanks for the heads up. Just let us know and we can dig in again.

Temikus commented 1 year ago

Yep, definitely can take a look. Just quickly thinking about it - could it be a case of you using the Storage Google XML driver? Maybe old credentials left somewhere and defaulting to it?

If not - no worries - just send a repro and we'll take a look!

stanhu commented 2 months ago

FYI, I was able to verify uploads and downloads work with + and other symbols still work with fog v1.24.1. I'm not sure what issues you're having.