brendanhay / gogol

A comprehensive Google Services SDK for Haskell.

Blob download does not use URL escaping, so slashes in object names cause failures #87

Open mgsloan opened 6 years ago

mgsloan commented 6 years ago

Here is a repro:

#!/usr/bin/env stack
-- stack script --resolver lts-9.17
--   --package gogol --package gogol-storage
--   --package lens --package servant

{-# LANGUAGE OverloadedStrings #-}

import Control.Lens
import Control.Monad
import Control.Monad.IO.Class
import Data.ByteString (ByteString)
import Data.Proxy
import Network.Google
import Network.Google.Storage
import Network.HTTP.Conduit (newManager, tlsManagerSettings)
import Servant.API.ContentTypes
import System.IO
import qualified Data.Text as T

type S = Scopes ObjectsInsert

main :: IO ()
main = do
  manager <- liftIO $ newManager tlsManagerSettings
  credentials <- liftIO $ getApplicationDefault manager
  logger <- newLogger Trace stdout
  env <- newEnvWith credentials logger manager
  runResourceT $ runGoogle env $ do
    insertAndGet "gogol-bug:a-key"
    insertAndGet "gogol-bug:a-key/with-slashes"

insertAndGet :: T.Text -> Google S ()
insertAndGet key = do
  let bucket = "a-bucket"
      body = "a-body" :: ByteString
  let ureq = objectsInsert bucket object' & oiName ?~ key
  upload ureq (toBody (Proxy :: Proxy OctetStream) body)
  let dreq = objectsGet bucket key
  void $ download dreq
  liftIO $ putStrLn $ T.unpack key ++ " download successful"

Put it in a file called gogol-bug.hs. Set the bucket variable to a bucket in your account. Then, chmod u+x gogol-bug.hs. Run ./gogol-bug.hs. Get this as output:

[Client Request] {
  host      = www.googleapis.com:443
  secure    = True
  method    = POST
  timeout   = ResponseTimeoutMicro 70000000
  redirects = 10
  path      = /upload/storage/v1/b/a-bucket/o
  query     = ?name=gogol-bug%3Aa-key&alt=json&uploadType=multipart
  headers   = <REDACTED>
  body      =  <msger:253>
}
[Client Response] {
  status  = 200 OK
  headers = <REDACTED>
}
[Client Request] {
  host      = www.googleapis.com:443
  secure    = True
  method    = GET
  timeout   = ResponseTimeoutMicro 70000000
  redirects = 10
  path      = /storage/v1/b/a-bucket/o/gogol-bug:a-key
  query     = ?alt=media
  headers   = <REDACTED>
  body      = 
}
[Client Response] {
  status  = 200 OK
  headers = <REDACTED>
}
gogol-bug:a-key download successful
[Client Request] {
  host      = www.googleapis.com:443
  secure    = True
  method    = POST
  timeout   = ResponseTimeoutMicro 70000000
  redirects = 10
  path      = /upload/storage/v1/b/a-bucket/o
  query     = ?name=gogol-bug%3Aa-key%2Fwith-slashes&alt=json&uploadType=multipart
  headers   = <REDACTED>
  body      =  <msger:253>
}
[Client Response] {
  status  = 200 OK
  headers = <REDACTED>
}
[Client Request] {
  host      = www.googleapis.com:443
  secure    = True
  method    = GET
  timeout   = ResponseTimeoutMicro 70000000
  redirects = 10
  path      = /storage/v1/b/a-bucket/o/gogol-bug:a-key/with-slashes
  query     = ?alt=media
  headers   = <REDACTED>
  body      = 
}
[Client Response] {
  status  = 404 Not Found
  headers = 
}
gogol-bug.hs: ServiceError (ServiceError' {_serviceId = ServiceId "storage:v1", _serviceStatus = Status {statusCode = 404, statusMessage = "Not Found"}, _serviceHeaders = [("X-GUploader-UploadID",<REDACTED>),("Vary","Origin"),("Vary","X-Origin"),("Content-Type","text/html; charset=UTF-8"),("Date","Thu, 07 Dec 2017 14:43:31 GMT"),("Expires","Thu, 07 Dec 2017 14:43:31 GMT"),("Cache-Control","private, max-age=0"),("Content-Length","9"),("Server","UploadServer"),("Alt-Svc",<REDACTED>)], _serviceBody = Just "Not Found"})

In other words, uploading an object with a slash in the key works, but downloading it fails. Note the path of the failing request: /storage/v1/b/a-bucket/o/gogol-bug:a-key/with-slashes. The Google docs specify that path parts must be URL-encoded (https://cloud.google.com/storage/docs/json_api/#encoding), but the key appears to be substituted into the path verbatim.
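To make the failure mode concrete, here is a minimal sketch of why the verbatim substitution matters. The helpers `objectPath` and `pathSegments` are hypothetical names made up for this illustration, not part of gogol, and the encoded key is written out by hand rather than produced by any library.

```haskell
import Data.Text (Text)
import qualified Data.Text as T

-- | Hypothetical helper: build the object path by plain substitution,
-- which is what the request log above suggests is happening.
objectPath :: Text -> Text -> Text
objectPath bucket key =
  T.concat [T.pack "/storage/v1/b/", bucket, T.pack "/o/", key]

-- | Split a path into its non-empty segments.
pathSegments :: Text -> [Text]
pathSegments = filter (not . T.null) . T.splitOn (T.pack "/")

-- | The key exactly as passed to objectsGet in the repro.
rawKey :: Text
rawKey = T.pack "gogol-bug:a-key/with-slashes"

-- | The same key with the slash percent-encoded by hand.
encodedKey :: Text
encodedKey = T.pack "gogol-bug:a-key%2Fwith-slashes"
```

With the raw key, `pathSegments (objectPath (T.pack "a-bucket") rawKey)` has one more segment than with `encodedKey`, because the slash in the object name is read as a path separator. That would explain the 404: the server is presumably looking up a different resource than the one that was uploaded.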

Is this a gotcha with servant's Capture for Text?

mgsloan commented 6 years ago

Ah, I see now that the docs mention it

Name of the object. For information about how to URL encode object names to be path safe, see Encoding URI Path Parts.

Shouldn't the API handle this for you? The current behavior could still be offered as a more raw API.

bergey commented 5 years ago

I just ran into this, and was about to open an issue. The docs on Encoding URI Path Parts say

Note that encoding is typically handled for you by client libraries, so you can pass the raw object name to them.

mgsloan commented 5 years ago

Yeah, I think the escaping ought to be handled by the library. FWIW, here's the function I wrote for doing the escaping, back when I encountered this issue:

import Data.Text (Text)
import Data.Text.Encoding (encodeUtf8, decodeUtf8With)
import Data.Text.Encoding.Error (lenientDecode)
import qualified Network.HTTP.Types as HTTP

urlEncodeKey :: Text -> Text
urlEncodeKey
  = decodeUtf8With lenientDecode
  . HTTP.urlEncode False
  . encodeUtf8

To use it, call Network.Google.Storage.objectsGet bucket (urlEncodeKey key). It should be possible to write a more efficient version that doesn't convert to ByteString and back, but eh.
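For what it's worth, here is a rough sketch of what a version without the whole-string round trip might look like. It is untested against gogol and makes a few assumptions: the name `urlEncodeKeyText` is made up, the set of characters left unescaped is assumed to match what `HTTP.urlEncode False` keeps for path elements, and it still goes through `encodeUtf8` for each character that needs escaping. Whether it is actually faster has not been measured.

```haskell
import Data.Char (isAlphaNum, isAscii)
import qualified Data.ByteString as B
import Data.Text (Text)
import qualified Data.Text as T
import Data.Text.Encoding (encodeUtf8)
import Text.Printf (printf)

-- | Characters left as-is. Assumption: ASCII letters and digits plus the
-- punctuation that http-types leaves unescaped in path elements.
isPathSafe :: Char -> Bool
isPathSafe c = isAscii c && (isAlphaNum c || c `elem` "-_.~:@&=+$,")

-- | Percent-encode a single character as its UTF-8 bytes, unless it is safe.
escapeChar :: Char -> Text
escapeChar c
  | isPathSafe c = T.singleton c
  | otherwise =
      T.pack (concatMap (printf "%%%02X") (B.unpack (encodeUtf8 (T.singleton c))))

-- | Hypothetical Text-only counterpart to urlEncodeKey.
urlEncodeKeyText :: Text -> Text
urlEncodeKeyText = T.concatMap escapeChar
```

It would be used the same way as before, i.e. objectsGet bucket (urlEncodeKeyText key). Since the input is already valid Text, there is no need for the lenient decode step.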