brendanhay / amazonka

A comprehensive Amazon Web Services SDK for Haskell.
https://amazonka.brendanhay.nz
Other
599 stars 227 forks source link

When the payload is a non-printable bytestring, printing the request terminates the app #947

Closed korayal closed 1 year ago

korayal commented 1 year ago

Recently we found out that PutMetricData endpoint supports gzipped request bodies.

So, to be able to make use of that, I came up with this wrapper:

import qualified Amazonka
import qualified Amazonka.CloudWatch as CW
import qualified Amazonka.CloudWatch.Lens as CW
import qualified Amazonka.Data as AWS
import qualified Amazonka.Request as AWSRequest
import qualified Amazonka.Response as AWS
import qualified Amazonka.Types (AWSRequest, Request (..))
import qualified Codec.Compression.GZip as GZip

newtype GzippedPutMetricData = GzippedPutMetricData CW.PutMetricData
  deriving newtype (AWS.ToPath, AWS.ToQuery, AWS.ToHeaders)

instance Amazonka.Types.AWSRequest GzippedPutMetricData where
  type AWSResponse GzippedPutMetricData = CW.PutMetricDataResponse
  request overrides x =
    let req = (AWSRequest.postQuery (overrides CW.defaultService) x :: Amazonka.Types.Request GzippedPutMetricData)
        !bs = BS.fromStrict (AWS.toBS (AWS.toQuery x))
        !gzippedBody = GZip.compress bs
        setContentEncoding = AWS.hdr hContentEncoding "gzip"
     in req
              { Amazonka.Types.body = AWS.toBody gzippedBody,
                Amazonka.Types.headers = setContentEncoding (Amazonka.Types.headers req)
              }
  response = AWS.receiveNull CW.PutMetricDataResponse'

which works nicely. But, when running on debug logging, Amazonka prints payloads in size smaller than 4KBs.

Hence, the above wrapper fails if the compressed payload is smaller than 4KB, and if we're running it with DEBUG logging. The ad-hoc solution I found for this is to not compress the payload if the raw payload size is less than 100KBs:

newtype GzippedPutMetricData = GzippedPutMetricData CW.PutMetricData
  deriving newtype (AWS.ToPath, AWS.ToQuery, AWS.ToHeaders)

instance Amazonka.Types.AWSRequest GzippedPutMetricData where
  type AWSResponse GzippedPutMetricData = CW.PutMetricDataResponse
  request overrides x =
    let req = (AWSRequest.postQuery (overrides CW.defaultService) x :: Amazonka.Types.Request GzippedPutMetricData)
        !bs = BS.fromStrict (AWS.toBS (AWS.toQuery x))
        !gzippedBody = GZip.compress bs
        setContentEncoding = AWS.hdr hContentEncoding "gzip"
     in if LBS.length bs <= 100000
          then req {Amazonka.Types.body = AWS.toBody bs}
          else
            req
              { Amazonka.Types.body = AWS.toBody gzippedBody,
                Amazonka.Types.headers = setContentEncoding (Amazonka.Types.headers req)
              }
  response = AWS.receiveNull CW.PutMetricDataResponse'

Is there a way to make the ToLog instance of ByteString (or ByteStringLazy) not print contents if it's not printable?

axman6 commented 1 year ago

I don’t really understand what problem you’re seeing, what does “ Hence, the above wrapper fails if the compressed payload is smaller than 4KB” mean? What behaviour are you expecting to see and what are you getting?

Also I would recommend against using the bang pattern on gzippedBody in your fix, I would use

instance Amazonka.Types.AWSRequest GzippedPutMetricData where
  type AWSResponse GzippedPutMetricData = CW.PutMetricDataResponse
  request overrides x =
    let req = (AWSRequest.postQuery (overrides CW.defaultService) x :: Amazonka.Types.Request GzippedPutMetricData)
        !bs = BS.fromStrict (AWS.toBS (AWS.toQuery x))
        gzippedBody = GZip.compress bs
        setContentEncoding = AWS.hdr hContentEncoding "gzip"
     in if LBS.length bs <= 100000
          then req {Amazonka.Types.body = AWS.toBody bs}
          else
            gzippedBody `seq` req
              { Amazonka.Types.body = AWS.toBody gzippedBody,
                Amazonka.Types.headers = setContentEncoding (Amazonka.Types.headers req)
              }
endgame commented 1 year ago

Some thoughts:

  1. The logger is a client of the new hooks system (see Amazonka.Env.Hooks). As a workaround, you could build your own set of logging hooks which refuse to print unprintable characters.
  2. I would probably use a hook to opportunistically compress the PutMetricData instead of cooking up a request newtype. This will save you having to write all the boilerplate about how to handle responses. I'd be interested in having such a hook available in amazonka-cloudwatch or maybe some contrib package. The only problem is that amazonka-cloudwatch is only allowed to depend on amazonka-core and although Hook is a type alias, it's for a function type which takes an Env' as an argument, which is an amazonka type. It might require a separate contrib package, since I can't add such a hook to amazonka without making everyone depend on amazonka-cloudwatch.
  3. If you were to fix the ToLog instances so they don't try and print unprintable junk, I'd be very interested in a PR.
korayal commented 1 year ago

@axman6

yes, that's what we ended up with. I wanted make sure these payloads were being evaluated, because the application was not failing when I didn't run it in DEBUG logging.

What behaviour are you expecting to see and what are you getting?

the current behaviour prints payloads of sizes smaller than 4KBs, but it assumes the payload is printable, when they are not (in my case, they can not since the result is a gzipped binary output) it messes us with the console, and in my case, it causes ghcid to terminate. The expected behavior is to not print the payload, if it's not printable.


@endgame

This will save you having to write all the boilerplate about how to handle responses

I was not aware of the Hook system, so thank you for that insight. I have tried out implementing something like AWS.addHookFor @(Request CW.PutMetricData) which obviously required my hook to return a Request CW.PutMetricData. I don't think it is not possible to modify the request body since it's being derived from the PutMetricData data type. And, code-size-wise I think the above instance was smaller than what I had to do for the hook.

I don't think I'll be able to mutate the request body and return the same Request a since RequestBody for that output is bound to the CW.PutMetricData's AWS.Request instance.

If you were to fix the ToLog instances so they don't try and print unprintable junk

This actually looks like something we can do if we do not care about evaluating the whole payload all the time. Which I guess is not a big deal since it has to be evaluated anyway when the request is being sent. So, I'll take a look a the logger's hook implementation. Thank you!

axman6 commented 1 year ago

It seems to me that reasonable fix would be to call show on the payload in the default logging code - @endgame does that seem reasonable to you? Should escape anything problematic, and still leave the data in a form the user can easily turn back into Haskell. There’s be some overhead, but also I assume anyone doing this sort of logging can accept that.

korayal commented 1 year ago

I was thinking something like this

axman6 commented 1 year ago

Yeah I had a quick look at that - I feel that if so had turned on debug logging, the last thing I’d want to see is my logs telling me I can’t see the data I need to do the debugging. The decision had been made to hate it for larger requests, but I think giving someone the chance to see how they’ve messed up their data is better than frustrating them further. I’m not sure if there’s another way to hook into things so that someone could see exactly what data is being sent, maybe there is.

korayal commented 1 year ago

Not showing the data's content is already the default behavior for payloads bigger than 4KB. If there's an explicit need for the content of the request, I'd probably prefer adding another hook like @endgame suggested. I don't think of any case where one would prefer unreadable characters printed to their console/log. One other option would be to base64-encode the un-printable bytestring.

endgame commented 1 year ago

I was not aware of the Hook system, so thank you for that insight. I have tried out implementing something like AWS.addHookFor @(Request CW.PutMetricData) which obviously required my hook to return a Request CW.PutMetricData. I don't think it is not possible to modify the request body since it's being derived from the PutMetricData data type. And, code-size-wise I think the above instance was smaller than what I had to do for the hook.

I'm not sure that this is right - if you're hooking configuredRequest, then a Hook (Request CW.PutMetricData) should give you a chance to intercept the configured-but-unsigned HTTP request, as this type is a generic HTTP request with a phantom type parameter. You should be able to inject the header you need here and gzip the body - if you are still struggling with this we should discuss further because I'd like to know why it doesn't work.

korayal commented 1 year ago

@endgame you are correct, for some reason I did not add the generic-lens library, which caused the errors I've seen to throw me off. Once I did that, this worked:

gzipRequest :: AWS.Hook (Request CW.PutMetricData)
gzipRequest _env req = pure $
  case req ^. #body of
    AWS.Hashed (AWS.HashedBytes _ hby) ->
      req
        & #headers %~ setHeader
        & #body .~ compress hby
    _ -> req
  where
    setHeader = AWS.hdr hContentEncoding "gzip"
    compress = AWS.toBody . GZip.compress . LBS.fromStrict

with an awsEnv:

let awsEnv = oldEnv & #hooks %~ AWS.configuredRequestHook (AWS.addHookFor @(Request CW.PutMetricData) gzipRequest)
endgame commented 1 year ago

I am surprised that you needed to use a <- with your example, as opposed to a let. But I'm glad that the hooks system works. OK, that means this issue is just about printing stuff, and we can wrap that up with your PR.

korayal commented 1 year ago

yeah, i actually did not use it like that in my version :) I blame myself typing it in late hours. thanks, I've fixed the example.

endgame commented 1 year ago

Merged your fix, thank you. I'm interested to know - were you able to opensource that CW PutMetricData hook? You may also be able to write it without generic-lens, which is probably a kind move for a library to make.

korayal commented 1 year ago

We have started using this in our instruments-cloudwatch library, which is open source. We haven't considered making a new library out of this snippet yet.

endgame commented 1 year ago

That's great, glad to see it's opensource.