brefphp / bref

Serverless PHP on AWS Lambda
https://bref.sh
MIT License
3.15k stars 365 forks source link

Automatically encode base64 #726

Open Nemo64 opened 4 years ago

Nemo64 commented 4 years ago

I currently have to configure binary responses like described here: https://bref.sh/docs/runtimes/http.html#binary-requests-and-responses I'm using the http api and it does not need the apiGateway configuration, the isBase64Encoded option, that the bref runtime sets, is enough.

So... why not enable base64 for all none text/* types by default?

This would need a way of overwriting it, of course, but making it work in most cases by default is definitely preferred imo.

Now with the rest api: I don't have a way to quickly check but is it really required to set the binaryFileTypes option if the proxy integration already defines it as base64? That would foil my plan a bit.

mnapoli commented 4 years ago

I'm using the http api and it does not need the apiGateway configuration,

Oh that's good to know.

Now with the rest api: I don't have a way to quickly check but is it really required to set the binaryFileTypes option if the proxy integration already defines it as base64? That would foil my plan a bit.

Yes we went that route in the past, but it does require to set it, else it doesn't decode base64.

So... why not enable base64 for all none text/* types by default?

We also tried that in the past. That could be a possibility, but using base64 on something that does not need it increases the size of the response by 33% IIRC.

Whitelisting some responses can be more tricky that just text/*, for example there's application/json. And many other variants, here an example of a content-type we forgot by mistake in the past: https://github.com/brefphp/bref/pull/494

But here the consequence of forgetting a content-type wouldn't be so bad, because as you say it wouldn't cause an error, so this might me something we could reconsider doing.

Nemo64 commented 4 years ago

We also tried that in the past. That could be a possibility, but using base64 on something that does not need it increases the size of the response by 33% IIRC.

Does that matter? The size does not increase to the user, it only increases between lambda and api gateway.

Here is another thought. A binary response in the rest api would fail either way. If it is not encoded it'll create a server error and if it is encoded, you get a base64 response. How about detecting binary using mb_detect_encoding($str, 'UTF-8', true) and if it is binary, then sending it using base64. For the rest api, it would have failed either way but for the http api, it would work automatically.

mnapoli commented 4 years ago

I'm not sure if the size increase can affect the Lambda limits.

So to sum up this issue: there are a lot of unknowns. If you manage to test it and check that it works in all scenarios, that sure sounds good ^^ If you try to use mb_detect_encoding it could be good also to make sure it doesn't add too much overhead in response time.

Nemo64 commented 4 years ago

I can think of a way that does not introduce an overhead to the current way.

php > var_dump(json_encode(base64_decode('R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7')));
bool(false)

php > var_dump(json_last_error_msg());
string(56) "Malformed UTF-8 characters, possibly incorrectly encoded"

php > var_dump(json_last_error());
int(5)

So just trying to encode the response, like normal, and then checking for json_last_error() === JSON_ERROR_UTF8 would tell that this was an UTF-8 error. I could then try again with base64. At that point, the request would have already failed with a server error. Doing the additional try for binary can only improve things. If the json encoded fine, then the response is fine too.

mnapoli commented 4 years ago

Ohh I love that idea! That is really great!

georgeboot commented 3 years ago

So what happens when you configure binaryMediaTypes: '*/*' but in the response, set isBase64Encoded to false?

My expectations would be that you would only need the isBase64Encoded on the response, and API Gateway will use that to determine wether it should decode or not.

If that is indeed the case, why not leave the encoding up to the user? For example, set a header on the response (as a flag) to let our event handler know it should encode before returning.

That json_last_error_msg trick will also work nicely, but it feels odd to waste cpu cycles on every request just to check wether something should be encoded or not.

Edit: verified and if you set binaryMediaTypes to */* and you return a response with isBase64Encoded: false, it all still works. API Gateway actually only uses isBase64Encoded and doesn't even support the binaryMediaTypes anymore.

Nemo64 commented 3 years ago

@georgeboot did you verify with the http api or the rest api? Because the binaryMediaTypes option only exists for the rest api. The http api only uses the isBase64Encoded in the response, as the other option doesn't even exist for it.

The point of the json_last_error() === JSON_ERROR_UTF8 trick is to not waste cpu cycles as the json encode has to check in any case. My idea was to encode it, just like now, and then check for that specific error and reencode with base64 only if the plain encode failed. Any plain response would not be affected. This does waste cpu cycles if you only deliver binary responses, true. But it would be the compromise of "it just works" and "it can't break existing rest api users implementations".

If the rest api now suddenly ignores binaryMediaTypes, then this would allow to safely implement some heuristic based on mime types without breaking the rest api code path.

georgeboot commented 3 years ago

Pardon me, I meant HTTP Api.

I've checked and we run a REST api configured with */*. That works fine, both with isBase64Encoded set to false and true.

I agree with your point yeah. The first json encode is only wasted if the response contains binary data and needs to be base64'd first.

If we could provide a way to force the response to base64, we don't waste that first encode, as long as you flag responses which contain binary data.

jiw-mh commented 7 months ago

There could also be a bref:binary header in the response that is removed by bref (not sent to the client) and tells bref to respond as binary. This way the projects could choose the strategy.