brefphp / bref

Serverless PHP on AWS Lambda
https://bref.sh
MIT License

Random FastCgiCommunicationFailed errors when using php-fpm layer #316

Closed kwn closed 5 years ago

kwn commented 5 years ago

Hi @mnapoli !

We tried to use Bref with the php-fpm layer, but we found it very unstable. We tried to make it work with Symfony 4, API Platform, and bare-minimum code that sends one header and echoes a response. In every case (running locally with SAM and in the AWS environment) we observed very strange, non-deterministic behaviour. Instead of a real response, we got a 502 response code from SAM with this body:

{
"message": "Internal server error"
}

The console running the Lambda via SAM printed the following output:

2019-04-24 12:35:12 Invoking index.php (provided)
2019-04-24 12:35:12 arn:aws:lambda:eu-west-1:209497400698:layer:php-73-fpm:5 is already cached. Skipping download
2019-04-24 12:35:12 Requested to skip pulling images ...

2019-04-24 12:35:12 Mounting /Users/kwnuk/Repositories/thirdbridge/bref as /var/task:ro,delegated inside runtime container
START RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72 Version: $LATEST
Fatal error: Uncaught Bref\Runtime\FastCgiCommunicationFailed: Error communicating with PHP-FPM to read the HTTP response. A common root cause of this can be that the Lambda (or PHP) timed out, for example when trying to connect to a remote API or database, if this happens continuously check for those! Bref will reconnect to PHP-FPM to clean things up. Original exception message: Hoa\Fastcgi\Exception\Exception Bad request (not a well-formed FastCGI request). in /var/task/vendor/mnapoli/bref/src/Runtime/PhpFpm.php:122
Stack trace:
#0 /opt/bootstrap(30): Bref\Runtime\PhpFpm->proxy(Array)
#1 /var/task/vendor/mnapoli/bref/src/Runtime/LambdaRuntime.php(85): {closure}(Array)
#2 /opt/bootstrap(31): Bref\Runtime\LambdaRuntime->processNextEvent(Object(Closure))
#3 {main}END RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72
REPORT RequestId: 52fdfc07-2182-154f-163f-5f0f9a621d72  Init Duration: 166.87 ms    Duration: 46.38 ms  Billed Duration: 100 ms Memory Size: 1024 MB    Max Memory Used: 42 MB
{
  "errorType": "Bref\\Runtime\\FastCgiCommunicationFailed",
  "errorMessage": "Error communicating with PHP-FPM to read the HTTP response. A common root cause of this can be that the Lambda (or PHP) timed out, for example when trying to connect to a remote API or database, if this happens continuously check for those! Bref will reconnect to PHP-FPM to clean things up. Original exception message: Hoa\\Fastcgi\\Exception\\Exception Bad request (not a well-formed FastCGI request).",
  "stackTrace": [
    "#0 /opt/bootstrap(30): Bref\\Runtime\\PhpFpm-\u003eproxy(Array)",
    "#1 /var/task/vendor/mnapoli/bref/src/Runtime/LambdaRuntime.php(85): {closure}(Array)",
    "#2 /opt/bootstrap(31): Bref\\Runtime\\LambdaRuntime-\u003eprocessNextEvent(Object(Closure))",
    "#3 {main}"
  ]
}
2019-04-24 12:35:14 Function returned an invalid response (must include one of: body, headers or statusCode in the response object). Response received:
2019-04-24 12:35:14 127.0.0.1 - - [24/Apr/2019 12:35:14] "GET / HTTP/1.1" 502 -

Another weird behaviour we observed was the JSON response being cut off in the middle, with []} appended at the end (still not producing a valid JSON object). An example output looked like this:

{"foo": "bar", "fizz": "bizz", "someth []}

These were responses that were echoed directly. We didn't use a database or anything else that could delay the response or cause a timeout. The errored responses came back in under 1 second.
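A minimal sketch of how a truncated body like the one above can be detected on the client side, assuming the endpoint is expected to return JSON. The helper name `is_complete_json` is hypothetical and is not part of Bref or SAM.

```python
import json


def is_complete_json(body: str) -> bool:
    """Return True if the body parses as JSON, False if it is truncated or malformed."""
    try:
        json.loads(body)
    except json.JSONDecodeError:
        return False
    return True


# The truncated output quoted above, with its unterminated string.
truncated = '{"foo": "bar", "fizz": "bizz", "someth []}'
# A hypothetical complete body, for comparison.
complete = '{"foo": "bar", "fizz": "bizz"}'

print(is_complete_json(truncated))
print(is_complete_json(complete))
```

A check like this only confirms that the body was cut off; it says nothing about where in the PHP-FPM or FastCGI path the truncation happened.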

A minimal repo that reproduces the issue is available here:

https://github.com/third-bridge/bref-example

Do you have any idea why it is failing? It's almost the default configuration generated by Bref. We tried different layers (v4 and v5, php-fpm 7.2 and 7.3) and still got the same results.

mnapoli commented 5 years ago

Thank you for the details and the repository. This is indeed weird and definitely something we want to fix.

I tried reproducing the error but it is all working fine (at least locally) for me with your test repository.

We are discussing that in Slack (https://bref.sh/docs/community.html), feel free to post here or join in the chat.

mnapoli commented 5 years ago

A quick note: could it be a matter of response size? I see in your test repository that you return a pretty large JSON response.

kwn commented 5 years ago

Thank you for the answer.

Could you try to reproduce the issue locally by calling the endpoint multiple times? I observed that it fails randomly (roughly 1 in 4 requests). I also noticed it fails more often under heavier load (e.g. processing 2 or 3 requests at the same time). The behaviour is exactly the same in the real AWS environment.

Response size: possibly. I just tried reducing the response size to ~100 kB. I can still reproduce the issue, but now it crashes once every ~50 requests.
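Failure rates like the ones quoted above can be measured by calling the endpoint in a loop and counting bad responses. The following is only a sketch under stated assumptions: the function names `fetch` and `count_failures` are hypothetical, a "failure" is taken to mean a non-200 status or a body that does not parse as JSON, and requests are sent one after another rather than concurrently.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Tuple


def fetch(url: str, timeout: float = 10.0) -> Tuple[int, str]:
    """Perform one GET request and return (status code, body text)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read().decode("utf-8", errors="replace")
    except urllib.error.HTTPError as exc:
        # 4xx/5xx responses (e.g. the 502 from SAM) still carry a status and body.
        return exc.code, exc.read().decode("utf-8", errors="replace")
    except urllib.error.URLError:
        # Connection-level failure; 0 is a placeholder status (assumption).
        return 0, ""


def count_failures(
    url: str,
    attempts: int,
    fetcher: Callable[[str], Tuple[int, str]] = fetch,
) -> int:
    """Call the endpoint `attempts` times and count failed responses."""
    failures = 0
    for _ in range(attempts):
        status, body = fetcher(url)
        if status != 200:
            failures += 1
            continue
        try:
            json.loads(body)
        except json.JSONDecodeError:
            # A 200 with a truncated or malformed body also counts as a failure.
            failures += 1
    return failures
```

For a local run, something like `count_failures("http://127.0.0.1:3000/", 50)` would apply, assuming the API is served on SAM's default local address. The `fetcher` parameter exists so the counting logic can be exercised with canned responses instead of a live endpoint.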

We spent some time trying to understand why this happens, looking into e.g. UTF/weird characters in the response, missing headers, etc., but we still have no clue why it is failing. The non-deterministic nature of the issue suggests we should look into timeout-related problems, but it's definitely hard to say what's going on.

I don't think the execution time or the init duration is the issue. Some examples:

failed - Init Duration: 220.98 ms   Duration: 67.66 ms
passed - Init Duration: 132.76 ms   Duration: 27.27 ms
passed - Init Duration: 322.23 ms   Duration: 60.90 ms
passed - Init Duration: 465.60 ms   Duration: 144.71 ms

mnapoli commented 5 years ago

@kwn I have opened #318 (to reproduce) and #319 (to fix).

If you have a chance to try #319 and confirm if it solves the issues for you that would be awesome.

kwn commented 5 years ago

Sure, thank you. I'm going to take a look at it tomorrow.

gonzalovilaseca commented 5 years ago

@mnapoli I just tested with https://github.com/mnapoli/bref/pull/319 and it works like a charm 👍

kwn commented 5 years ago

Just tested locally on my machine and it also works for me (didn't try in real AWS though).

kwn commented 5 years ago

Thank you!!!!!!! @mnapoli

nucleoswsdev commented 2 years ago

Hi @mnapoli, we are getting the same error you mentioned in this issue. We are using a Serverless template to deploy a Lambda function to AWS. We followed this tutorial to deploy Laravel code to Lambda: https://www.youtube.com/watch?v=rWExnXzUBqc

We got this error from CloudWatch:


"errorType": "Bref\\Event\\Http\\FastCgi\\FastCgiCommunicationFailed",
    "errorMessage": "Error communicating with PHP-FPM to read the HTTP response. A root cause of this can be that the Lambda (or PHP) timed out, for example when trying to connect to a remote API or database, if this happens continuously check for those! Original exception message: hollodotme\\FastCGI\\Exceptions\\TimedoutException Read timed out",
    "stack": [
        "#0 /var/task/vendor/bref/bref/src/Event/Http/HttpHandler.php(22): Bref\\Event\\Http\\FpmHandler->handleRequest(Object(Bref\\Event\\Http\\HttpRequestEvent), Object(Bref\\Context\\Context))",
        "#1 /var/task/vendor/bref/bref/src/Runtime/LambdaRuntime.php(104): Bref\\Event\\Http\\HttpHandler->handle(Array, Object(Bref\\Context\\Context))",
        "#2 /opt/bootstrap(35): Bref\\Runtime\\LambdaRuntime->processNextEvent(Object(Bref\\Event\\Http\\FpmHandler))",
        "#3 {main}"
    ]

We don't know how to migrate from hoa/fastcgi to hollodotme/fast-cgi-client. We did not find any package like fastcgi, or any other package related to what you mentioned in the comment section.

I have attached our Serverless template here:

service: wtm-authentication-prod
provider:
  name: aws
  stage: prod
  apiGateway:
    restApiId: xxxxxx
    restApiRootResourceId: xxxxxxx  
  region: ap-south-1
  runtime: provided
    # declare one of the following...
  role: arn:aws:iam::xxxxx:role/LaravelAPIRole # must validly reference a role defined in your account
plugins:
  - ./vendor/bref/bref
package:
  exclude:
    - node_modules/**
    - public/storage
    - resources/assets/**
    - tests/**
functions:
  website:
    handler: public/index.php
    timeout: 28 # in seconds (API Gateway has a timeout of 29 seconds)
    layers:
      - ${bref:layer.php-73-fpm}
    events:
      - http:
          path: /authservices/api/login
          method: "POST"
          cors: true # <-- CORS!

mnapoli commented 2 years ago

This issue is from 2019; please open a new one, as this is very likely not related.

You don't have to install anything new. Judging by your serverless.yml, I guess you are not using the latest version of Bref; you may want to upgrade Bref first.