awslabs / cognito-at-edge

Serverless authentication solution to protect your website or Amplify application
Apache License 2.0

503 ERROR due lambda timeout after tokens are fetched #86

Open mishabruml opened 5 months ago

mishabruml commented 5 months ago

What happened:

My CloudFront distribution redirects me to the Cognito login UI and I authenticate successfully. After that, I am directed to the CloudFront 503 error page rather than my S3 static content:

503 ERROR
The request could not be satisfied.
The Lambda function associated with the CloudFront distribution is invalid or doesn't have the required permissions. We can't connect to the server for this app or website at this time. There might be too much traffic or a configuration error. Try again later, or contact the app or website owner.
If you provide content to customers through CloudFront, you can find steps to troubleshoot and help prevent this error by reviewing the CloudFront documentation.

Generated by cloudfront (CloudFront)
Request ID: BO_FjHzTRlnA-HWDVRWznjZbcpOBBMjEAT2mwvZUxje6BITPn2bbJg==

The logs in my Lambda@Edge function appear to show the initial invocation successfully redirecting the user to the UI page:

logs
```
{ "level": 20, "time": 1705676579380, "msg": "Handling Lambda@Edge event", "event": { "Records": [ ....big blob redacted here ] } }
{ "level": 20, "time": 1705676579400, "msg": "Cookies weren't present in the request" }
{ "level": 20, "time": 1705676579400, "msg": "User isn't authenticated: Error: Cookies weren't present in the request" }
{ "level": 20, "time": 1705676579400, "msg": "Redirecting user to Cognito User Pool URL https://***********" }
```

Then, about 5s after entering my login credentials, I am presented with the 503 screen in the browser. The lambda appears to time out after fetching the tokens.

logs
```
{ "level": 20, "time": 1705676947082, "msg": "Handling Lambda@Edge event", "event": { "Records": [...redacted blob] } }
{ "level": 20, "time": 1705676947102, "msg": "Cookies weren't present in the request" }
{ "level": 20, "time": 1705676947102, "msg": "User isn't authenticated: Error: Cookies weren't present in the request" }
{ "level": 20, "time": 1705676947102, "msg": "Fetching tokens from grant code...", "request": { "url": "https://****.auth.****.amazoncognito.com/oauth2/token", "method": "POST", "headers": { "Content-Type": "application/x-www-form-urlencoded" }, "data": "client_id=******&code=******&grant_type=authorization_code&redirect_uri=******.cloudfront.net" }, "code": "******" }
{ "level": 20, "time": 1705676949484, "msg": "Fetched tokens", "tokens": { "id_token": "*****", "access_token": "****", "refresh_token": "****", "expires_in": 28800, "token_type": "Bearer" } }
```
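For reference, the gaps between the `time` fields in the second log excerpt can be worked out with a few lines of TypeScript. This is only a sketch over the three distinct timestamps shown above; `elapsedMs` is a hypothetical helper name, not part of cognito-at-edge.

```typescript
// Millisecond timestamps copied from the "time" fields in the log excerpt above.
const handlingEvent = 1705676947082;  // "Handling Lambda@Edge event"
const fetchingTokens = 1705676947102; // "Fetching tokens from grant code..."
const fetchedTokens = 1705676949484;  // "Fetched tokens"

// Hypothetical helper: elapsed milliseconds between two log entries.
function elapsedMs(start: number, end: number): number {
  return end - start;
}

const tokenFetchMs = elapsedMs(fetchingTokens, fetchedTokens); // 2382
const totalSoFarMs = elapsedMs(handlingEvent, fetchedTokens);  // 2402

console.log(`token fetch: ${tokenFetchMs} ms, total so far: ${totalSoFarMs} ms`);
```

So roughly 2.4 s had already been spent by the time "Fetched tokens" was logged, and nothing is logged after that point.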
mishabruml commented 5 months ago

I did some debugging after forking the repo and adding logs everywhere, and found that the lambda was timing out here https://github.com/awslabs/cognito-at-edge/blob/e8c7e305b2fe87d0fbc47bcde560b4526a0235d2/src/index.ts#L279 while verifying the JWT. Others appear to have had this issue (https://github.com/awslabs/aws-jwt-verify/issues/72 and https://github.com/awslabs/aws-jwt-verify/issues/133), so I have made a PR, #88, to address it. Changing the timeout to 5000ms fixed the issue for me.
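For readers unfamiliar with where that timeout lives: aws-jwt-verify lets callers supply their own JWKS cache and fetcher, and the fetcher accepts a `responseTimeout` request option. The fragment below is a sketch of that configuration in isolation, not necessarily the change made in the PR. The user pool ID and client ID are placeholders, and 5000 is simply the value mentioned above.

```typescript
import { CognitoJwtVerifier } from "aws-jwt-verify";
import { SimpleJwksCache } from "aws-jwt-verify/jwk";
import { SimpleJsonFetcher } from "aws-jwt-verify/https";

// Placeholder identifiers (assumptions for illustration only).
const verifier = CognitoJwtVerifier.create(
  {
    userPoolId: "us-west-2_EXAMPLE",
    clientId: "example-client-id",
    tokenUse: "id",
  },
  {
    // Custom JWKS cache whose fetcher waits up to 5000 ms for the JWKS response.
    jwksCache: new SimpleJwksCache({
      fetcher: new SimpleJsonFetcher({
        defaultRequestOptions: { responseTimeout: 5000 },
      }),
    }),
  }
);
```

Note that a 5000 ms response timeout equals the whole execution limit mentioned later in this thread, so it removes the early abort rather than making the download faster.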

ckifer commented 5 months ago

+1 I have the same issue but only when authenticating with refresh tokens.

Edit: jk mine is more of an infinite redirect issue

wesdek commented 5 months ago

> +1 I have the same issue but only when authenticating with refresh tokens.
>
> Edit: jk mine is more of an infinite redirect issue

You need to set your config with: "cookiePath": "/" to fix your redirect issue

ckifer commented 5 months ago

Yep, found the other issue. Thanks!

ckifer commented 5 months ago

Now I'm experiencing this after resolving the other issue. Still +1

2024-01-25T02:28:37.114Z ... Task timed out after 3.06 seconds

This seems to be happening only when refresh tokens are fetched, and in regions that are further out (APAC / South America), because of the axios call to Cognito in the configured region. Is there any way we can do this with redirects instead?

@maverick089 as the contributor of refresh tokens

lenfree commented 5 months ago

Not sure if these would address the 5 second time out issue:

  1. increase the timeout to 5 seconds
  2. enable AWS_NODEJS_CONNECTION_REUSE_ENABLED:
    process.env['AWS_NODEJS_CONNECTION_REUSE_ENABLED'] = '1';

ckifer commented 5 months ago

Increasing the timeout worked for me... for now. Lambda@Edge doesn't support timeouts > 5 seconds on viewer triggers though. It also doesn't support env vars.

lenfree commented 5 months ago

> process.env['AWS_NODEJS_CONNECTION_REUSE_ENABLED'] = '1';

That's right, it does not support Lambda environment variables, but you can set it manually with the line below, if I am not mistaken. Cold starts might also contribute to the issue.

process.env['AWS_NODEJS_CONNECTION_REUSE_ENABLED'] = '1';
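Some context on that variable: `AWS_NODEJS_CONNECTION_REUSE_ENABLED` is a switch read by the AWS SDK for JavaScript v2 to turn on HTTP keep-alive for the SDK's own clients. An HTTP call made through another client (this thread mentions axios) would not pick it up, so the closest equivalent there is an explicit keep-alive agent. A minimal sketch using only Node's built-in `https` module follows. `keepAliveAgent` is a hypothetical name, and whether cognito-at-edge already configures something like this is not established in this thread.

```typescript
import { Agent } from "node:https";

// Created at module scope so warm invocations can reuse open sockets
// instead of paying for a new TCP + TLS handshake on every request.
const keepAliveAgent = new Agent({ keepAlive: true });

// With axios, this agent would typically be passed as the `httpsAgent` option, e.g.
//   axios.create({ httpsAgent: keepAliveAgent })
// (left as a comment so the snippet stays dependency-free).
```

Connection reuse only helps once a socket is already open, so the first request after a cold start still pays the full handshake cost.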
ckifer commented 5 months ago

Can give it a shot, though I think it's mostly because of region distance: an edge function in Singapore is making a call to Cognito in Oregon.

manu-remsense commented 4 months ago

Hey guys, I was having the same issue, and thankfully I came across this thread and resolved the problem with what @mishabruml suggested, thanks 🙏.

ckifer commented 4 months ago

Sweet, thanks for the info!

manu-remsense commented 4 months ago

Oops, actually it didn't help; I tested it wrong. I understand the issue on my end a bit better now, and it turns out I have the same problem as you, @ckifer. When a token has expired, the function has to verify the token, then verify the refresh token, and then fetch the new token. That process seems to take longer than the 5s time limit when the Cognito and CloudFront locations are far apart. I still have the increased timeout for the SimpleJsonFetcher in place, but it hasn't really resolved it for me. Did you manage to find something that fixes it, @ckifer? Thanks

ckifer commented 4 months ago

No resolution here no. Luckily I don't have too many users that are far away, but still would like to find a better solution.

lenfree commented 4 months ago

I had the same issue too and so far this worked for me, or maybe I just haven't tested it properly.

manu-remsense commented 4 months ago

> I had the same issue too and so far this worked for me, or maybe I just haven't tested it properly.

@lenfree I've put that line process.env['AWS_NODEJS_CONNECTION_REUSE_ENABLED'] = '1'; outside of the handler, and it didn't seem to improve the situation. Or did I not do it correctly? Thanks

manu-remsense commented 4 months ago

Anyway, as a band-aid solution we decided to forgo the use of refresh tokens when a user is accessing from outside our region (Australia).

Changed this line in the handle(event) function:

    if (tokens.refreshToken)

to:

    if (tokens.refreshToken && process.env.AWS_REGION === 'ap-southeast-2')

(just in case anyone else is facing the same problem and is out of ideas)
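The same check can be written as a small standalone function, which makes the region list easy to adjust. This is a sketch only: `Tokens`, `shouldUseRefreshToken` and `allowedRegions` are hypothetical names introduced for illustration and are not part of cognito-at-edge.

```typescript
// Hypothetical token shape; only the field relevant to the check is modelled.
interface Tokens {
  refreshToken?: string;
}

// Returns true when a refresh token exists AND the function is executing in one
// of the regions where the refresh round trip is considered fast enough.
function shouldUseRefreshToken(
  tokens: Tokens,
  currentRegion: string | undefined,
  allowedRegions: readonly string[],
): boolean {
  return Boolean(tokens.refreshToken) &&
    currentRegion !== undefined &&
    allowedRegions.includes(currentRegion);
}

// Usage: AWS_REGION is set by the Lambda runtime to the region the function runs in.
const useRefresh = shouldUseRefreshToken(
  { refreshToken: "example" },
  process.env.AWS_REGION,
  ["ap-southeast-2"],
);
```

When the function returns false, the caller would fall through to whatever it already does for a user without a usable refresh token, which in this thread means redirecting to the login page.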

ckifer commented 4 months ago

That's actually a solid workaround, nice!

lenfree commented 4 months ago

> > I had the same issue too and so far this worked for me, or maybe I just haven't tested it properly.
>
> @lenfree I've put that line process.env['AWS_NODEJS_CONNECTION_REUSE_ENABLED'] = '1'; outside of the handler, and it didn't seem to improve the situation. Or did I not do it correctly? Thanks

I think it needs to be inside the handler, at least that's how I configured it.
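On the placement question: code at module scope runs once when the execution environment is initialised (a cold start), while code inside the handler runs on every invocation. The sketch below only demonstrates that difference with two counters. `handler`, `initCount` and `invocationCount` are illustrative names, and the sketch makes no claim about whether either placement affects latency here.

```typescript
// Module scope: executed once per execution environment (cold start).
let initCount = 0;
let invocationCount = 0;
process.env["AWS_NODEJS_CONNECTION_REUSE_ENABLED"] = "1";
initCount += 1;

// Handler scope: executed on every invocation, warm or cold.
function handler(): string | undefined {
  invocationCount += 1;
  // Reassigning here is harmless but redundant once the module-level line has run.
  process.env["AWS_NODEJS_CONNECTION_REUSE_ENABLED"] = "1";
  return process.env["AWS_NODEJS_CONNECTION_REUSE_ENABLED"];
}

// Two simulated invocations in the same execution environment.
handler();
handler();
```

Either way the variable ends up set for the process; the difference is only how often the assignment runs.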

ckifer commented 4 months ago

> if (tokens.refreshToken && process.env.AWS_REGION === 'ap-southeast-2')

@manu-remsense Is this you modifying the cognito-at-edge code? Wish there was a config item for that...

manu-remsense commented 4 months ago

> > > I had the same issue too and so far this worked for me, or maybe I just haven't tested it properly.
> >
> > @lenfree I've put that line process.env['AWS_NODEJS_CONNECTION_REUSE_ENABLED'] = '1'; outside of the handler, and it didn't seem to improve the situation. Or did I not do it correctly? Thanks
>
> I think it needs to be inside the handler, at least that's how I configured it.

Ohh I see, I should give it a try 🤔

@ckifer yes, that's the only way as far as I know. I can make a PR later when I have time on the weekend (but not sure how useful it would be to people).

ckifer commented 4 months ago

Got it. I can just use patch-package for now. Since this is an issue, I believe being able to conditionally turn off the use of refresh tokens should be useful to more than just the two of us.