common-fate / glide

Automate permissions to your cloud and critical applications.
https://docs.commonfate.io/common-fate/introduction
GNU Affero General Public License v3.0

Bug: aws-sso provider returns 500 Internal Server Error while getting AWS account IDs on an organization with lots of accounts #536

Open vincenttjia opened 1 year ago

vincenttjia commented 1 year ago

Hi, the AWS SSO provider returns a 500 Internal Server Error when trying to load an organization with 150+ accounts.

The request is GET https://abcdefghij.execute-api.us-east-1.amazonaws.com/prod/api/v1/admin/providers/aws-sso-v2/args/accountId/options

Response {"error":"Internal Server Error"}
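
For anyone who wants to reproduce this outside the browser, here is a minimal sketch that issues the same GET and returns the status code together with the body, so the 500 can be inspected rather than raised. The helper names `build_request` and `fetch_options` are hypothetical, and the shape of the `Authorization` value is an assumption (it is blank in the captured headers below); only the path and the response body come from this report.

```python
import json
import urllib.error
import urllib.request

# Path taken from the failing request in this issue.
OPTIONS_PATH = "/api/v1/admin/providers/aws-sso-v2/args/accountId/options"


def build_request(base_url, token):
    """Build the GET request. `token` is whatever the web app sends in the
    Authorization header (assumption: copied from the browser dev tools)."""
    url = base_url.rstrip("/") + OPTIONS_PATH
    return urllib.request.Request(
        url,
        method="GET",
        headers={"Accept": "application/json", "Authorization": token},
    )


def fetch_options(base_url, token, timeout=30):
    """Return (status_code, body). HTTP errors such as 500 are returned
    as data instead of being raised, so the error body can be examined."""
    req = build_request(base_url, token)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        body = err.read().decode("utf-8", errors="replace")
        try:
            return err.code, json.loads(body)
        except json.JSONDecodeError:
            return err.code, body  # non-JSON error page


# Example call (performs a real network request, so it is left commented out):
# status, body = fetch_options(
#     "https://abcdefghij.execute-api.us-east-1.amazonaws.com/prod", "<token>")
# print(status, body)
```

If the deployment behaves as described here, the call should come back with status 500 and the `{"error":"Internal Server Error"}` body shown above.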

Full details

# Headers
Request URL: https://abcdefghij.execute-api.us-east-1.amazonaws.com/prod/api/v1/admin/providers/aws-sso-v2/args/accountId/options
Request Method: GET
Status Code: 500 
Remote Address: 18.161.49.105:443
Referrer Policy: strict-origin-when-cross-origin
access-control-allow-credentials: true
access-control-allow-origin: https://commonfate.example.com
content-length: 33
content-type: application/json
date: Thu, 02 Mar 2023 11:15:00 GMT
vary: Origin
via: 1.1 117c2191e94ab49ae7a622ef64537c78.cloudfront.net (CloudFront)
x-amz-apigw-id: 
x-amz-cf-id: 
x-amz-cf-pop: CGK50-P1
x-amzn-requestid: 
x-amzn-trace-id: 
x-cache: Error from cloudfront
:authority: abcdefghij.execute-api.us-east-1.amazonaws.com
:method: GET
:path: /prod/api/v1/admin/providers/aws-sso-v2/args/accountId/options
:scheme: https
accept: application/json, text/plain, */*
accept-encoding: gzip, deflate, br
accept-language: en-US,en;q=0.9
authorization: 
origin: https://commonfate.example.com
referer: https://commonfate.example.com/
sec-ch-ua: "Chromium";v="110", "Not A(Brand";v="24", "Brave";v="110"
sec-ch-ua-mobile: ?0
sec-ch-ua-platform: "macOS"
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
sec-gpc: 1
user-agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/110.0.0.0 Safari/537.36

Thank you

vincenttjia commented 1 year ago

I am also seeing this in the CloudWatch log of the Lambda:

{
    "level": "error",
    "ts": 1677813373.3319323,
    "caller": "apio/response.go:65",
    "msg": "web handler error",
    "reqId": "c05924ce-cc6c-4314-8390-90b337cf7789",
    "error": "Internal Server Error"
}
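
To find related entries for the same request, a small filter over exported log lines can help. This is only a sketch: it assumes each log entry is a single JSON object per line (the entry above is shown expanded), and `find_errors` is a hypothetical helper. The field names `level`, `reqId`, `caller`, `msg`, and `error` are the ones in the entry above; the `START` and `info` sample lines are made up for illustration.

```python
import json


def find_errors(log_lines, req_id=None):
    """Return parsed error-level entries, optionally limited to one reqId."""
    matches = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip non-JSON lines (e.g. Lambda START/END/REPORT)
        if not isinstance(entry, dict):
            continue
        if entry.get("level") != "error":
            continue
        if req_id is not None and entry.get("reqId") != req_id:
            continue
        matches.append(entry)
    return matches


sample = [
    "START RequestId: example",  # hypothetical non-JSON line
    '{"level": "error", "ts": 1677813373.3319323, '
    '"caller": "apio/response.go:65", "msg": "web handler error", '
    '"reqId": "c05924ce-cc6c-4314-8390-90b337cf7789", '
    '"error": "Internal Server Error"}',
    '{"level": "info", "msg": "hypothetical info line", "reqId": "other"}',
]

for e in find_errors(sample, "c05924ce-cc6c-4314-8390-90b337cf7789"):
    print(e["caller"], e["msg"], e["error"])
```

Filtering by `reqId` without the `level` check (a small change to the function) would also surface any non-error lines logged for the same request, which may show what ran before the handler failed.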

chrnorm commented 1 year ago

Hey @vincenttjia - thanks for the issue! Could I ask whether you were seeing a timeout on the Lambda function? The specific function to look for is the APICacheSyncHandlerFunction.

On a separate note - we are currently reworking our resource syncing pipelines to be friendlier for larger account structures. Feel free to jump into our Community Slack and we can help you debug this.

vincenttjia commented 1 year ago

Hi @chrnorm, I checked the Lambda's CloudWatch logs and I'm not seeing any timeouts. The maximum execution duration is 13.5 seconds, which is well below the configured 1-minute timeout.

I also tried setting the timeout to 10 minutes and the memory to 1024 MB to debug, but I still encounter the issue.
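
For reference, a timeout check like the one described can be done from exported log lines. The sketch below assumes the standard Lambda `REPORT RequestId: ... Duration: ... ms` summary line and the `Task timed out after ... seconds` message; `summarize` is a hypothetical helper, and apart from the 13.5-second duration, the 1-minute timeout, and the 1024 MB memory size mentioned above, the sample values are made up.

```python
import re

# Lambda writes one REPORT line per invocation; "Duration" follows the request ID.
REPORT_RE = re.compile(r"REPORT RequestId: (\S+)\s+Duration: ([\d.]+) ms")
# Written by the Lambda runtime when an invocation exceeds its timeout.
TIMEOUT_RE = re.compile(r"Task timed out after ([\d.]+) seconds")


def summarize(log_lines, timeout_seconds):
    """Summarize invocation durations and count timed-out invocations."""
    durations = []
    timed_out = 0
    for line in log_lines:
        match = REPORT_RE.search(line)
        if match:
            durations.append(float(match.group(2)) / 1000.0)  # ms -> s
        if TIMEOUT_RE.search(line):
            timed_out += 1
    max_duration = max(durations, default=0.0)
    return {
        "invocations": len(durations),
        "max_duration_s": max_duration,
        "timed_out": timed_out,
        "headroom_s": timeout_seconds - max_duration,
    }


sample = [
    "REPORT RequestId: 11111111-aaaa\tDuration: 13500.00 ms\t"
    "Billed Duration: 13500 ms\tMemory Size: 1024 MB\tMax Memory Used: 200 MB",
    "REPORT RequestId: 22222222-bbbb\tDuration: 9100.50 ms\t"
    "Billed Duration: 9101 ms\tMemory Size: 1024 MB\tMax Memory Used: 190 MB",
]

print(summarize(sample, timeout_seconds=60))
```

A `timed_out` count of zero together with a large `headroom_s` would be consistent with the observation that the 500 is not caused by the function hitting its timeout.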

Regarding Slack, I have reached out to you there as well. Let me know if there is anything I can do on my side to help debug this issue. Thank you very much.