aws-samples / eks-workshop-v2

Hands-on labs for Amazon EKS
https://www.eksworkshop.com
Apache License 2.0
447 stars 446 forks

[Bug]: eks-workshop-ide fail to launch #1181

Open tobenicer-dev opened 1 week ago

tobenicer-dev commented 1 week ago

Installation method

Own AWS account

What happened?

Going through the workshop setup in my own account: https://www.eksworkshop.com/docs/introduction/setup/your-account

Click the launch link for each region, leave the default parameters untouched, tick "I acknowledge that AWS CloudFormation might create IAM resources with custom names.", then click "Create stack".

In region ap-southeast-1: Cloud9 is not available to new users. In regions us-west-2 and eu-west-1, the CloudFormation stack fails:

The following resource(s) failed to create: [EksWorkshopIdeBootstrapInstanceLambda]. Rollback requested by user.

Received response status [FAILED] from custom resource. Message returned: See the details in CloudWatch Log Stream: 2024/11/18/[$LATEST]30260c82caea422889de5bbce7XXXXXX (RequestId: 6ee2351f-f53d-4c8c-b66f-XXXXXXXXXXXX)

In CloudWatch:

{
  "Status": "FAILED",
  "Reason": "See the details in CloudWatch Log Stream: 2024/11/18/[$LATEST]30260c82caea422889de5bbce7853948",
  "PhysicalResourceId": "CustomResourcePhysicalID",
  "StackId": "arn:aws:cloudformation:eu-west-1:503561435592:stack/eks-workshop-ide/31567c90-a56b-11ef-a153-06aa65ac65f3",
  "RequestId": "6ee2351f-f53d-4c8c-b66f-22e36517ef6c",
  "LogicalResourceId": "EksWorkshopIdeBootstrapInstanceLambda",
  "NoEcho": false,
  "Data": {}
}

Status code: 200

LAMBDA_WARNING: Unhandled exception. The most likely cause is an issue in the function code. However, in rare cases, a Lambda runtime update can cause unexpected function behavior. For functions using managed runtimes, runtime updates can be triggered by a function change, or can be applied automatically. To determine if the runtime has been updated, check the runtime version in the INIT_START log entry. If this error correlates with a change in the runtime version, you may be able to mitigate this error by temporarily rolling back to the previous runtime version. For more information, see https://docs.aws.amazon.com/lambda/latest/dg/runtimes-update.html

[ERROR] TypeError: '>=' not supported between instances of 'WaiterError' and 'int'
Traceback (most recent call last):
  File "/var/task/index.py", line 56, in lambda_handler
    responseData = {'Error': traceback.format_exc(e)}
  File "/var/lang/lib/python3.12/traceback.py", line 184, in format_exc
    return "".join(format_exception(sys.exception(), limit=limit, chain=chain))
  File "/var/lang/lib/python3.12/traceback.py", line 139, in format_exception
    te = TracebackException(type(value), value, tb, limit=limit, compact=True)
  File "/var/lang/lib/python3.12/traceback.py", line 733, in __init__
    self.stack = StackSummary._extract_from_extended_frame_gen(
  File "/var/lang/lib/python3.12/traceback.py", line 411, in _extract_from_extended_frame_gen
    if limit >= 0:
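Note that the TypeError in this traceback appears to be a secondary failure: `traceback.format_exc()` takes `limit` as its first positional parameter, so passing the caught exception to it makes the standard library compare the exception against an integer, which hides the original `WaiterError`. A minimal sketch of the pattern and one possible fix follows. It assumes the handler roughly has this shape; `handler_buggy`, `handler_fixed`, and the local `WaiterError` class are hypothetical stand-ins, not the workshop's actual code.

```python
import traceback


class WaiterError(Exception):
    """Hypothetical stand-in for the WaiterError raised inside the Lambda."""


def handler_buggy():
    try:
        raise WaiterError("Waiter failed (simulated)")
    except Exception as e:
        # Bug: the first positional parameter of format_exc() is `limit`,
        # so the exception object ends up in `if limit >= 0:` and raises TypeError.
        return {'Error': traceback.format_exc(e)}


def handler_fixed():
    try:
        raise WaiterError("Waiter failed (simulated)")
    except Exception:
        # format_exc() with no arguments formats the exception currently being handled.
        return {'Error': traceback.format_exc()}


if __name__ == "__main__":
    try:
        handler_buggy()
    except TypeError as err:
        print("buggy handler raised:", err)
    print(handler_fixed()['Error'])
```

With the second form, the response data would carry the original `WaiterError` traceback, which is the information needed to diagnose why the instance bootstrap failed.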

What did you expect to happen?

Init Cloud9

How can we reproduce it?

Follow the steps in https://www.eksworkshop.com/docs/introduction/setup/your-account/

Anything else we need to know?

No response

EKS version

NA

matekuzdi commented 1 week ago

I also ran into this problem and debugged it. The problem was in the SSM document that this Lambda starts: the document was using a deprecated npm package, "argon2-cli", which is no longer supported and produces an error.

The official template can be downloaded from here: https://ws-assets-prod-iad-r-dub-85e3be25bd827406.s3.eu-west-1.amazonaws.com/39146514-f6d5-41cb-86ef-359f9d2f7265/eks-workshop-vscode-cfn.yaml

The following fix solved the issue:

  1. Add "argon2" to the yum install section:
    yum install -y git argon2 tar gzip vim nodejs npm make gcc g++
  2. Change the code at around line 261 to this:
    HASHED_PASSWORD=$(echo -n "$IDE_PASSWORD" | argon2 saltItWithSalt -l 32 -e)

AWS should really fix it btw....

niallthomson commented 1 week ago

The above is not quite correct. We debugged this last week and the issue was actually due to some sort of kernel issue in the latest AL2023 AMI release https://github.com/amazonlinux/amazon-linux-2023/issues/840

The reason this is failing in your accounts is that this fix wasn't sync'ed with the self-service CloudFormation template. This has now been fixed.

However, we probably do still want to switch away from that package. I'll look to merge that in a couple of weeks after re:Invent. Thank you for taking the time to look into it and raise the PR.