aws-solutions / distributed-load-testing-on-aws

Distributed Load Testing on AWS
https://aws.amazon.com/solutions/implementations/distributed-load-testing-on-aws/

"Failed to run Fargate tasks." error #184

Closed khiliel closed 2 weeks ago

khiliel commented 2 months ago

For my use case, I have been modifying the DLT template to remove resources we don't need: for example, Cognito (we use the API instead) and the VPC creation piece (we use our own VPC creation process). My template is linked below. My tests with the API worked before, but now, for some reason, I am getting the "Failed to run Fargate tasks." error. Looking at my logs, it seems to be related to the subnets: each subnet value isn't a string. The subnet looks like this in the body: "subnetB":["subnet-0fa841e4e322dce63"]. Any idea why it's doing that?

Link to CloudFormation templates in repo: https://github.com/khiliel/dlt-template

khiliel commented 2 months ago

I'm trying to upload the screenshot but not sure if it will let me

khiliel commented 2 months ago

Here is the error message from the step function if you cannot see the screenshot:

{
    "id": "15",
    "type": "TaskFailed",
    "details": {
        "cause": "{\"errorType\":\"MultipleValidationErrors\",\"errorMessage\":\"There were 2 validation errors:\\n* InvalidParameterType: Expected params.networkConfiguration.awsvpcConfiguration.subnets[0] to be a string\\n* InvalidParameterType: Expected params.networkConfiguration.awsvpcConfiguration.subnets[1] to be a string\",\"trace\":[\"MultipleValidationErrors: There were 2 validation errors:\",\"* InvalidParameterType: Expected params.networkConfiguration.awsvpcConfiguration.subnets[0] to be a string\",\"* InvalidParameterType: Expected params.networkConfiguration.awsvpcConfiguration.subnets[1] to be a string\",\"    at ParamValidator.validate (/var/task/node_modules/aws-sdk/lib/param_validator.js:40:28)\",\"    at Request.VALIDATE_PARAMETERS (/var/task/node_modules/aws-sdk/lib/event_listeners.js:166:42)\",\"    at Request.callListeners (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:106:20)\",\"    at callNextListener (/var/task/node_modules/aws-sdk/lib/sequential_executor.js:96:12)\",\"    at /var/task/node_modules/aws-sdk/lib/event_listeners.js:120:11\",\"    at finish (/var/task/node_modules/aws-sdk/lib/config.js:396:7)\",\"    at /var/task/node_modules/aws-sdk/lib/config.js:414:9\",\"    at EnvironmentCredentials.get (/var/task/node_modules/aws-sdk/lib/credentials.js:127:7)\",\"    at getAsyncCredentials (/var/task/node_modules/aws-sdk/lib/config.js:408:24)\",\"    at Config.getCredentials (/var/task/node_modules/aws-sdk/lib/config.js:428:9)\"]}",
        "error": "MultipleValidationErrors",
        "resource": "invoke",
        "resourceType": "lambda"
    },
    "previous_event_id": "14",
    "event_timestamp": "1712854623192",
    "execution_arn": "arn:aws:states:us-east-1:<myaccountid>:execution:DLTStepFunctionTaskRunnerStepFunctionsC295A535-qSj3TjCdBu2W:cb290512-2e8f-4701-a9ce-4068ebaaa552",
    "redrive_count": "0"
}
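
The `cause` field in the event above is itself a JSON-encoded string, so the two validation messages are easier to read once it is decoded. A minimal Python sketch, assuming a trimmed copy of the event (stack trace omitted); `validation_errors` is a hypothetical helper for illustration, not part of DLT:

```python
import json


def validation_errors(event: dict) -> list[str]:
    """Return the individual validation messages from a TaskFailed event."""
    # "cause" is a JSON string nested inside the event, so decode it first
    cause = json.loads(event["details"]["cause"])
    lines = cause["errorMessage"].split("\n")
    # Each individual error is reported on its own line, prefixed with "* "
    return [line[2:] for line in lines if line.startswith("* ")]


# Trimmed copy of the Step Functions event; the "trace" array is left out.
event = {
    "type": "TaskFailed",
    "details": {
        "error": "MultipleValidationErrors",
        "cause": json.dumps({
            "errorType": "MultipleValidationErrors",
            "errorMessage": (
                "There were 2 validation errors:\n"
                "* InvalidParameterType: Expected params.networkConfiguration"
                ".awsvpcConfiguration.subnets[0] to be a string\n"
                "* InvalidParameterType: Expected params.networkConfiguration"
                ".awsvpcConfiguration.subnets[1] to be a string"
            ),
        }),
    },
}

for message in validation_errors(event):
    print(message)
```

Both messages point at `awsvpcConfiguration.subnets`, which fits the list-valued "subnetB" seen in the request body.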

kamyarz-aws commented 2 months ago

https://github.com/aws-solutions/distributed-load-testing-on-aws/issues/183

Isn't this about the same thing? If so, I am going to close this ticket and assist you in the other one.

khiliel commented 2 months ago

I asked a similar question in the other thread, but that ticket was a general question about the custom resources in DLT. This one is for the errors I am getting. Sorry about that; I will remove the question from the other ticket.

khiliel commented 2 months ago

Another thing I noticed while testing v3.2.7 is that I also get this error message: 52MsxcTsN8 live=false 00:10:21 ERROR: Config Error: Error when reading config file 'test.json': [Errno 2] No such file or directory: 'test.json'

This happens with minor to no changes to the CloudFormation template. When I deploy the template and run tests, I do not use the admin name and email, since I removed that piece to exclude Cognito. I'm not sure whether that plays a part. As for the subnet error, it seems to come from the resource DLTCustomResourcesTestingResourcesConfig0BCA657F, where I removed the Fn::If and replaced it with a Ref to ExistingSubnetA and ExistingSubnetB instead of the DLTVpcDLTSubnetAAE7DDEE8 subnet and CreateFargateVPCResources. I removed this because I was initially planning to only allow teams to use an existing VPC and to drop the VPC creation option, but as I removed things I started to see errors.

So far, v3.2.4 works for me with minor changes. v3.2.7 gives me both of those errors consistently once I remove things such as those mentioned above.

kamyarz-aws commented 2 months ago

Can you please provide both changed templates, for 3.2.4 and 3.2.7, so I can examine them, reproduce the issues on my end, and see what we can do to resolve them?

khiliel commented 2 months ago

Sure. Lastly, I've also tried the v3.2.7 image with the v3.2.4 code, and it seems to throw some errors too, but for some reason 3.2.4 works fine. Also, the CloudFormation templates are too big to attach to this thread. Is there an alternative way you would like me to send them to you?

kamyarz-aws commented 2 months ago

The repo doesn't have 3.2.4 in it.

khiliel commented 2 months ago

I just added it along with 3.2.7, and I invited you as a collaborator as well.

khiliel commented 2 months ago

While you are looking into this, I have another question. Is there a way to make the ContainerImage in the Mappings section dynamic, based on the account ID that deploys it and the region they choose, using pseudo parameters?

For example: ContainerImage: ${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/new-dlt-ecr-public/aws-solutions/distributed-load-testing-on-aws-load-tester:v3.2.4

I tried replacing the container image mapping reference in the task definition resource with the snippet from the example, but I think it may have caused errors. I will try again to confirm.

khiliel commented 2 months ago

Update on the last question: I figured it out. I did this to make the first half of the private container image reference dynamic:

DLTEcsDLTTaskDefinition6BFC2400:
    Type: AWS::ECS::TaskDefinition
    Properties:
      ContainerDefinitions:
        - Essential: true
          Image:
            Fn::Join:
              - ""
              - - Ref: AWS::AccountId
                - ".dkr.ecr."
                - Ref: AWS::Region
                - ".amazonaws.com/"
                - Fn::FindInMap:
                  - Solution
                  - Config
                  - ContainerImage

Then I just kept the rest of the image URI in the ContainerImage mapping.
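
As a rough check of what the Fn::Join above should resolve to, the same concatenation can be sketched in Python. This is only an illustration: `join_image_uri` is a hypothetical helper, the account ID is a placeholder, and the path and tag are taken from the earlier example rather than from the actual mapping.

```python
def join_image_uri(account_id: str, region: str, image_path: str) -> str:
    """Mimic Fn::Join with an empty delimiter over the five parts in the template."""
    return "".join([account_id, ".dkr.ecr.", region, ".amazonaws.com/", image_path])


# Placeholder account ID; image path and tag as in the example ContainerImage value
uri = join_image_uri(
    "123456789012",
    "us-east-1",
    "new-dlt-ecr-public/aws-solutions/distributed-load-testing-on-aws-load-tester:v3.2.4",
)
print(uri)
```

With this split, the mapping value would hold only the repository path and tag, with no registry host in front of it.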

kaiz-io commented 1 month ago

EDIT: Thanks @kamyarz-aws, it looks like this issue was fixed in a later version.

~Getting this error. It appeared to be missing some permissions.~

AccessDeniedException: User: arn:aws:sts::<removed>:assumed-role/<removed>-JMeter-Distributed--DLTLambdaFunctionDLTTestL-<removed>/<removed>-JMeter-Distributed--DLTLambdaFunctionTaskRun-<remove> is not authorized to perform: ecs:TagResource on resource: arn:aws:ecs:us-east-1:<removed>:task/LASOS-JMeter-Distributed-Load-Testing/* because no identity-based policy allows the ecs:TagResource action

kamyarz-aws commented 1 month ago

Getting this error. It appeared to be missing some permissions.

AccessDeniedException: User: arn:aws:sts::<removed>:assumed-role/<removed>-JMeter-Distributed--DLTLambdaFunctionDLTTestL-<removed>/<removed>-JMeter-Distributed--DLTLambdaFunctionTaskRun-<remove> is not authorized to perform: ecs:TagResource on resource: arn:aws:ecs:us-east-1:<removed>:task/LASOS-JMeter-Distributed-Load-Testing/* because no identity-based policy allows the ecs:TagResource action

That is unrelated to the issue above. This particular issue was resolved in DLT v3.2.6.

khiliel commented 3 weeks ago

I believe I have figured it out. The error I asked about earlier was caused by my changes to the Subnets of the DLTCustomResourcesTestingResourcesConfig0BCA657F custom resource. Initially, I had removed the If statement and set the subnets to the values of the existing ones. Doing so passed those values to the task-runner Lambda as an array/list. After some research, I found that the task-runner Lambda function expects each subnet parameter to be a comma-separated string, but it was receiving a list or array instead. So I changed the subnets in that custom resource to this:

subnetA: !Join [',', [Ref: ExistingSubnetA]]
subnetB: !Join [',', [Ref: ExistingSubnetB]]

Now, it works as expected on v3.2.4. I'll try the latest version again soon and update you
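
To illustrate the shape mismatch, here is a minimal Python sketch. It is not the task-runner's code: `to_subnet_id` and `awsvpc_subnets` are hypothetical helpers, and the subnetA ID is a made-up placeholder. It only shows how list-valued subnets end up as non-string entries in awsvpcConfiguration.subnets, and what joining them into strings changes.

```python
def to_subnet_id(value) -> str:
    """Return a single string for a subnet value that may arrive as a list."""
    if isinstance(value, str):
        return value
    if isinstance(value, (list, tuple)) and all(isinstance(v, str) for v in value):
        # Same effect as !Join [',', [...]] over a one-element list
        return ",".join(value)
    raise TypeError(f"unsupported subnet value: {value!r}")


def awsvpc_subnets(subnet_a, subnet_b) -> list[str]:
    """Build the flat list of strings that awsvpcConfiguration.subnets requires."""
    return [to_subnet_id(subnet_a), to_subnet_id(subnet_b)]


# subnetB is the value from the request body quoted earlier in the thread;
# the subnetA ID is a made-up placeholder.
broken = [["subnet-00000000000000000"], ["subnet-0fa841e4e322dce63"]]
fixed = awsvpc_subnets(*broken)
print(fixed)
```

In `broken`, both entries are lists, which is the shape the "Expected ... subnets[0] to be a string" validation rejects; in `fixed`, each entry is a plain string.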

kamyarz-aws commented 2 weeks ago

Great! Glad you could figure this out; it was very hard for me to pinpoint the issue. If you don't mind, so that I can understand the problem, we could jump on a call at some point and discuss what the problem was and how you fixed it. If that is okay, please let me know so I can close the issue.

khiliel commented 2 weeks ago

Sure, that sounds like a plan! How do I get in contact with you so we can set this up?

Also, this should be good to close.