awslabs / aws-service-catalog-puppet

This is a framework where you list your AWS accounts with tags and your AWS Service Catalog products with tags or target accounts. The framework works through your lists, dedupes and spots collisions and then provisions the products into your AWS accounts for you. It handles the Portfolio sharing, its acceptance and can provision products cross account and cross region.
Apache License 2.0
75 stars 41 forks

Service Catalog Puppet 0.195.1: single account run fails with memory issues in the servicecatalog-puppet deploy-in-spoke code pipeline #572

Closed pallanagaraju closed 2 years ago

pallanagaraju commented 2 years ago

Please include a link to your expanded manifest, the full contents of your AWS CodeBuild output (see https://aws-service-catalog-puppet.readthedocs.io/en/latest/puppet/using_the_cli.html#export-puppet-pipeline-logs)

Please ensure you are using the latest version and have run a validate command on your manifest file see (https://aws-service-catalog-puppet.readthedocs.io/en/latest/puppet/using_the_cli.html#validate)

Steps to reproduce

  1. After updating Service Catalog Puppet to version 0.195.1, the single account run fails at the POST_BUILD stage. Here is the high-level error: POST_BUILD | Failed | COMMAND_EXECUTION_ERROR: Error while executing command: servicecatalog-puppet wait-for-parameterised-run-to-complete. Reason: exit status 1

  2. When I look at the "servicecatalog-puppet-deploy-in-spoke" CodeBuild project, it has failed with memory issues. Here are the errors:

    [Container] 2022/10/20 09:42:13 Running command servicecatalog-puppet --info deploy-in-spoke-from-task-reference --execution-mode spoke --puppet-account-id $PUPPET_ACCOUNT_ID --single-account $(aws sts get-caller-identity --query Account --output text) --home-region $HOME_REGION --regions $REGIONS --should-collect-cloudformation-events $SHOULD_COLLECT_CLOUDFORMATION_EVENTS --should-forward-events-to-eventbridge $SHOULD_FORWARD_EVENTS_TO_EVENTBRIDGE --should-forward-failures-to-opscenter $SHOULD_FORWARD_FAILURES_TO_OPSCENTER ${PWD}

    Found existing SCT_CACHE_INVALIDATOR: 2022-10-20 09:32:48.890984
    INFO MainThread getting should_delete_rollback_complete_stacks, default_region: None
    INFO MainThread getting config, default_region: None
    INFO MainThread getting puppet_role_arn
    INFO MainThread getting partition
    INFO MainThread getting puppet_role_path
    INFO MainThread getting puppet_role_name
    INFO MainThread getting should_use_product_plans, default_region: eu-west-1
    INFO MainThread getting config, default_region: eu-west-1
    running in partition: aws as /servicecatalog-puppet/PuppetRole
    Traceback (most recent call last):
      File "/root/.pyenv/versions/3.7.13/bin/servicecatalog-puppet", line 8, in <module>
        sys.exit(cli())
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/click/core.py", line 764, in __call__
        return self.main(*args, **kwargs)
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/click/core.py", line 717, in main
        rv = self.invoke(ctx)
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/click/core.py", line 1137, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/click/core.py", line 956, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/click/core.py", line 555, in invoke
        return callback(*args, **kwargs)
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/servicecatalog_puppet/cli.py", line 653, in deploy_in_spoke_from_task_reference
        task_reference_commands.deploy_from_task_reference(p)
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/servicecatalog_puppet/commands/task_reference.py", line 1679, in deploy_from_task_reference
        reference = serialisation_utils.load_as_json(open(f, "r").read())
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/servicecatalog_puppet/serialisation_utils.py", line 67, in load_as_json
        return serialisation_utils.json_loads(input)
      File "/root/.pyenv/versions/3.7.13/lib/python3.7/site-packages/servicecatalog_puppet/serialisation_utils.py", line 75, in json_loads
        return orjson.loads(s)
    orjson.JSONDecodeError: memory allocation failed: line 1 column 1 (char 0)

    [Container] 2022/10/20 09:42:18 Command did not exit successfully servicecatalog-puppet --info deploy-in-spoke-from-task-reference --execution-mode spoke --puppet-account-id $PUPPET_ACCOUNT_ID --single-account $(aws sts get-caller-identity --query Account --output text) --home-region $HOME_REGION --regions $REGIONS --should-collect-cloudformation-events $SHOULD_COLLECT_CLOUDFORMATION_EVENTS --should-forward-events-to-eventbridge $SHOULD_FORWARD_EVENTS_TO_EVENTBRIDGE --should-forward-failures-to-opscenter $SHOULD_FORWARD_FAILURES_TO_OPSCENTER ${PWD} exit status 1
    [Container] 2022/10/20 09:42:18 Phase complete: BUILD State: FAILED
    [Container] 2022/10/20 09:42:18 Phase context status code: COMMAND_EXECUTION_ERROR Message: Error while executing command: servicecatalog-puppet --info deploy-in-spoke-from-task-reference --execution-mode spoke --puppet-account-id $PUPPET_ACCOUNT_ID --single-account $(aws sts get-caller-identity --query Account --output text) --home-region $HOME_REGION --regions $REGIONS --should-collect-cloudformation-events $SHOULD_COLLECT_CLOUDFORMATION_EVENTS --should-forward-events-to-eventbridge $SHOULD_FORWARD_EVENTS_TO_EVENTBRIDGE --should-forward-failures-to-opscenter $SHOULD_FORWARD_FAILURES_TO_OPSCENTER ${PWD}. Reason: exit status 1
    [Container] 2022/10/20 09:42:18 Entering phase POST_BUILD
    [Container] 2022/10/20 09:42:18 Phase complete: POST_BUILD State: SUCCEEDED
    [Container] 2022/10/20 09:42:18 Phase context status code: Message:
    [Container] 2022/10/20 09:42:18 exiting execCommands
    [Container] 2022/10/20 09:42:18 Phase complete: UPLOAD_ARTIFACTS State: SUCCEEDED
    [Container] 2022/10/20 09:42:18 Phase context status code: Message:

  1. Service Catalog Puppet single account run CodeBuild logs: servicecatalog-puppet-single-account-run-codebuild-logs.txt

  2. servicecatalog-puppet-deploy logs: servicecatalog-puppet-deploy.txt

  3. servicecatalog-puppet-deploy-in-spoke logs: servicecatalog-puppet-deploy-in-spoke-log.txt

  4. manifest.yaml file: manifest.zip

Expected results

Actual results

eamonnfaherty commented 2 years ago

Can you please confirm which AWS CodeBuild Environment Compute Type you are using and how many workers you have enabled.

It would be worth testing with 25 workers and the BUILD_GENERAL1_SMALL compute type to establish a baseline.

pallanagaraju commented 2 years ago

We are already using 30 workers and the BUILD_GENERAL1_SMALL compute type. Here are images of the puppet initialization parameters. (Images: puppet parameters, additional puppet parameters)

I have attached the puppet initialization stack parameters for your reference.

eamonnfaherty commented 2 years ago

Could you share a copy of the utilisation graph please?

This version of puppet has a new workflow engine and may require a change to the number of workers. Could you rerun with 20 please?

pallanagaraju commented 2 years ago

Hi, please find the resource utilization graphs. (Image: resource utilization)

pallanagaraju commented 2 years ago

Hi,

I have tried setting the number of workers to 20, but I am still getting the memory error in the "servicecatalog-puppet-deploy-in-spoke" CodeBuild job.

eamonnfaherty commented 2 years ago

The hub creates a JSON-encoded file containing a list of tasks that should be run in the spoke. This file was 725 MB in this example. The AWS CodeBuild environment in the spoke account could not load a 725 MB file due to memory restrictions.
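To illustrate the failure mode, here is a minimal sketch of a loader that checks the size of a task file before parsing it, so an oversized file fails with a clear message rather than an allocation error deep inside the parser. This is not the servicecatalog-puppet implementation: `load_task_reference` and `max_bytes` are hypothetical names, and the standard library `json` module stands in for `orjson`.

```python
import json
import os


def load_task_reference(path, max_bytes):
    """Load a JSON task file, refusing files larger than max_bytes.

    Hypothetical helper for illustration; the size limit is an assumption
    the caller must choose for their own build environment.
    """
    size = os.path.getsize(path)
    if size > max_bytes:
        raise MemoryError(
            f"{path} is {size} bytes, which exceeds the limit of {max_bytes} bytes"
        )
    # Parse only once the file is known to be within the chosen limit.
    with open(path, "r") as f:
        return json.load(f)
```

A caller would pick `max_bytes` from what the build container can comfortably hold, keeping in mind that the parsed objects usually take more memory than the file does on disk.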

In the release https://github.com/awslabs/aws-service-catalog-puppet/releases/tag/0.197.0 I have removed the unused reverse dependencies from the task manifest file.

In this example the 725 MB file was reduced to 350 MB.
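The kind of saving described here can be sketched as follows. The structure of the task file is an assumption for illustration: the `all_tasks` mapping and the `reverse_dependencies_by_reference` key are names I have assumed, not confirmed by this thread, and `strip_reverse_dependencies` is a hypothetical helper rather than the code shipped in 0.197.0.

```python
import json


def strip_reverse_dependencies(reference, key="reverse_dependencies_by_reference"):
    """Remove an unused per-task field from a task reference dict, in place.

    Both the "all_tasks" layout and the default key name are assumptions.
    """
    for task in reference.get("all_tasks", {}).values():
        task.pop(key, None)
    return reference


# Tiny synthetic reference with two tasks that point at each other.
example = {
    "all_tasks": {
        "a": {
            "dependencies_by_reference": [],
            "reverse_dependencies_by_reference": ["b"],
        },
        "b": {
            "dependencies_by_reference": ["a"],
            "reverse_dependencies_by_reference": [],
        },
    }
}

size_before = len(json.dumps(example))
size_after = len(json.dumps(strip_reverse_dependencies(example)))
```

Because every dependency edge is otherwise stored twice, once in each direction, dropping one direction can remove a large share of the serialized size when tasks have many dependencies.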

I have tested loading a 350 MB JSON file within the smallest CodeBuild environment type and it works, so this should resolve the issue.
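A similar check can be approximated locally. The sketch below measures the peak memory Python allocates while parsing a JSON string, using the standard library `tracemalloc` module. It is an assumption-laden stand-in for the test described above: `measure_load` is a hypothetical helper, it uses `json` rather than `orjson`, and `tracemalloc` only sees allocations made through Python's allocators, so native allocations inside a C or Rust parser would not be counted.

```python
import json
import tracemalloc


def measure_load(text):
    """Parse a JSON string and return (parsed_object, peak_bytes_allocated)."""
    tracemalloc.start()
    try:
        data = json.loads(text)
        # get_traced_memory returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return data, peak


# Small synthetic payload; a real check would read the actual task file.
payload = json.dumps({"all_tasks": {str(i): {"n": i} for i in range(1000)}})
parsed, peak_bytes = measure_load(payload)
```

Comparing `peak_bytes` with `len(payload)` gives a rough ratio of in-memory size to file size, which can help estimate whether a given file will fit in a given compute type.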