(pipelines): Malformed YAML during stage deployment

straygar commented 3 years ago

Not 100% sure if this is a CDK, CodePipeline or CloudFormation bug.

It appears that the CloudFormation bug (https://github.com/aws/aws-cdk/issues/11910) that had blocked our deployments for a few weeks has been fixed! 🎉

However, trying to deploy the same stack to our first application stage now results in the following:

Template format error: YAML not well-formed. (line 5034, column 88) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: 6c5ea7db-4315-4f03-9b58-97af912c627a)

Which is odd - if I synthesize that stack locally, the YAML produced is only ~3000 lines long. Not really sure what I can do on my side to pinpoint the syntax error, if there is one.

Reproduction Steps

Add a few resources (a composite alarm, a few normal ones, some metrics) to a stack, that already has a lot of monitoring resources
Push the change, triggering the CDK pipeline

What did you expect to happen?

The stack to be deployed successfully in all application stages.

What actually happened?

The prepare step for the stack in the first stage fails with the error:

Template format error: YAML not well-formed. (line 5034, column 88) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: 6c5ea7db-4315-4f03-9b58-97af912c627a)

Environment

CDK CLI Version : 1.82.0
Framework Version: 1.82.0
Node.js Version: 14.13.0
OS : Ubuntu 20.04
Language (Version): TypeScript

Other

Region: eu-north-1
Pipeline execution ID: f1bed18e-5916-44ef-acf0-edb3215d8c74
CloudFormation request ID: 6c5ea7db-4315-4f03-9b58-97af912c627a

This is :bug: Bug Report

rix0rrr commented 3 years ago

If you go into the artifacts of the CodePipeline and you look at the YAML template there, what does it look like?

You will find them by going to the Synth CodeBuild project, finding the latest successful build and clicking build details:

Download that file, rename it to have a .zip extension, extract it, then look at the YAML file inside and see if that is malformed in any way.

NOTE: This may require you to extend permissions on the KMS Key that's used to encrypt the CodePipeline artifact bucket, because by default your IAM User/Role probably won't be allowed to read those CodePipeline artifacts otherwise. Add kms:Decrypt to the statement that has your account ID in it:

hoegertn commented 3 years ago

I assume this error means your template is bigger than 51200 bytes which is the max size in codepipeline.

You have to split your template
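
A quick way to compare a synthesized template against that limit is to measure it in UTF-8 bytes rather than characters, since non-ASCII characters take more than one byte each. The sketch below assumes the 51,200-byte figure quoted above; the function names and sample templates are made up for illustration.

```python
import json

# Inline template limit quoted in this thread (assumption: counted in bytes).
INLINE_TEMPLATE_LIMIT = 51_200


def template_size_bytes(template_text: str) -> int:
    """Return the size of a template once encoded as UTF-8."""
    return len(template_text.encode("utf-8"))


def exceeds_inline_limit(template_text: str, limit: int = INLINE_TEMPLATE_LIMIT) -> bool:
    """True when the template is larger than the inline limit."""
    return template_size_bytes(template_text) > limit


# Hypothetical templates: one tiny, one padded past the limit.
small = json.dumps({"Resources": {}})
large = json.dumps({"Description": "x" * 60_000})
print(exceeds_inline_limit(small), exceeds_inline_limit(large))
```

In practice the text would come from the *.template.json file in the synth output instead of the sample strings used here.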

straygar commented 3 years ago

Thx for the detailed instructions, @rix0rrr!

@hoegertn That's strange - although my template is 156KB, I also have another 110KB one that has been deploying successfully. Does CodePipeline directly call *Stack CFN APIs with the template in the body, without using an S3 location? (which has a 1MB limit)

The line CFN is complaining about is the AssertDescription one in this block, which doesn't make a lot of sense...

"Rules": {
  "CheckBootstrapVersion": {
    "Assertions": [
      {
        "Assert": {
          "Fn::Not": [
            {
              "Fn::Contains": [
                [
                  "1",
                  "2",
                  "3"
                ],
                {
                  "Ref": "BootstrapVersion"
                }
              ]
            }
          ]
        },
        "AssertDescription": "CDK bootstrap stack version 4 required. Please run 'cdk bootstrap' with a recent version of the CDK CLI."
      }
    ]
  }
}

I can try splitting it up though. Are there any best practices for how to do this in CDK? I'm considering moving the heavier bits into a NestedStack.

hoegertn commented 3 years ago

see https://docs.aws.amazon.com/codepipeline/latest/userguide/action-reference-CloudFormation.html under TemplatePath. CodePipeline uses the direct call instead of S3 apparently.

I hit this exact error some days ago. The error message looked familiar but is completely useless for this underlying error.

For splitting you do not need to use NestedStacks you can just create multiple "normal" stacks.

rix0rrr commented 3 years ago

This is pretty dang nasty.

Putting all of your infra in a single nested stack will solve it though. That way the "parent" stack is < 50kB and we defer for the actual template to our usual asset system.

rix0rrr commented 3 years ago

I also have another 110KB one that has been deploying successfully

Has this been deploying successfully via CDK Pipelines?

rix0rrr commented 3 years ago

The line CFN is complaining about is the AssertDescription one in this block, which doesn't make a lot of sense...
"Rules": {

Well... this is a somewhat non-standard CloudFormation feature (originally designed for Service Catalog). It's still weird that it would affect you here, since afaik we put this into all templates and it's normally fine.

But it could still totally be the rule we're synthesizing here?

straygar commented 3 years ago

OK, it appears that the size was the issue - at least extracting the bulkiest set of resources into a nested stack (just to have a slightly nicer hierarchy/resource dependencies in my case) did the trick.

Has this been deploying successfully via CDK Pipelines?

Yes. I have no idea how the 110KB stack deploys though, while a 156KB one doesn't...

rix0rrr commented 3 years ago

Hm. Maybe CloudFormation raised their service call limits from 50kB (old limit) to 100-something kB (new limit) ? And the docs just haven't caught up yet?

EDIT: Hmm not documented: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html

straygar commented 3 years ago

Yup, that was my thought. Feel free to resolve this , if you think it's technically not a CDK issue, even though the CFN limit isn't clear.

But a more descriptive error would be amazing in this case! :) And if possible, using a TemplateURL for CloudFormation calls to allow for bigger templates (although I can see how this could be more complicated, due to S3 cross account permissions).

hoegertn commented 3 years ago

Putting all of your infra in a single nested stack will solve it though. That way the "parent" stack is < 50kB and we defer for the actual template to our usual asset system.

That is actually a great idea and use case for nested stacks. Currently, I try to avoid them.

hoegertn commented 3 years ago

OK, it appears that the size was the issue - at least extracting the bulkiest set of resources into a nested stack (just to have a slightly nicer hierarchy/resource dependencies in my case) did the trick.

Has this been deploying successfully via CDK Pipelines?

Yes. I have no idea how the 110KB stack deploys though, while a 156KB one doesn't...

Are you sure this stack is deployed using the CloudFormation actions and not using cdk deploy?

hoegertn commented 3 years ago

Hm. Maybe CloudFormation raised their service call limits from 50kB (old limit) to 100-something kB (new limit) ? And the docs just haven't caught up yet?

EDIT: Hmm not documented: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html

I hit it some days ago with 70K so no raise here.

hoegertn commented 3 years ago

Yup, that was my thought. Feel free to resolve this , if you think it's technically not a CDK issue, even though the CFN limit isn't clear.

But a more descriptive error would be amazing in this case! :) And if possible, using a TemplateURL for CloudFormation calls to allow for bigger templates (although I can see how this could be more complicated, due to S3 cross account permissions).

I am in contact next week with the CodePipeline team to discuss this. This is ridiculous in my opinion.

rix0rrr commented 3 years ago

I've a feeling this might yet be something else than simply template size. Keeping this open so Thorsten has a place to share his findings next week.

malikalimoekhamedov commented 3 years ago

Experiencing the same validation error asking me to switch to version 4. Running cdk bootstrap doesn't make any changes.

straygar commented 3 years ago

Putting all of your infra in a single nested stack will solve it though. That way the "parent" stack is < 50kB and we defer for the actual template to our usual asset system.

That is actually a great idea and use case for nested stacks. Currently, I try to avoid them.

Yeah... the biggest downside I saw in this case is loss of useful snapshots for nested stacks (when changing the nested stack contents, the only snapshot change is the asset hash/location of the nested stack template).

Is there something in @aws-cdk/assert I can use to recursively cover nested stacks in my snapshots?

rix0rrr commented 3 years ago

Is there something in @aws-cdk/assert I can use to recursively cover nested stacks in my snapshots?

Not out of the box I don't think. You can build something though.

hoegertn commented 3 years ago

So the error could not be traced internally, but the limitation of the template size was indeed just a docs bug. So bigger templates are allowed.

Whenever somebody hits this issue again please ping me and we will dive into the generated output to track it down and give the CFN team some reproducible snippets.

Can this ticket be set to "need investigation" or sth like that?

bweigel commented 3 years ago

@hoegertn we have just hit this problem. We are deploying via CdkPipelines. The prepare step fails:

This is curious, since during prepare the pipeline just creates a CFN Changeset. I can download the artifact and create the changeset manually however:

As you can see the template is 56,1 kB. This is the location corresponding to the Error-Message:

Seems to me the issue might be lying in the underlying Codepipeline Action for Cloudformation. :thinking:

hoegertn commented 3 years ago

Can you send the account id and executions ids via dm? I will forward it to the respective teams for inspection.

bweigel commented 3 years ago

Hm. Maybe CloudFormation raised their service call limits from 50kB (old limit) to 100-something kB (new limit) ? And the docs just haven't caught up yet?

EDIT: Hmm not documented: https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cloudformation-limits.html

@rix0rrr Interestingly the documentation for the Cloudformation Action states:

hoegertn commented 3 years ago

Yeah, so they updated this. I am in contact with the team about this issue. It is not the size but sth weird.

straygar commented 3 years ago

Got a different flavour of a similar problem (failure in the prepare step):

Template format error: JSON not well-formed. (line 4212, column 4) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: b6dd7fb1-865a-4b65-aba6-a2d73393d70b; Proxy: null)

Region: eu-north-1
Pipeline execution ID: 66a7ef6b-6ae4-4f29-9331-4cb2ebb3a732
Template size: 137KB

This is getting super tough to maintain, if the template size is the issue. :/

rix0rrr commented 3 years ago

It's probably not. You should be able to look at the actual template if you download the artifacts from the CodePipeline S3 bucket, rename the file to .zip and extract it.

Does the file look broken?

straygar commented 3 years ago

@rix0rrr No, it seemed perfectly fine. :/ I was able to deploy the template manually from the CloudFormation console, deployment only failed in CodePipeline.

I worked around it by pulling some resources into a nested stack.

bweigel commented 3 years ago

@hoegertn did you hear back from the CFN team yet?

hoegertn commented 3 years ago

Unfortunately not. I sent them another occurrence too. In this case I found out it was a resource type that was not supported in that region.

So currently I suspect that this error means "something went wrong" but we have no idea what.

markokan commented 3 years ago

@hoegertn Hi, we have the same problem when using AWS Pipelines. If I deploy directly from my computer using cdk deploy, no problems occur and everything is fine.

Template format error: JSON not well-formed. (line 2454, column 1) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: c08894e6-4398-4441-b83f-c3b60ed5c4f1; Proxy: null)

Template info:

Size is 85kb
Lines 2454
Region: eu-west-1

I changed resources to minimize the template size. After the change, CodePipeline succeeds if I get the template size under 51kb. I solved this size issue (ValidationError) by using nested stacks and keeping all of those templates under 51kb.

Hopefully they will improve the error messages or raise those limits substantially.

moltar commented 2 years ago

Just got this too! So strange. The deployment was working yesterday. I did not upgrade any CDK packages. Made some infra changes, some app code changes. Deploying today and it is failing.

I have inspected the generated templates (grabbed them from S3) manually. They are all valid, within range.

Also, the error says:

Template format error: YAML not well-formed. (line 2284, column 4) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: 3fc510f4-0503-4d39-9971-d8f19ed9c0ae; Proxy: null)

But I don't even see any YAML files in my Synth_Outp/... file.

Contents of the ZIP file:

├── FrontendPipeline.assets.json
├── FrontendPipeline.template.json
├── assembly-FrontendPipeline-Frontend
│   ├── FrontendPipelineFrontendFrontendStack951D7130.assets.json
│   ├── FrontendPipelineFrontendFrontendStack951D7130.template.json
│   ├── cdk.out
│   └── manifest.json
├── asset.2cf0727c882dd14fb2df1545fb5ad36ba3cc86a64f8823b942905daca19cd0fd
│   └── index.js
├── asset.43d754c716b45e9599678d4dec96da6114f6044f144a2b94b75756585385a143
│   └── index.js
├── asset.4d82b0d25f135666fb5068e05d9f56a103e15648d95cb0f6d230d3fc2d0f9fdc
│   └── index.js
├── asset.76af4097099d6eef62e6dff3f16adf47daa22b09a5a9612d3eaa2b63d05fd994
│   └── index.js
├── asset.7f1160fe9dc8896ccdd85d44ba8e8e0614ecf88c78a9b8d1227266fb39316a62
│   └── index.js
├── asset.8232f53b1494e586db8f965674400246af9ebad94a92aacc2ab86d7165bcc29c
│   ├── __entrypoint__.js
│   ├── index.d.ts
│   └── index.js
├── asset.a3058ccb468d757ebb89df5363a1c20f5307c6911136f29d00e1a68c9b2aa7e8
│   └── index.py
├── asset.a47528ba5e1b5efdb389e51a980c36cfeb0cd96121009bf4ef69882a2200d5f2
│   ├── favicon.ico
│   ├── index.html

... snip lots of static files for the frontend

├── asset.abc0232a94d010f43c33184e6b575199069bb798f45e02e5d8e5ad68e6e7795a
│   └── index.js
├── asset.bebeb0424230094d4185e9f330b9945c12d82724ad54d8a8f0d81dbd230d445f
│   └── index.js
├── asset.ca4c2592d76ef33826699ebd76ebfd62873f6a5dbbef981eba4d8002ffe87197
│   └── index.js
├── asset.dfce510114402a49247ea4c54b9b4050271f83415d111f0dfbbac95505feea17
│   └── index.js
├── asset.e9882ab123687399f934da0d45effe675ecc8ce13b40cb946f3e1d6141fe8d68.zip
├── asset.ff8d00f9cedcafaf1ab299446ed2500b61b0b88ab40f066d05b26ff66bd58b15
│   ├── __entrypoint__.js
│   ├── diff.d.ts
│   ├── diff.js
│   ├── external.d.ts
│   ├── external.js
│   ├── index.d.ts
│   └── index.js
├── cdk.out
├── cross-region-stack-111111111111:ap-northeast-1.template.json
├── manifest.json
└── tree.json

43 directories, 309 files

aleskozina commented 2 years ago

Hi All

Got the same issue, posting here for reference:

Error message
Template format error: YAML not well-formed. (line 3182, column 115) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: 6d2feb3c-a51f-46a1-8e50-b7fd0fd1b4fa; Proxy: null)

Region: eu-west-1
Pipeline execution ID: 07c9278f-5bd0-46e0-b331-2ee462c41c72
Template size: ~95KB

ajolma commented 2 years ago

Working with the same repository as @markokan above and adding more resources. The best solution seems to be to put everything into a single nested stack. I don't understand why this repository is a problem, as we have others which have a bigger main assembly template and they go through the last pipeline step fine. BTW, I confirmed that it is the size issue: when I minimized (removed whitespace from) the main template JSON in the S3 zip and reran the last step, it went through fine.
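
The whitespace removal described above can be sketched by parsing the template and serializing it again with compact separators. This is only an illustration of the idea, not the exact procedure used in this comment; the function name and sample template are hypothetical.

```python
import json


def minify_template(template_text: str) -> str:
    """Re-serialize a JSON template without indentation or extra spaces."""
    parsed = json.loads(template_text)
    # ensure_ascii=False keeps non-ASCII characters as they are
    # instead of rewriting them as \uXXXX escapes.
    return json.dumps(parsed, separators=(",", ":"), ensure_ascii=False)


# Hypothetical pretty-printed template, similar in shape to CDK output.
pretty = json.dumps(
    {"Resources": {"Topic": {"Type": "AWS::SNS::Topic"}}},
    indent=1,
)
compact = minify_template(pretty)
print(len(pretty.encode("utf-8")), "->", len(compact.encode("utf-8")))
```

Key order is preserved, so the minified template describes the same resources; only the byte count changes.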

stekern commented 2 years ago

We are currently experiencing the same issue with one of our stacks. We made a change in our code that increased the size of the respective template from 77472 to 78841 bytes. We're able to deploy the stack locally ($ npx cdk deploy ...), but it fails on the Prepare step when trying to deploy it through CDK Pipelines.

I find it a bit surprising that the error message mentions YAML, while the TemplatePath is a JSON template. Is there some kind of preprocessing happening behind-the-scenes by CloudFormation? If so, that makes it even harder to debug wrt. the reported offending line and column number.

Full error message:

Template format error: YAML not well-formed. (line 2449, column 2) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: <redacted>; Proxy: null)

EDIT: The change was isolated to using one or more of the characters æøå in the EmailMessage property of the AWS::Cognito::UserPoolInviteMessageTemplate CloudFormation resource, even though this is valid according to the pattern. Using HTML-encoded characters instead of these characters mitigates the error, so it seems like the error from CodePipeline is masking something happening downstream similar to what @hoegertn is mentioning rather than occurring due to template size issue.
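
The HTML-encoding mitigation mentioned above can be sketched with Python's built-in "xmlcharrefreplace" error handler, which rewrites each non-ASCII character as a numeric entity. This is a minimal sketch that only makes sense for fields rendered as HTML, such as an email body; the function name and sample message are assumptions.

```python
def html_encode_non_ascii(text: str) -> str:
    """Replace every non-ASCII character with a numeric HTML entity."""
    # The "xmlcharrefreplace" handler turns e.g. "æ" into "&#230;".
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Hypothetical invite message; not taken from the stack in this thread.
message = "Velkommen {username}! Passordet ditt er {####}. Hilsen Bjørn"
print(html_encode_non_ascii(message))
```

ASCII characters, including the placeholders, pass through unchanged.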

rtaylor-logitech commented 2 years ago

We're also experiencing the same issue as stekern. I hit this error if I use the λ character in cloudwatch dashboard labels. If I remove all uses of the λ character, the pipeline deploys my stacks correctly without error.

nodesmichael commented 2 years ago

We are also experiencing this issue:

We have ~30 inter-dependent stacks that we've been deploying successfully through a single cdk deploy call for the past couple of years and CloudFormation wasn't complaining about any of them.

We are now trying to do the same via a Pipeline, and all stacks over 51KB seem to be failing with the error mentioned above.

Actually the first stack that failed happened to contain email templates, like those of @stekern and @rtaylor-logitech above, and the error mentioned YAML. We replaced the templates with dummy templates to ensure no weird characters were included. Then we got the JSON flavor of the error, as the size of the template was still over 51KB. Reducing it to just under 51KB allowed the stack's Prepare step to pass.

What is the suggested workaround again?

We would need all the resources that are in our current stacks to keep the same names, i.e. we cannot afford for resources to be deleted/recreated, as some of them contain data, e.g. a Cognito UserPool.

Is it maybe that the CDK pipeline is trying to run something equivalent to aws cloudformation validate-template with --template-body instead of --template-url? Because according to the documentation the former has a limit of 51,200 bytes while the latter has a limit of 460,800 bytes.

Surely if cdk deploy and CloudFormation can handle the size of a particular template CDK Pipelines should too, no?

v1pz3n commented 2 years ago

We are also experiencing this issue [2]

yukinakanaka commented 2 years ago

We are also experiencing this issue in CodePipeline.

Error message on CodePipeline:

Template format error: JSON not well-formed. (line 2261, column 4) (Service: AmazonCloudFormation; Status Code: 400; Error Code: ValidationError; Request ID: a0f14921-ad97-474e-9b3d-ffc4f3992189; Proxy: null)

Size of json file: 80kb
Lines: 2262
Region: ap-northeast-1

If I do directly from my computer using cdk deploy, no problems occurred.

ghdoergeloh commented 2 years ago

I got the same error and I was also experiencing that my smallest template was working:

119.8 kB (not working)
116.2 kB (not working)
98.9 kB (working)

I first tried to understand the error by just doing what the pipeline tried. So I downloaded the artifacts from the Build action and took a look at the CloudFormation action. I copied all of the configuration into an AWS CLI command and tried to execute it manually:

aws --region eu-west-1 cloudformation create-change-set \
--stack-name MyStack \
--change-set-name PipelineChange \
--template-body file://./cdk.out/assembly-PipelineStack-DevDeploy/PipelineStackDevDeployStackXXXXXXXX.template.json \
--role-arn arn:aws:iam::000000000000:role/cdk-hnb659fds-cfn-exec-role-000000000000-eu-west-1 \
--capabilities CAPABILITY_NAMED_IAM

This gave me the same error as the validate-template or the update-stack command:

An error occurred (ValidationError) when calling the CreateChangeSet|UpdateStack|ValidateTemplate operation: 1 validation error detected: Value '{<template>}' at 'templateBody' failed to satisfy constraint: Member must have length less than or equal to 51200

So I thought, okay, that should not be the problem because the artifact is stored inside an S3 bucket. So I uploaded the file to the assets S3 bucket and executed it again:

aws --region eu-west-1 cloudformation create-change-set \
--stack-name MyStack \
--change-set-name PipelineChange \
--template-url https://cdk-hnb659fds-assets-000000000000-eu-west-1.s3.eu-west-1.amazonaws.com/test/PipelineStackDevDeployStackXXXXXXXX.template.json \
--role-arn arn:aws:iam::000000000000:role/cdk-hnb659fds-cfn-exec-role-000000000000-eu-west-1 \
--capabilities CAPABILITY_NAMED_IAM

This time it worked.

So the size seems to be the only realistic explanation.

So as far as I can see there are currently these possibilities to work around:

Using Nested Stacks (because they are in the asset s3 Bucket)
Use a BuildProject and call cdk deploy
Publish the template like all the other assets and execute the cloudformation command via cli

It would be great if AWS could improve the CloudFormation action in CodePipeline so it would upload the template first and then execute the change set creation or at least make this possible for cdk by accepting a s3 url as parameter.
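
The choice between the two manual invocations above can be sketched as a small helper that picks --template-body or --template-url from the template size. It assumes the 51,200 limit from the error message applies to the inline body; the function name, path, and URL are hypothetical.

```python
from typing import List, Optional

# Inline limit taken from the error message quoted above.
INLINE_TEMPLATE_LIMIT = 51_200


def template_argument(
    local_path: str, size_bytes: int, s3_url: Optional[str] = None
) -> List[str]:
    """Pick the create-change-set template flag for a template of this size."""
    if size_bytes <= INLINE_TEMPLATE_LIMIT:
        # Small enough to be sent inline from a local file.
        return ["--template-body", f"file://{local_path}"]
    if s3_url is None:
        raise ValueError(
            f"template is {size_bytes} bytes; upload it to S3 and pass its URL"
        )
    return ["--template-url", s3_url]


# Hypothetical path, sizes, and URL, for illustration only.
path = "cdk.out/MyStack.template.json"
url = "https://example-bucket.s3.eu-west-1.amazonaws.com/MyStack.template.json"
print(template_argument(path, 40_000))
print(template_argument(path, 98_900, url))
```

The returned list can be appended to the rest of the create-change-set arguments; the upload to S3 itself is left out of the sketch.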

AKoetsier commented 2 years ago

Experienced the same issue. I did some tests and it seems a combination of both size and special characters in the template.

A template of around 285KB deploys ok without any special characters. After adding unicode characters to an SES template it started failing with JSON (or sometimes YAML) not well-formed. Downloading the template from the synth output in S3 and deploying it directly with Cloudformation works.

To test a bit more I removed a lot of the resources from the project and just deployed the templates with unicode characters using the pipeline which worked correctly.

So from what I can make of it the issue seems to be triggered by a combination of a large template like @ghdoergeloh also mentioned, in combination with unicode characters (in my case in an SES template).

kashi238 commented 2 years ago

I am experiencing the same issue in our environment. It does not occur when the size of the template is up to 51KB, but it does occur when the size exceeds 57KB. Even if the error occurs, if I get the generated template directly from S3 and deploy it as a new stack, it works fine. It also seems to work if all Japanese characters are removed.

python3.8 aws-cdk 1.158.0
CDK-CLI 2.26.0
Node.js 16.15.1

miekassu commented 2 years ago

We are experiencing this issue after template size exceeded 57KB

nschwellnus commented 2 years ago

We were also experiencing the issue. In our case it was a combination of size and non-ascii characters. After increasing the size of the Stack the Prepare Step failed with "Template format error: JSON not well-formed". After removing non-ascii characters (ü in our case) the stack could be deployed via the pipeline normally.

24KB with non-ascii characters -> Working
98KB with non-ascii characters -> Not Working (JSON not well-formed Error)
98KB without non-ascii characters -> Working

MrtMonet commented 2 years ago

I'm currently experiencing the same issue with one of my stacks. I'm getting the "Template format error: JSON not well-formed. (line 1883, column 3)" error, but when I validate from the CLI the template is fine. The file is quite big though, at 85KB.

Thank you @nschwellnus, your tip did it! There was a non-ASCII character in my GraphQL schema!

SchollSimon commented 2 years ago

Experienced the same issue. I did some tests and it seems a combination of both size and special characters in the template.

A template of around 285KB deploys ok without any special characters. After adding unicode characters to an SES template it started failing with JSON (or sometimes YAML) not well-formed. Downloading the template from the synth output in S3 and deploying it directly with Cloudformation works.

To test a bit more I removed a lot of the resources from the project and just deployed the templates with unicode characters using the pipeline which worked correctly.

So from what I can make of it the issue seems to be triggered by a combination of a large template like @ghdoergeloh also mentioned, in combination with unicode characters (in my case in an SES template).

Picking up on that, I have a stack of around 100kb, containing only around 8 email templates. Deploying locally works fine; in the pipeline it always prints Template format error: YAML not well-formed.

Therefore I checked whether my email HTML templates contain non-ASCII characters, and the answer is: yes.

So hope for a fix soon.

HeskethGD commented 1 year ago

Is anyone actively looking at this issue? It has blocked our deployment pipelines and has been really difficult to debug. The CodePipeline build steps work, but it fails when preparing in the deploy step, despite the fact that it can be deployed from the command line OK. We are using CDK Pipelines.

After finding this thread I searched the CloudFormation template for non-ASCII characters and found that \ufeff is being inserted before the word schema in the AppSync definition in the template. It is not yet clear whether we put it there or AWS processing did. I have tried to delete any character before the word schema in the file and redo the synth, but it always appears. For what it's worth, since some mentioned size above: the template is too big to be validated from a local file and has to be uploaded to S3 to run validation tests. Any idea how we may remove this character?

If this won't be fixed, we would need to add tests somehow to check for such characters when using CDK Pipelines/CodePipeline, but this is made difficult because the code compiles and can be deployed by cdk deploy outside of CDK Pipelines/CodePipeline. This seems like a significant problem. Is there a workaround for CI/CD in AWS with CDK that wouldn't experience this?

Update: after breaking up the stack into multiple stacks, removing any infra not explicitly related to the AppSync API, we were able to get it to deploy. But since then we have further developed our API, adding more types, queries and mutations, and seemingly hit the size limit and/or character bug again. Not sure how we can further break up the API stack now, so we are stuck. Any help would be appreciated.

Update: We are currently working round this by putting the schema and api resource in one stack and all the resolvers in another stack and then importing the api resource by api_id so that the templates are kept small. Seems odd but it's working at least for now.

rix0rrr commented 1 year ago

I created an internal ticket to the CodePipeline team for this, but that's unfortunately the best we can do.

Internal reference: D68266996

Pharrox commented 1 year ago

May as well add my notes here.

We've been dealing with this for a few months, but I finally got a chance to dig into this today. The error always happens when trying to deploy email templates containing non-standard characters. As noted by everyone else, it works whenever CodePipeline isn't involved.

Was able to gather enough information to make some assumptions about the nature of the issue by poking at CloudTrail:

The error specifically happens during cloudformation:CreateChangeSet operations called by CodePipeline in the Prepare step.
When the template file is small enough, CodePipeline embeds the template in the CreateChangeSet call itself. Others have noted that for small enough files it works fine; this is probably why. None of our templates containing special characters are small enough to be deployed this way, but I can see the difference in the requests being made for templates without special characters that are small enough, so it makes sense with what others have observed.
When the template is too large to be included in the CreateChangeSet call, it is uploaded to S3 and CreateChangeSet is passed a pre-signed templateURL.
The S3 bucket where the template is uploaded is not owned by the account doing the deployment. It is not the CodePipeline artifacts bucket, nor is it any bucket related to CDK. For us-east-1, the bucket it uses is called cloudformation-codepipeline-largetemplate-bucket-useast1. This appears to be an internal AWS bucket used by CodePipeline. I couldn't find any mention or documentation of this bucket online.
The pre-signed URL has the format: https://${INTERNAL_BUCKET}.s3.amazonaws.com/${DATE}/${HOUR}/${ACCOUNT_ID}/${UUID}
My best guess is that something related to the upload to this bucket is the source of the problem. Possibly something in the bucket or built into the action pre-upload is set up to sanitize uploaded templates (given it seems to be a single bucket shared by all accounts deploying CloudFormation through CodePipeline). The sanitization is too aggressive and can mangle valid templates. This results in the JSON/YAML not well formed errors for templates that we know should be valid as they can be deployed by any other means.
We can see the pre-signed URLs for templates used to successfully create ChangeSets by pulling them out of CloudTrail (provided they aren't expired). Unfortunately, if a cloudformation:CreateChangeSet operation fails with the errors reported here, then the requestParameters for the call are not captured by CloudTrail and there is no way to retrieve the template to view what, if anything, may have been changed to produce the error.

goranopacic commented 1 year ago

there is clearly a problem with non-ASCII characters. Here is what I did on Mac OS X:

Install pcre:
$ brew install pcre ...
Search for non-ASCII characters:
$ pcregrep --color='auto' -n "[\x80-\xFF]" lib/*
lib/schema.graphql:1:��schema {

I've found this at the beginning of schema.graphql file which was imported as asset in cdk. This same file was in use for some time and caused no issues till we added some new resources and size of the stack got bigger than 50kb. Now, after removing these characters stack size is bigger than 50kb and it is not a problem again. So, obviously, there is some problem with non-ASCII characters causing cf prepare step to fail validation. Equivalent linux command to check for non-ascii chars is: grep --color='auto' -P -n "[\x80-\xFF]" lib/*
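
For a check that can run as a test before pushing, the same search can be sketched in Python. Unlike the grep commands, this version works on decoded text and reports code points, so invisible characters such as a byte order mark become readable. The function name and sample schema text are hypothetical.

```python
from typing import List, Tuple


def find_non_ascii(text: str) -> List[Tuple[int, int, str]]:
    """Return (line, column, code point) for each non-ASCII character, 1-based."""
    hits = []
    # Split on "\n" only, so unusual Unicode line separators are still reported.
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                hits.append((line_no, col_no, f"U+{ord(ch):04X}"))
    return hits


# Hypothetical schema text starting with a byte order mark.
sample = "\ufeffschema {\n  query: Query\n}\n# größe\n"
for hit in find_non_ascii(sample):
    print(hit)
```

An empty result means the text is plain ASCII; anything else gives a line and column to look at in the source file.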

HeskethGD commented 1 year ago

there is clearly a problem with non-ASCII characters. Here is what I did on Mac OS X:

Install pcre $ brew install pcre ...

search for non ascii characters $ pcregrep --color='auto' -n "[\x80-\xFF]" lib/* lib/schema.graphql:1:��schema {

I've found this at the beginning of schema.graphql file which was imported as asset in cdk. This same file was in use for some time and caused no issues till we added some new resources and size of the stack got bigger than 50kb. Now, after removing these characters stack size is bigger than 50kb and it is not a problem again. So, obviously, there is some problem with non-ASCII characters causing cf prepare step to fail validation. Equivalent linux command to check for non-ascii chars is: grep --color='auto' -P -n "[\x80-\xFF]" lib/*

This is the same as my issue above. How did you remove the character from the graphQL schema in the cloudformation template? In my case this was being put there when they were generated by CDK.

goranopacic commented 1 year ago

I recreated the schema file (simple copy/paste), and after build and synth the CFN template was cleared of non-ASCII characters. Just to mention that cdk deploy worked fine all the time; it was only CDK Pipelines that couldn't pass the CFN prepare phase.
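
If the stray character is a UTF-8 byte order mark (the \ufeff reported earlier in the thread), an alternative to recreating the file by hand is to strip it from the raw bytes. This is a sketch under that assumption; it does not handle other encodings, and the function name and sample contents are made up.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte order mark, if present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical file contents; a real run would read the schema file in binary mode.
with_bom = codecs.BOM_UTF8 + b"schema {\n  query: Query\n}\n"
cleaned = strip_utf8_bom(with_bom)
print(cleaned.decode("utf-8").startswith("schema"))
```

Writing the cleaned bytes back to the schema file before synth keeps the mark from reaching the generated template.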
