aws / copilot-cli

The AWS Copilot CLI is a tool for developers to build, release and operate production ready containerized applications on AWS App Runner or Amazon ECS on AWS Fargate.
https://aws.github.io/copilot-cli/
Apache License 2.0

Out of Memory Error deploying static site #5550

Open czalkin opened 8 months ago

czalkin commented 8 months ago

When deploying a static site, I get an out-of-memory exception, followed by a number of other errors (attached).

I have narrowed it down to including the SVG files from Ionicons (part of Ionic in our Angular application). When I remove a number of these files, the error goes away. I haven't been able to pin it down to a specific set of files. All of the files are under 5 KB. Total size for the entire site upload is under 4 MB.

CloudWatch shows the following in /aws/lambda/$app-$env-TriggerStateMachineFunction:

ERROR   caught error: Error: State machine failed: The state/task 'CopyFiles' returned a result with a size exceeding the maximum number of bytes service limit.

And this prints to the command line from copilot:

runtime: out of memory: cannot allocate 8388608-byte block (1775140864 in use)
fatal error: out of memory
runtime: out of memory: cannot allocate 8388608-byte block (1775140864 in use)
fatal error: out of memory

goroutine 486 [running]:
runtime.throw({0x246aa0c, 0xd})
    /root/.goenv/versions/1.21.1/src/runtime/panic.go:1077 +0x4d fp=0x69a17b4c sp=0x69a17b38 pc=0xd2ce7d
runtime.(*mcache).allocLarge(0x42e1e98, 0x500000, 0x1)
    /root/.goenv/versions/1.21.1/src/runtime/mcache.go:236 +0x1c5 fp=0x69a17b78 sp=0x69a17b4c pc=0xd07d75
runtime.mallocgc(0x500000, 0x1fe7b20, 0x1)
    /root/.goenv/versions/1.21.1/src/runtime/malloc.go:1123 +0x5bd fp=0x69a17bcc sp=0x69a17b78 pc=0xcff91d
runtime.makeslice(0x1fe7b20, 0x500000, 0x500000)
    /root/.goenv/versions/1.21.1/src/runtime/slice.go:103 +0x4d fp=0x69a17be0 sp=0x69a17bcc pc=0xd421bd
runtime.makeslice64(0x1fe7b20, 0x500000, 0x500000)
    /root/.goenv/versions/1.21.1/src/runtime/slice.go:117 +0x6d fp=0x69a17bf4 sp=0x69a17be0 pc=0xd4229d
github.com/aws/aws-sdk-go/service/s3/s3manager.(*maxSlicePool).newSlice(...)
    /go/pkg/mod/github.com/aws/aws-sdk-go@v1.47.3/service/s3/s3manager/pool.go:226
github.com/aws/aws-sdk-go/service/s3/s3manager.(*maxSlicePool).newSlice-fm()
    <autogenerated>:1 +0x67 fp=0x69a17c18 sp=0x69a17bf4 pc=0x1418987
github.com/aws/aws-sdk-go/service/s3/s3manager.(*maxSlicePool).Get(0xcf71840, {0x29a4200, 0x387b5e0})
    /go/pkg/mod/github.com/aws/aws-sdk-go@v1.47.3/service/s3/s3manager/pool.go:77 +0x33c fp=0x69a17ca0 sp=0x69a17c18 pc=0x1413a7c
github.com/aws/aws-sdk-go/service/s3/s3manager.(*returnCapacityPoolCloser).Get(0xcc02350, {0x29a4200, 0x387b5e0})
    <autogenerated>:1 +0x3b fp=0x69a17cbc sp=0x69a17ca0 pc=0x1417d1b
github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).nextReader(0xcc00960)
    /go/pkg/mod/github.com/aws/aws-sdk-go@v1.47.3/service/s3/s3manager/upload.go:499 +0x71 fp=0x69a17d30 sp=0x69a17cbc pc=0x1415c01
github.com/aws/aws-sdk-go/service/s3/s3manager.(*uploader).upload(0xcc00960)
    /go/pkg/mod/github.com/aws/aws-sdk-go@v1.47.3/service/s3/s3manager/upload.go:391 +0x1dc fp=0x69a17db4 sp=0x69a17d30 pc=0x141541c
github.com/aws/aws-sdk-go/service/s3/s3manager.Uploader.UploadWithContext({0x500000, 0x5, 0x0, 0x2710, {0x29b499c, 0xd20cc38}, {0x0, 0x0, 0x0}, {0x299ab20, ...}, ...}, ...)
    /go/pkg/mod/github.com/aws/aws-sdk-go@v1.47.3/service/s3/s3manager/upload.go:307 +0x18e fp=0x69a17e38 sp=0x69a17db4 pc=0x1414d2e
github.com/aws/aws-sdk-go/service/s3/s3manager.Uploader.Upload(...)
    /go/pkg/mod/github.com/aws/aws-sdk-go@v1.47.3/service/s3/s3manager/upload.go:277
github.com/aws/aws-sdk-go/service/s3/s3manager.(*Uploader).Upload(0xcf71800, 0xe7028c0, {0x0, 0x0, 0x0})
    <autogenerated>:1 +0x92 fp=0x69a17ed0 sp=0x69a17e38 pc=0x1418592
github.com/aws/copilot-cli/internal/pkg/aws/s3.(*S3).upload(0xd13a250, {0xd141880, 0x3e}, {0xd1a7180, 0x4d}, {0x2997e98, 0xd0137d0})
    /codebuild/output/src1991231604/src/internal/pkg/aws/s3/s3.go:325 +0x167 fp=0x69a17f24 sp=0x69a17ed0 pc=0x141e877
github.com/aws/copilot-cli/internal/pkg/aws/s3.(*S3).Upload(0xd13a250, {0xd141880, 0x3e}, {0xd1a7180, 0x4d}, {0x2997e98, 0xd0137d0})
    /codebuild/output/src1991231604/src/internal/pkg/aws/s3/s3.go:78 +0x4f fp=0x69a17f54 sp=0x69a17f24 pc=0x141c2ff
github.com/aws/copilot-cli/internal/pkg/cli/deploy.NewStaticSiteDeployer.func1({0xd1a7180, 0x4d}, {0x2997e98, 0xd0137d0})
    /codebuild/output/src1991231604/src/internal/pkg/cli/deploy/static_site.go:74 +0x57 fp=0x69a17f84 sp=0x69a17f54 pc=0x1c5f817
github.com/aws/copilot-cli/internal/pkg/deploy/upload/asset.(*ArtifactBucketUploader).uploadAssets.func1()
    /codebuild/output/src1991231604/src/internal/pkg/deploy/upload/asset/asset.go:138 +0x42 fp=0x69a17fc0 sp=0x69a17f84 pc=0x1c500f2
golang.org/x/sync/errgroup.(*Group).Go.func1()
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:75 +0x5e fp=0x69a17ff0 sp=0x69a17fc0 pc=0x17a61ee
runtime.goexit()
    /root/.goenv/versions/1.21.1/src/runtime/asm_386.s:1363 +0x1 fp=0x69a17ff4 sp=0x69a17ff0 pc=0xd5d2a1
created by golang.org/x/sync/errgroup.(*Group).Go in goroutine 1
    /go/pkg/mod/golang.org/x/sync@v0.5.0/errgroup/errgroup.go:72 +0x97

Whole log attached.

err.txt

iamhopaul123 commented 8 months ago

Hello @czalkin.

"When I remove a number of these files, the error goes away."

Do you mean you deleted these files temporarily, or did you update the static site manifest to not include specific files? I wonder if you accidentally included more files than "Total size for the entire site upload is under 4 MB" suggests, since 1775140864 bytes is more than 1 GB 🤔 Either way, the binary shouldn't panic, which indicates there's a bug for us to fix.

czalkin commented 8 months ago

I meant that I removed the files from the folder before attempting a new deploy. I also tried updating the manifest to ignore the folder, and it still generated the same errors. I would not expect copilot to even scan the folder when excluded in the manifest.

I double-checked: the images folder is under 4 MB and the whole site is 5.5 MB. That's a far cry from 1775140864 bytes!

bash-5.2$ cd www
bash-5.2$ du -h .
4.0K    ./assets/icon
12K     ./assets/img
124K    ./assets/json
88K     ./assets/pics
68K     ./assets/svg
300K    ./assets
3.1M    ./svg
5.5M    .

bash-5.2$ cat ../copilot/myappname/manifest.yml
name: myappname
type: Static Site

http:
  path: '/'
  alias: 'myappname.mydomainname.com'

files:
  - source: ./www
    recursive: true

iamhopaul123 commented 8 months ago

Hello @czalkin. I can't reproduce the issue. Do you feel comfortable sending the compressed static assets folder to aws-copilot-feedback@amazon.com?

"CloudWatch shows the following in /aws/lambda/$app-$env-TriggerStateMachineFunction"

How did you get this error if it failed even before deploying any CFN stack?

czalkin commented 8 months ago

I have sent a test case that reproduces the issue on my end. Thank you for looking into it.

I was able to run "svc init" and "svc deploy" with mostly blank content. That created the CFN stack, and the service initialized as expected. After I added the rest of the site files, the error returned, and it is visible in CloudWatch.

iamhopaul123 commented 8 months ago

Hello @czalkin. Thank you for sharing the test case. I am able to produce an error, but it's not the same one you got: I didn't get the out-of-memory panic.


From my understanding, the issue comes from a file we use to store mapping information. That mapping is used to copy the cached files, which we upload before the CFN stack deployment, to the real static site bucket. The mapping file grows with the total number of files you have, and it exceeded the result size limit when the state machine performed s3:getObject. We might need to split the file into multiple smaller files when its size is above 256k.
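The splitting idea could look roughly like the sketch below. This is not Copilot's implementation: the entry shape (`path`, `key`), the helper name `split_mapping`, and reading "256k" as 256 KiB are all assumptions made for illustration.

```python
import json

# Assumed limit: the "256k" mentioned above, read here as 256 KiB.
MAX_BYTES = 256 * 1024


def split_mapping(entries, max_bytes=MAX_BYTES):
    """Group entries into lists whose JSON encoding stays within max_bytes."""
    chunks, current = [], []
    size = 2  # the enclosing "[" and "]"
    for entry in entries:
        encoded = len(json.dumps(entry).encode("utf-8"))
        if encoded + 2 > max_bytes:
            raise ValueError("a single entry exceeds the size limit")
        # json.dumps separates list items with ", " (2 bytes).
        extra = encoded + (2 if current else 0)
        if current and size + extra > max_bytes:
            # Adding this entry would overflow: close the chunk, start a new one.
            chunks.append(current)
            current, size, extra = [], 2, encoded
        current.append(entry)
        size += extra
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    # Hypothetical mapping entries, one per uploaded file.
    demo = [{"path": f"svg/icon-{i}.svg", "key": "k" * 64} for i in range(1338)]
    parts = split_mapping(demo)
    print(len(parts), max(len(json.dumps(p)) for p in parts))
```

Each chunk could then be written as its own mapping file, so that no single s3:getObject result goes over the limit.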

As a temporary workaround until we fix the issue, could you reduce the total number of files managed by files in the static site manifest, either by removing some of them or by uploading some of them manually? (We won't delete anything from the static site bucket until the bucket itself is deleted.) You can also use exclude to exclude some files. Sorry for the inconvenience, and let me know if you have more questions!
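Since the limit is hit by the number of files rather than their total size, it can help to count files per folder before deciding what to exclude or upload manually. A minimal sketch, assuming the site lives in ./www as in the manifest above; `count_files_by_top_dir` is a hypothetical helper, not part of Copilot:

```python
import os
from collections import Counter


def count_files_by_top_dir(root):
    """Return a Counter mapping each top-level folder under root to its file count.

    Files that sit directly in root are counted under ".".
    """
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = "." if rel == "." else rel.split(os.sep)[0]
        counts[top] += len(filenames)
    return counts


if __name__ == "__main__":
    # Print folders from most files to fewest.
    for name, n in count_files_by_top_dir("./www").most_common():
        print(f"{n:6d}  {name}")
```

The folders with the most files are the first candidates for exclude or a manual upload.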

czalkin commented 8 months ago

That does explain why removing the folder with 1338 files solved the issue and why I could not pin down an individual file that triggered the error. We are currently uploading those files to S3 manually, and we can keep going down that path for the foreseeable future. Thank you for the quick investigation and workaround.