Open BFlores16 opened 3 weeks ago
Hi @BFlores16 , thank you for reporting the issue.
It looks to me like Docker was running out of memory on the host that ran `sam build -u`. Could you please provide us the infrastructure details of the host that ran the job (CPU, RAM, disk space, ...)?
You can also try splitting into smaller batches of Lambdas and see whether it still fails.
I'm not sure how I would split into smaller batches of Lambdas; could you suggest how?
I am using the GitLab shared runners, which should have the following configs:
Here's some info I output in my pipeline.
```
$ echo "===== Memory Info ====="
===== Memory Info =====
$ free -h
              total        used        free      shared  buff/cache   available
Mem:          7.8Gi       569Mi       5.8Gi       1.0Mi       1.7Gi       7.2Gi
Swap:         2.0Gi          0B       2.0Gi
$ echo "===== Disk Space Info ====="
===== Disk Space Info =====
$ df -h
Filesystem                Size      Used Available Use% Mounted on
overlay                  25.4G      7.7G     17.6G  31% /
tmpfs                    64.0M         0     64.0M   0% /dev
shm                      64.0M         0     64.0M   0% /dev/shm
/dev/sda1                25.4G      7.7G     17.6G  31% /builds
/dev/sda1                25.4G      7.7G     17.6G  31% /certs/client
/dev/sda1                25.4G      7.7G     17.6G  31% /etc/resolv.conf
/dev/sda1                25.4G      7.7G     17.6G  31% /etc/hostname
/dev/sda1                25.4G      7.7G     17.6G  31% /etc/hosts
/dev/sda1                25.4G      7.7G     17.6G  31% /var/lib/docker
tmpfs                     3.9G         0      3.9G   0% /sys/devices/virtual/dmi/id
$ echo "===== Docker Info ====="
===== Docker Info =====
$ docker info
Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Docker Buildx (Docker Inc.)
    Version:  v0.11.2
    Path:     /usr/local/libexec/docker/cli-plugins/docker-buildx
  compose: Docker Compose (Docker Inc.)
    Version:  v2.22.0
    Path:     /usr/local/libexec/docker/cli-plugins/docker-compose

Server:
 Containers: 0
  Running: 0
  Paused: 0
  Stopped: 0
 Images: 0
 Server Version: 23.0.6
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Using metacopy: false
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 2
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: inactive
 Runtimes: io.containerd.runc.v2 runc
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: 3dce8eb055cbb6872793272b4f20ed16117344f8
 runc version: v1.1.7-0-g860f061
 init version: de40ad0
 Security Options:
  apparmor
  seccomp
   Profile: builtin
  cgroupns
 Kernel Version: 5.15.154+
 Operating System: Alpine Linux v3.18 (containerized)
 OSType: linux
 Architecture: x86_64
 CPUs: 2
 Total Memory: 7.768GiB
 Name: 08ef878f2db3
 ID: a2b3703f-1391-4b0a-8cc4-4d59185b43eb
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false
 Product License: Community Engine
```
Hi @BFlores16 , thank you for the info.
I assume the memory was captured while the pipeline was not running? If so, is there a way you can check the host's metrics to confirm there was no spike in memory or disk usage while the pipeline was running?
Also, it looks like you are running `sam build` against Linux, so there may be no need for a containerized build in this case. Another option I would suggest is running `sam build` without the `-u` option to see whether it works.
In terms of splitting the Lambdas into smaller batches, you can try using multiple templates, each deploying a smaller set of Lambdas. Note that this will result in multiple CloudFormation stacks.
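A sketch of that suggestion (the file names, function names, and stack names here are hypothetical, not from the reporter's project): each template carries only a subset of the functions, and each template is built and deployed as its own stack.

```yaml
# batch-a/template.yaml -- hypothetical first batch of functions
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  FunctionOne:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: function_one/
      Handler: app.handler
      Runtime: python3.9
# batch-b/template.yaml would hold the next set of functions.
# Each batch is then built and deployed separately, e.g.:
#   sam build -u -t batch-a/template.yaml
#   sam deploy --stack-name lambdas-batch-a ...
```

The trade-off, as noted above, is one CloudFormation stack per batch instead of a single stack for the whole application.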
The memory was captured during the pipeline run but prior to the `sam build` command. I don't think there is a way for me to evaluate the memory usage during the `sam build` command.
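One low-effort way to approximate this inside the job itself (a sketch, not an official SAM or GitLab feature) is to sample memory and disk in a background loop while the build runs, so any spike shows up in the job log:

```shell
# Sample memory/disk every 10s in the background during the build.
( while true; do date; free -h; df -h /; sleep 10; done ) &
MONITOR_PID=$!
sam build -u
# Stop sampling once the build finishes (or the job times out).
kill "$MONITOR_PID"
```

If the build hangs, the last few samples before the timeout show whether memory or disk was exhausted at that point.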
`sam build` without the `-u` flag does not work for me because many of my Lambdas are being containerized due to requirements.txt files. I may be ignorant of how best to package my dependencies and deploy them in the pipeline, so if you have any suggestions I would welcome them. Here's an example error without the `-u` flag:
```
Build Failed
Error: PythonPipBuilder:Validation - Binary validation failed for python, searched for python in following locations : ['/usr/bin/python', '/usr/bin/python3'] which did not satisfy constraints for runtime: python3.9. Do you have python for runtime: python3.9 on your PATH?
Cleaning up project directory and file based variables 00:00
ERROR: Job failed: exit code 1
```
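For context, that validation error means `sam build` (without `-u`) could not find a `python` or `python3` on `PATH` that satisfies the `python3.9` runtime declared in the template. A sketch of one way to satisfy it on a Debian/Ubuntu-based runner (the package name is an assumption, not from this thread):

```shell
# Install a 3.9 interpreter and expose it under a name sam build searches for.
apt-get update && apt-get install -y python3.9
ln -sf "$(command -v python3.9)" /usr/local/bin/python
python --version   # should now report Python 3.9.x
```

The same idea (install the matching interpreter, then symlink it onto `PATH`) is what the reporter's eventual Alpine-based pipeline below does with `apk` and `ln -sf`.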
Hi @BFlores16 , thank you for the prompt response. I would like to dive deeper into the logs to see what could have gone wrong. Could you please send us the full output of `sam build -u --debug` running on the pipeline?
Thanks!
Here is the log from a previous run with the `--debug` flag on.
Thanks @BFlores16 for the logs.
It looks like I was able to reproduce the issue. This is likely because one of the containers didn't return control to the main thread after finishing its build. We are working on the fix and hopefully it will be available in the next release.
In the meantime, if you can install Python (with the same version as defined in `template.yaml`) on the host and run `sam build` without `-u`, I believe it will help you unblock the pipeline.
I ended up changing all my Python Lambdas to a consistent version and installed Python and Node on my container. I also removed the `-u` flag.
I may check one day to see whether the team ever fixes the reported bug, but I doubt I would ever switch back to using the flag, as this setup is quite convenient.
The only benefit I can see in using the flag is that you wouldn't need to install your runtime versions explicitly or maintain your pipeline as much. That is probably what contributes to the increased run time, though.
Here is how I modified my pipeline file:
```yaml
variables:
  SAM_TEMPLATE: Lambdas/template.yaml
  DOCKER_DRIVER: overlay2
  DOCKER_TLS_CERTDIR: "/certs"

services:
  - docker:23.0.6-dind

image: docker:23.0.6

before_script:
  - apk add --update python3 py3-pip python3-dev build-base libffi-dev util-linux procps
  - apk add nodejs npm
  - if ! python3 --version | grep -q "3.11"; then apk add --repository=http://dl-cdn.alpinelinux.org/alpine/edge/community python3=3.11*; fi
  - ln -sf /usr/bin/python3 /usr/local/bin/python
  - ln -sf /usr/bin/node /usr/local/bin/node
  - pip install --upgrade pip
  - pip install awscli aws-sam-cli

stages:
  - preview
  - deploy

preview:
  stage: preview
  timeout: 20m
  script:
    - chmod 755 aws-variables.sh
    - ./aws-variables.sh
    - export AWS_DEFAULT_REGION=$AWS_REGION
    - cd Lambdas
    - sam build
    - sam deploy --region $AWS_REGION --no-execute-changeset --no-fail-on-empty-changeset
    - cd ..
    - changeset_id=$(aws cloudformation describe-change-set --stack-name Lambdas --change-set-name $(aws cloudformation list-change-sets --stack-name Lambdas --query "sort_by(Summaries, &CreationTime)[-1].ChangeSetName" --output text) --query "ChangeSetId" --output text)
    - echo $changeset_id > changeset.txt
  artifacts:
    paths:
      - changeset.txt

deploy-prod:
  stage: deploy
  script:
    - chmod 755 aws-variables.sh
    - ./aws-variables.sh
    - changeset_id=$(cat changeset.txt)
    - export AWS_DEFAULT_REGION=$AWS_REGION
    - aws cloudformation execute-change-set --change-set-name $changeset_id
  only:
    - main
    - develop
  when: manual
  environment:
    name: production
```
I would still be interested to see if this bug is ever fixed, since the `-u` flag makes it easier to package dependencies without having to install and maintain specific runtime versions.
Hi @BFlores16 , thank you for the feedback.
Yes, building without `-u` is much faster and easier, but sometimes it is troublesome to maintain the environments and dependencies on all the hosts.
The issue has been added to our backlog and we will try to find a solution for this.
Description:
I have a GitLab pipeline that builds and deploys my SAM application. My application contains about 30 Lambdas, mostly Python and some Node. I have never had an issue when I run `sam build -u` locally. But when running the command in my pipeline, the pipeline hangs on the last function and gets stuck on "Mounting /builds/... /tmp/samcli/source:ro,delegated, inside runtime container". In order to resolve this, I have to delete all artifacts in my repo and then clear the runner caches. The pipeline will then work once with `sam build`, and then get stuck again on subsequent runs. I've tried modifying my gitlab-ci.yml in many ways with no success.
Here is my gitlab-ci.yml
Here is some of the end output from `sam build -u --debug`
Steps to reproduce:
`sam build -u`

Observed result:
Pipeline gets stuck on `sam build -u`.

Expected result:
Build should succeed and `sam deploy` should proceed.

Additional environment details (Ex: Windows, Mac, Amazon Linux etc)
`sam --version`: 1.123.0

```json
{
  "version": "1.123.0",
  "system": {
    "python": "3.11.8",
    "os": "Linux-5.15.154+-x86_64-with"
  },
  "additional_dependencies": {
    "docker_engine": "23.0.6",
    "aws_cdk": "Not available",
    "terraform": "Not available"
  },
  "available_beta_feature_env_vars": [
    "SAM_CLI_BETA_FEATURES",
    "SAM_CLI_BETA_BUILD_PERFORMANCE",
    "SAM_CLI_BETA_TERRAFORM_SUPPORT",
    "SAM_CLI_BETA_RUST_CARGO_LAMBDA"
  ]
}
```

Add --debug flag to command you are running
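When collecting that debug output in CI, it can help to tee it to a file kept as a job artifact so the full log survives the run (a sketch; the log file name is arbitrary):

```shell
# Capture the complete debug output, including stderr, for attachment here.
sam build -u --debug 2>&1 | tee sam-build-debug.log
```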