containers / buildah

A tool that facilitates building OCI images.
https://buildah.io
Apache License 2.0

Building containers fails with error "error locating item named "manifest" for image with ID" #2727

Closed: sshnaidm closed this issue 3 years ago

sshnaidm commented 3 years ago

Description

In OpenStack TripleO jobs we build a lot of containers, and the build sometimes fails with the following error after a few dozen containers have already been built:

error locating item named "manifest" for image with ID "....": file does not exist

For example:

error checking if cached image exists from a previous build: error getting history of
"68c139527f8459ba7c981a11f1872a6bb5b621c2140a6f2fa7858214df8e429f": error creating new image from reference to image
"68c139527f8459ba7c981a11f1872a6bb5b621c2140a6f2fa7858214df8e429f": error locating item named "manifest" for image
with ID "68c139527f8459ba7c981a11f1872a6bb5b621c2140a6f2fa7858214df8e429f": file does not exist

Steps to reproduce the issue:

  1. It happens sporadically in OpenStack TripleO CI jobs, tracked in the Launchpad bug "Building container fails because manifest is not found". The build command can be found in the job log (a reproduction sketch follows below):

     Command: sudo buildah bud --volume /etc/yum.repos.d:/etc/yum.repos.d:z --volume /etc/pki/rpm-gpg:/etc/pki/rpm-gpg:z --volume /etc/yum.repos.d:/etc/yum.repos.d:z --volume /etc/pki/rpm-gpg:/etc/pki/rpm-gpg:z --format docker --tls-verify=False --layers --logfile /home/zuul/container-builds/7d364ba7-6c15-4676-98c7-813f36f44452/base/ovn-base/ovn-base-build.log -t 127.0.0.1:5001/tripleomaster/openstack-ovn-base:a9a790d0723c9fe6641e453c6a1f0c91 /home/zuul/container-builds/7d364ba7-6c15-4676-98c7-813f36f44452/base/ovn-base
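
For context, a sketch of the kind of workload that hits this: many --layers builds against the same container storage, some of them running concurrently. The image names and the contexts/ directory layout below are hypothetical, not the exact TripleO job:

#!/bin/bash
# Hypothetical reproducer sketch: launch many cached (--layers) builds in
# parallel against the shared default storage, as the CI jobs effectively do.
# Build contexts under ./contexts/ are assumed to exist.
set -uo pipefail
for ctx in contexts/*/; do
    name=$(basename "$ctx")
    sudo buildah bud --format docker --layers \
        -t "127.0.0.1:5001/tripleomaster/openstack-${name}:latest" \
        "$ctx" &
done
wait  # individual builds sporadically fail with the "manifest ... file does not exist" error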

Describe the results you received: After tens of containers have been built, a build may fail. It does not happen every time, which suggests a race condition.

Describe the results you expected: All of the (roughly one hundred) containers build successfully.

Output of rpm -q buildah or apt list buildah:

buildah.x86_64                                1.11.6-7.module_el8.2.0+458+dab581ed           centos-appstreams  

Output of cat /etc/*release:

NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"

Output of uname -a:

kernel-4.18.0-193.19.1.el8_2.x86_64

Output of cat /etc/containers/storage.conf:

driver = "overlay"
runroot = "/var/run/containers/storage"
graphroot = "/var/lib/containers/storage"
[storage.options]
additionalimagestores = [
]
size = ""
override_kernel_check = "true"
[storage.options.thinpool]
ostree_repo = ""
skip_mount_home = "false"

rhatdan commented 3 years ago

Any chance you can test with the latest version of Buildah that was just released in RHEL 8.3, to see if this issue still exists?

mwhahaha commented 3 years ago

I think this is a race condition with --layers, because we launch multiple buildah builds in parallel with --layers specified. From the error message, it seems that when the 'cached image exists from a previous build' check fails, buildah should handle the missing cache entry and rebuild the layer instead of erroring out.
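
If the parallel builds really are racing on one shared store, one possible mitigation (a sketch only, not a confirmed fix) is to give each concurrent build a private store via buildah's global --root and --runroot options, at the cost of layer sharing between the parallel builds. BUILD_ID here is a hypothetical per-job identifier:

# Hypothetical isolation sketch: a separate graph root and run root per build,
# so concurrent cache lookups cannot observe each other's half-written images.
sudo buildah --root "/var/lib/containers/storage-${BUILD_ID}" \
             --runroot "/run/containers/storage-${BUILD_ID}" \
             bud --layers --format docker -t "myimage:${BUILD_ID}" .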

bentito commented 3 years ago

Getting the same error with a podman build command that looks like this:

podman build -f Dockerfile -t quay.io/btofel/metering-hadoop:metering-rel-rel-3.3.0 .

and the specific error looks like:

Error: error checking if cached image exists from a previous build: error getting history of "d21c3bb5c18aa767fcac9a2ccd551da58e799bd3ceed4b097073ed030bb93c05": error creating new image from reference to image "d21c3bb5c18aa767fcac9a2ccd551da58e799bd3ceed4b097073ed030bb93c05": error locating item named "manifest" for image with ID "d21c3bb5c18aa767fcac9a2ccd551da58e799bd3ceed4b097073ed030bb93c05": file does not exist

bentito commented 3 years ago

And then that layer was still a problem: it could not be removed even by podman system reset. I finally ran podman system prune --all --force && podman rmi --all to get back to a working state (and, unfortunately, toss all the layers).
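
For anyone hitting the same state, that recovery boils down to the following two commands. Be warned that they are destructive and discard all local images and layers:

# Remove everything, including the broken image record, then rebuild from scratch.
podman system prune --all --force
podman rmi --all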

rhatdan commented 3 years ago

@bentito @sshnaidm Are you still seeing this issue with buildah 1.19?

bentito commented 3 years ago

Sadly, I trashed the GCP VM I was doing that work on, @rhatdan, so I don't have an easy way to test.

sshnaidm commented 3 years ago

> @bentito @sshnaidm Are you still seeing this issue with buildah 1.19?

We stopped using --layers because it was failing a lot, and we have not tracked the issue since.

rhatdan commented 3 years ago

OK, I will close this for now; we can reopen it if it happens again.

rhatdan commented 3 years ago

@mrunalp @nalind Doesn't this look like what we were seeing in CRI-O as well?

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

tetchel commented 3 years ago

I have this issue on buildah 1.21.1. I suspect it is because I sent SIGTERM to buildah build while it was pulling a dependency image, leaving it in a broken intermediate state.

$ buildah bud --layers -f Containerfile --build-arg VCS_REF=$(git rev-parse HEAD | cut -c -7) -t openshift-actions-connector:latest .
STEP 1: FROM node:14-alpine AS builder
WARN[0000] error determining if an image is a manifest list: error locating item named "manifest" for image with ID "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": file does not exist, ignoring the error
WARN[0000] error determining if an image is a manifest list: error locating item named "manifest" for image with ID "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": file does not exist, ignoring the error
STEP 2: FROM node:14-alpine
error creating build container: error loading image manifest for "containers-storage:[overlay@/home/tim/.local/share/containers/storage+/run/user/1000/containers:overlay.mount_program=/usr/bin/fuse-overlayfs]@74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": error locating item named "manifest" for image with ID "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": file does not exist
WARN[0000] error determining if an image is a manifest list: error locating item named "manifest" for image with ID "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": file does not exist, ignoring the error
WARN[0000] error determining if an image is a manifest list: error locating item named "manifest" for image with ID "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": file does not exist, ignoring the error
ERRO[0000] exit status 125
[ /src/redhat-actions/openshift-actions-connector ] 26 (main) $ podman image prune
WARNING! This will remove all dangling images.
Are you sure you want to continue? [y/N] y
Error: unable to get images to prune: error reading image "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda" as image: error locating item named "manifest" for image with ID "74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda": file does not exist
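
When the damaged record blocks podman image prune as well (as above), a few escalating options, all sketches: force-remove the single record and, if the tools still refuse, wipe local storage entirely. A later comment in this thread reports that simply updating buildah/podman made plain rmi work again, so try that first. The last two commands below are destructive and remove all local images and containers:

# Try removing just the broken image record first:
buildah rmi --force 74e34178c6441e539fff82d163be37edb4bee1fec5f939ebf91d9ca4a26a9eda
# If the tools cannot repair the store, reset it (destructive):
podman system reset
# Rootless last resort (destructive; the path is taken from the error above):
rm -rf ~/.local/share/containers/storage
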
schmidtd commented 2 years ago

I have this same problem; any idea how to remove the offending image? rmi -f does not succeed in getting rid of it.

rhatdan commented 2 years ago

Is this with a later version of Buildah? Have you tried with Podman 4?

schmidtd commented 2 years ago

I updated everything in sight, and now I can rmi the offending images:

sudo yum -y install buildah
sudo yum -y install podman
sudo yum -y update conmon

$ buildah --version
buildah version 1.23.1 (image-spec 1.0.1-dev, runtime-spec 1.0.2-dev)
$ podman --version
podman version 3.4.2
$ conmon --version
conmon version 2.0.32
commit: 4b12bce835c3f8acc006a43620dd955a6a73bae0

This is on Red Hat Enterprise Linux release 8.3.

rhatdan commented 2 years ago

Great, thanks.