Closed jiridanek closed 1 year ago
It happened again just now, https://github.com/jiridanek/cli-cpp/actions/runs/4894816601/jobs/8739559116#step:11:519
I am also seeing this issue appear intermittently. This is especially prevalent when attempting to do multiple builds at once that use ssh.
Getting this:
ERRO[0044] error serving agent: read unix /var/tmp/.buildah-ssh-sock583876480/ssh_auth_sock->@: use of closed network connection
I am doing npm installs in the image, and here is the redacted output from npm:
npm ERR! debug1: Will attempt key:******************************** agent
npm ERR! debug1: Will attempt key: /root/.ssh/id_rsa
npm ERR! debug1: Will attempt key: /root/.ssh/id_dsa
npm ERR! debug1: Will attempt key: /root/.ssh/id_ecdsa
npm ERR! debug1: Will attempt key: /root/.ssh/id_ecdsa_sk
npm ERR! debug1: Will attempt key: /root/.ssh/id_ed25519
npm ERR! debug1: Will attempt key: /root/.ssh/id_ed25519_sk
npm ERR! debug1: Will attempt key: /root/.ssh/id_xmss
npm ERR! debug1: SSH2_MSG_EXT_INFO received
npm ERR! debug1: kex_input_ext_info: server-sig-algs=<ssh-rsa,rsa-sha2-256,rsa-sha2-512>
npm ERR! debug1: SSH2_MSG_SERVICE_ACCEPT received
npm ERR! debug1: Authentications that can continue: password,publickey
npm ERR! debug1: Next authentication method: publickey
npm ERR! debug1: Offering public key: ******************************** agent
npm ERR! debug1: Server accepts key: ******************************** agent
npm ERR! sign_and_send_pubkey: signing failed for RSA "********************************" from agent: communication with agent failed
Sometimes the build runs to completion, sometimes not. Builds NEVER work when running more than one at a time.
It seems like a small retry mechanism (with exponential backoff) should help here.
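A retry with exponential backoff could be sketched as a small shell wrapper around the build command. This is only an illustration: `retry_with_backoff` is a hypothetical helper, not something buildah or podman-compose provides, and whether re-running the build actually avoids the agent failure above has not been confirmed in this thread.

```shell
# Hypothetical helper: retry a command, doubling the wait after each failure.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
    local max_attempts=$1
    local delay=$2
    shift 2
    local attempt=1
    while true; do
        # Run the command; stop as soon as it succeeds.
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))        # exponential backoff
        attempt=$((attempt + 1))
    done
}

# Example (attempt count and delay are arbitrary):
# retry_with_backoff 4 2 podman-compose --podman-build-args '--ssh default' build
```

Wrapping the whole build is coarse, since every retry restarts the build from the last cached layer, but it needs no changes to the tools themselves.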
I'll take this.
@jiridanek @alechirsch I'd suggest setting `--retry` and `--retry-delay` with the `buildah build` command; adding more retries and increasing the retry delay should help here. Could you please try it and let me know if it helped?
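For reference, an invocation with both flags might look like the sketch below. The values 5 and 10s are arbitrary placeholders, the build context `.` is an assumption, and `--ssh default` is taken from the podman-compose command quoted elsewhere in this thread. The script only prints the command so it can be inspected before running.

```shell
# Placeholder values; tune them for your CI environment.
RETRIES=5
RETRY_DELAY=10s

# Assemble the command line; "." (current directory) is an assumed build context.
BUILD_CMD="buildah build --retry $RETRIES --retry-delay $RETRY_DELAY --ssh default ."

# Print it for inspection. Run it with: eval "$BUILD_CMD"
echo "$BUILD_CMD"
```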
This sounds like it should help, those retries. I'll set it and we'll see what happens. Currently the failures are already quite rare for me; GH infra seems to work mostly well ;)
Yes, I mean you can just set these to ensure that these failures don't happen in the future. I am closing this issue hoping that this helps, but feel free to re-open the issue if you hit this again.
@flouthoc I am running this with a podman-compose build
podman-compose --podman-build-args '--ssh default' build
I am not able to interface with buildah directly here. It seems like this should be built in; I am running into this issue on virtually every other build, at random. This might be a separate issue, and I can open a new one.
My issue is more similar to #3587, which this issue referenced. It seems like the fix for the previous issue was to increase the timeout before closing the connection. I do not think that works for my case. I am running an `npm install` in my build, which includes multiple packages that need to be pulled over the ssh connection. I suspect what is happening here is that npm gets to one of the packages requiring ssh and opens a connection, the connection gets closed by buildah, and then later in the install process npm tries to get another package over ssh. It cannot, since the connection was already terminated.
Description
I saw the issue from https://github.com/containers/buildah/issues/3587 reappear once in a flaky manner, when building in GitHub Actions and using the ghcr.io registry as the cache:
https://github.com/jiridanek/cli-cpp/actions/runs/4893695440/jobs/8736952568#step:11:2967
The build command I am using is
Output of `rpm -q buildah` or `apt list buildah`:
I am using the `quay.io/buildah/stable:latest` container image. It appears to have currently
Output of `buildah version`:
Output of `podman version` if reporting a `podman build` issue:
Output of `cat /etc/release`:
Output of `uname -a`:
This is from my laptop, not from hosted GitHub Actions runner
Output of `cat /etc/containers/storage.conf`: