Closed junaruga closed 4 years ago
Haven't we addressed this recently? I feel like we have code to handle it being missing... @giuseppe
I think this is a new one; it should never fail if the pause.pid file is missing.
Is there a temporary workaround to run podman info in these environments, such as running mkdir -p /tmp/run-1000/libpod before podman info?
Can you please share the output of env?
Hi Giuseppe, thank you for checking this issue. Sure.
For the aarch64-fedora case, here it is.
https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617709779#L330
$ env
TRAVIS_ARCH=aarch64
rvm_bin_path=/home/travis/.rvm/bin
MYSQL_UNIX_PORT=/var/run/mysqld/mysqld.sock
HAS_JOSH_K_SEAL_OF_APPROVAL=true
GEM_HOME=/home/travis/.rvm/gems/ruby-2.6.5
NVM_CD_FLAGS=
TRAVIS_TEST_RESULT=
SHELL=/bin/bash
TERM=xterm
PODMAN=podman
IRBRC=/home/travis/.rvm/rubies/ruby-2.6.5/.irbrc
TRAVIS_COMMIT=290d1f791d43a24dc36c96ca740c41bdf2dc5327
TRAVIS_OS_NAME=linux
TRAVIS_APT_PROXY=http://apt.cache.travis-ci.com
TRAVIS_JOB_NAME=aarch64-fedora
TRAVIS_INTERNAL_RUBY_REGEX=^ruby-(2\.[0-4]\.[0-9]|1\.9\.3)
OLDPWD=/home/travis/build
MY_RUBY_HOME=/home/travis/.rvm/rubies/ruby-2.6.5
TRAVIS_ROOT=/
TRAVIS_TIMER_ID=024812a0
ANSI_GREEN=\033[32;1m
NVM_DIR=/home/travis/.nvm
USER=travis
SUDO_USER=root
TRAVIS_LANGUAGE=shell
TRAVIS_INFRA=
SUDO_UID=0
ANSI_RESET=\033[0m
rvm_path=/home/travis/.rvm
TRAVIS_DIST=xenial
TRAVIS=true
TRAVIS_REPO_SLUG=junaruga/ci-multi-arch-native-container-test
ANSI_YELLOW=\033[33;1m
USERNAME=travis
TRAVIS_BUILD_STAGE_NAME=
TRAVIS_COMMIT_MESSAGE=Add commands to debug an environment. (#11)
TRAVIS_PULL_REQUEST=false
PAGER=cat
TRAVIS_CMD=env
TRAVIS_CPU_ARCH=arm64
rvm_prefix=/home/travis
PATH=/home/travis/bin:/home/travis/.local/bin:/home/travis/.rvm/gems/ruby-2.6.5/bin:/home/travis/.rvm/gems/ruby-2.6.5@global/bin:/home/travis/.rvm/rubies/ruby-2.6.5/bin:/home/travis/.phpenv/shims:/home/travis/.nvm/versions/node/v8.12.0/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/snap/bin:/home/travis/.phpenv/bin:/home/travis/.rvm/bin
TRAVIS_PULL_REQUEST_SHA=
TRAVIS_OSX_IMAGE=
TRAVIS_JOB_WEB_URL=https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617709779
TRAVIS_TMPDIR=/tmp/tmp.saDK01bWCN
TRAVIS_BUILD_WEB_URL=https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/builds/617709774
APT_KEY_DONT_WARN_ON_DANGEROUS_USAGE=1
PWD=/home/travis/build/junaruga/ci-multi-arch-native-container-test
CONTINUOUS_INTEGRATION=true
LANG=C.UTF-8
SETARCH=
TRAVIS_ENABLE_INFRA_DETECTION=true
TRAVIS_SUDO=true
TRAVIS_TAG=
TRAVIS_ALLOW_FAILURE=true
TRAVIS_HOME=/home/travis
TRAVIS_INIT=systemd
rvm_version=1.29.9 (latest)
TRAVIS_JOB_NUMBER=52.5
TRAVIS_EVENT_TYPE=push
SHLVL=1
PS4=+
SUDO_COMMAND=/bin/bash /home/travis/build.sh
HOME=/home/travis
ANSI_CLEAR=\033[0K
DIST=fedora
CI=true
TRAVIS_TIMER_START_TIME=1574859663084274950
BASE_IMAGE=fedora:31
TRAVIS_BUILD_ID=617709774
LOGNAME=travis
TRAVIS_PULL_REQUEST_SLUG=
GEM_PATH=/home/travis/.rvm/gems/ruby-2.6.5:/home/travis/.rvm/gems/ruby-2.6.5@global
XDG_DATA_DIRS=/usr/local/share:/usr/share:/var/lib/snapd/desktop
TRAVIS_SECURE_ENV_VARS=false
DEBIAN_FRONTEND=noninteractive
NVM_BIN=/home/travis/.nvm/versions/node/v8.12.0/bin
TRAVIS_APP_HOST=build.travis-ci.org
GIT_ASKPASS=echo
CC=
TRAVIS_BRANCH=master
SUDO_GID=0
TRAVIS_COMMIT_RANGE=6dae37c00dbf...290d1f791d43
TRAVIS_PULL_REQUEST_BRANCH=
TRAVIS_JOB_ID=617709779
ANSI_RED=\033[31;1m
RUBY_VERSION=ruby-2.6.5
container=lxc
TRAVIS_BUILD_NUMBER=52
TRAVIS_BUILD_DIR=/home/travis/build/junaruga/ci-multi-arch-native-container-test
_=/usr/bin/env
And in the ppc64le and s390x cases,
@giuseppe Any progress?
It might be caused by differences in the kernel on those architectures. The renameat2 syscall could be behaving differently.
The kernel is quite old though, and I don't think you can use fuse-overlayfs, so in general support for rootless containers is very limited.
Could you use a newer kernel?
Could you use a newer kernel?
Yes, I can use a newer kernel. But the kernel in the Travis arm64/ppc64le/s390x environments is already newer than the one in the Travis x86_64 environment.
The kernel versions (uname -r) are
See this summary page for details.
I also put the log files of strace -f podman info for the "x86_64-fedora" and "aarch64-xenial-fedora" cases, captured from here.
You can compare the files locally like this.
$ vimdiff issues/13_podman_arch/logs/aarch64-xenial-fedora/strace-podman-info.log issues/13_podman_arch/logs/x86_64-fedora/strace-podman-info.log
It might be caused by differences in the kernel on those architectures. The renameat2 syscall could be behaving differently.
You can see that renameat2 is called in issues/13_podman_arch/logs/aarch64-xenial-fedora/strace-podman-info.log (= error case), but is not called in issues/13_podman_arch/logs/x86_64-fedora/strace-podman-info.log (= ok case).
I hope the logs give a clue for fixing this issue.
Thank you.
When I searched for pause.pid in both log files, I found the following.
In issues/13_podman_arch/logs/x86_64-fedora/strace-podman-info.log (ok case, user_id: 2000):
...
open("/run/user/2000/libpod/pause.pid", O_RDONLY) = 3
...
In issues/13_podman_arch/logs/aarch64-xenial-fedora/strace-podman-info.log (error case, user_id: 1000):
...
[pid 2784] openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", O_RDWR|O_CREAT|O_EXCL, 0600 <unfinished ...>
...
[pid 2784] renameat2(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", AT_FDCWD, "/tmp/run-1000/libpod/pause.pid", RENAME_NOREPLACE <unfinished ...>
...
[pid 2784] unlinkat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", 0 <unfinished ...>
...
[pid 2762] openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid", O_RDONLY|O_CLOEXEC <unfinished ...>
...
[pid 2762] write(2, "Error: could not get runtime: er"..., 123Error: could not get runtime: error setting up the process: open /tmp/run-1000/libpod/pause.pid: no such file or directory
...
Thanks for such detailed info.
Yes, indeed the renameat2 syscall fails:
[pid 3123] <... renameat2 resumed> ) = -1 EINVAL (Invalid argument)
I'll prepare a patch and open a PR.
Opened a PR here: https://github.com/containers/libpod/pull/4637
You are welcome. Okay, I see that the renameat2 syscall fails.
Out of curiosity, why does the handling of the pause.pid file differ between the Travis x86_64 and aarch64 (/ppc64le/s390x) environments?
In x86_64, open("/run/user/2000/libpod/pause.pid", O_RDONLY) = 3 is executed first.
In aarch64, openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", O_RDWR|O_CREAT|O_EXCL, 0600 <unfinished ...> is executed first.
In aarch64, openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid.BtSun0", O_RDWR|O_CREAT|O_EXCL, 0600 <unfinished ...> is executed first.
I see the file is also first opened:
[pid 2762] openat(AT_FDCWD, "/tmp/run-1000/libpod/pause.pid", O_RDONLY|O_CLOEXEC <unfinished ...>
[pid 2763] pselect6(0, NULL, NULL, NULL, {0, 20000}, NULL <unfinished ...>
[pid 2762] <... openat resumed> ) = -1 ENOENT (No such file or directory)
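The ENOENT in the excerpt above ties back to the earlier remark that a missing pause.pid should not be fatal. A minimal sketch of that idea follows, assuming the file holds a decimal PID as text. The helper name read_pid_file_sketch and its return convention are hypothetical and are not taken from libpod.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper, not libpod code.  Reads a decimal PID from path.
   Returns 0 and stores the PID on success, 1 if the file does not exist
   (ENOENT), and -1 on any other error or on malformed content.  */
static int
read_pid_file_sketch (const char *path, long *pid_out)
{
  char buf[32];
  char *end;
  ssize_t n;
  long value;
  int fd;

  assert (path != NULL && pid_out != NULL);

  fd = open (path, O_RDONLY | O_CLOEXEC);
  if (fd < 0)
    /* A missing file is reported separately so the caller can recover.  */
    return errno == ENOENT ? 1 : -1;

  n = read (fd, buf, sizeof (buf) - 1);
  close (fd);
  if (n <= 0)
    return -1;
  buf[n] = '\0';

  errno = 0;
  value = strtol (buf, &end, 10);
  if (errno != 0 || end == buf || value <= 0)
    return -1;

  *pid_out = value;
  return 0;
}
```

With this shape, a caller can treat the return value 1 as "no pause process recorded yet" and continue, while still failing on permission or I/O errors.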
I see.
I meant that I wondered why the renameat2 syscall was not executed in the Travis x86_64 case (ok case).
I have now looked at the file as it was before your pull request: https://github.com/containers/libpod/blob/10f733497f37c6ed85756ba95f6e75f3443a90af/pkg/rootless/rootless_linux.c#L27-L37
I understand that, because the macro SYS_renameat2 was not defined in Travis x86_64, the renameat2 syscall was not executed.
And you modified the logic so that when SYS_renameat2 is defined but the syscall actually fails, it falls back to another rename path, rename (oldpath, newpath);, right?
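The fallback described in this comment could look roughly like the sketch below. It is an illustration only, not the code from the pull request: the name rename_noreplace_sketch is hypothetical, and treating ENOSYS the same way as the EINVAL seen in the strace log is an assumption.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Kernel value of the flag, defined here in case the libc headers are
   too old to provide it.  */
#ifndef RENAME_NOREPLACE
# define RENAME_NOREPLACE (1 << 0)
#endif

/* Hypothetical helper, not the actual libpod function.  Moves oldpath to
   newpath without replacing an existing newpath.  Returns 0 on success,
   -1 with errno set on failure.  */
static int
rename_noreplace_sketch (const char *oldpath, const char *newpath)
{
  int fd;

  assert (oldpath != NULL && newpath != NULL);

#ifdef SYS_renameat2
  /* Preferred path: atomic rename that refuses to overwrite newpath.  */
  if (syscall (SYS_renameat2, AT_FDCWD, oldpath, AT_FDCWD, newpath,
               RENAME_NOREPLACE) == 0)
    return 0;
  /* EINVAL is what the strace log shows; ENOSYS is an added assumption.
     Any other error (for example EEXIST) is reported to the caller.  */
  if (errno != EINVAL && errno != ENOSYS)
    return -1;
#endif

  /* Fallback: claim newpath with O_EXCL so an existing file is never
     replaced, then overwrite our own placeholder with plain rename().
     Unlike renameat2, this is not atomic: newpath is briefly empty.  */
  fd = open (newpath, O_WRONLY | O_CREAT | O_EXCL, 0600);
  if (fd < 0)
    return -1;
  close (fd);
  return rename (oldpath, newpath);
}
```

When SYS_renameat2 is not defined at build time, only the fallback is compiled in, which matches the observation that the x86_64 log contains no renameat2 call.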
Great! Thank you. Please let me know when you release the new version of the podman deb package, if you remember. I would like to test it again then.
I found the following article about the Travis ppc64le and s390x environments, though I am not sure whether it is directly related to this issue.
https://blog.travis-ci.com/2019-11-12-multi-cpu-architecture-ibm-power-ibm-z
Syscall interception support - only system calls considered as safe. We will be working on overcoming these limitations in the coming months.
It seems that Ubuntu's podman installed from "ppa:projectatomic/ppa" is still 1.6.2, not the latest version, 1.7.0.
$ apt list podman
WARNING: apt does not have a stable CLI interface. Use with caution in scripts.
Listing...
podman/xenial,now 1.6.2-1~ubuntu16.04~ppa1 arm64 [installed]
And the error still happens. https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/636119447#L438
Cross-distro packaging for Podman is now happening in the Open Build Service: https://build.opensuse.org/project/show/devel:kubic:libcontainers:stable
@lsm5 will soon give updates
Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)
/kind bug
Description
podman info and podman version show the following error in the Travis Ubuntu xenial native CPU architecture arm64, ppc64le and s390x environments. This does not happen in the Travis Ubuntu amd64 (x86_64) environment. This issue is related to https://github.com/containers/libpod/issues/3679 .
Steps to reproduce the issue:
aarch64-fedora, ppc64le-fedora or s390x-fedora.
Describe the results you received:
In case of 46.6: aarch64-fedora, https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617259570#L126
Describe the results you expected:
Additional information you deem important (e.g. issue happens only occasionally):
Output of podman version:
podman --version works to show the version.
Output of podman info --debug:
Package info (e.g. output of rpm -q podman or apt list podman):
See https://travis-ci.org/junaruga/ci-multi-arch-native-container-test/jobs/617303586#L323
Additional environment details (AWS, VirtualBox, physical, etc.):