new error (used to work before) on podman run invocation: Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address

ppenguin commented 3 years ago

Description

Recently I've been getting the following error on a CI that used to run without problems. The CI runner is on bare metal (Manjaro) Linux, and may have been updated in the meantime, but I can't be sure this error is related to an update of my podman version (currently: 3.2.2).

ERRO[0002] Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address

when executing ("call-graph-like" representation):

Makefile -> podman run -> script.sh -> exec go build ...

I figured it might be caused in some way by the podman log-driver, but using

podman run --log-driver=none ...

doesn't have any effect.

I'm at a loss what might be causing this or how to debug this issue...

Output of podman version:

podman version 3.2.2

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.21.0
  cgroupControllers: []
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: /usr/bin/conmon is owned by conmon 1:2.0.29-1
    path: /usr/bin/conmon
    version: 'conmon version 2.0.29, commit: 7e6de6678f6ed8a18661e1d5721b81ccee293b9b'
  cpus: 24
  distribution:
    distribution: manjaro
    version: unknown
  eventLogger: journald
  hostname: jmanji
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 107
      size: 1
    - container_id: 1
      host_id: 362144
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 107
      size: 1
    - container_id: 1
      host_id: 362144
      size: 65536
  kernel: 5.11.4-1-rt11-MANJARO
  linkmode: dynamic
  memFree: 14399438848
  memTotal: 67370762240
  ociRuntime:
    name: crun
    package: /usr/bin/crun is owned by crun 0.20.1-2
    path: /usr/bin/crun
    version: |-
      crun version 0.20.1
      commit: 38271d1c8d9641a2cdc70acfa3dcb6996d124b3d
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +YAJL
  os: linux
  remoteSocket:
    path: /run/user/107/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /etc/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: /usr/bin/slirp4netns is owned by slirp4netns 1.1.11-1
    version: |-
      slirp4netns version 1.1.11
      commit: 368e69ccc074628d17a9bb9a35b8f4b9f74db4c6
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.1
  swapFree: 30744055808
  swapTotal: 32133943296
  uptime: 549h 28m 41.2s (Approximately 22.88 days)
registries:
  1nnoserv:15000:
    Blocked: false
    Insecure: true
    Location: 1nnoserv:15000
    MirrorByDigestOnly: false
    Mirrors: []
    Prefix: 1nnoserv:15000
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  configFile: /home/gitlab-runner/.config/containers/storage.conf
  containerStore:
    number: 28
    paused: 0
    running: 0
    stopped: 28
  graphDriverName: overlay
  graphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: /usr/bin/fuse-overlayfs is owned by fuse-overlayfs 1.6-1
      Version: |-
        fusermount3 version: 3.10.4
        fuse-overlayfs: version 1.6
        FUSE library version 3.10.4
        using FUSE kernel interface version 7.31
  graphRoot: /home/gitlab-runner/.local/share/containers/storage
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  imageStore:
    number: 16
  runRoot: /run/user/107/containers
  volumePath: /home/gitlab-runner/.local/share/containers/storage/volumes
version:
  APIVersion: 3.2.2
  Built: 1625835244
  BuiltTime: Fri Jul  9 14:54:04 2021
  GitCommit: d577c44e359f9f8284b38cf984f939b3020badc3
  GoVersion: go1.16.5
  OsArch: linux/amd64
  Version: 3.2.2

vrothberg commented 3 years ago

Thanks for reaching out. Can you try with the latest Podman v3.2.3? There was a regression that has been fixed with .3.

ppenguin commented 3 years ago

> Thanks for reaching out. Can you try with the latest Podman v3.2.3? There was a regression that has been fixed with .3.

Thanks, I just found https://github.com/containers/podman/issues/10863 after cloning the repo and seeing the RELEASE_NOTES... Must be it, sorry for the duplicate.

ppenguin commented 3 years ago

I appear to have spoken too soon. I quickly removed the Manjaro package and installed version 3.1.2 via nix (which happened to be the current version before updating my channels). Version 3.1.2 worked without issue. Then I installed 3.2.3 via the nix unstable channel, and it gives me the same error as 3.2.2.

% which podman
/nix/var/nix/profiles/default/bin/podman
% podman --version
podman version 3.2.3

ERRO[0002] Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address

(BTW: the Manjaro package podman-git which provides 3.3.0-dev also had the same issue)

EDIT:

It has just gotten weirder: if I log in to the gitlab runner and execute the failing make command manually multiple times, it sometimes works and sometimes doesn't (then it gives the same error). This is with podman-3.1.2.

cdoern commented 3 years ago

@vrothberg could this be due to my changes fixing #10868? I added some stuff for regular podman logs as well. I am actually working on a full implementation of the --until flag now

vrothberg commented 3 years ago

> @vrothberg could this be due to my changes fixing #10868?

The commit from this PR wasn't backported to the v3.2 branch, so I don't think these changes are the problem.

@rhatdan, did we backport your systemd-detection fixes in c/common to the 0.38 branch?

ppenguin commented 3 years ago

@cdoern Additionally, I tried with Manjaro's podman-git package, which installs 3.3.0-dev. Eventually all versions I tried exhibited the same behaviour.

mheon commented 3 years ago

@vrothberg I recall seeing them in there when I was doing release notes, so they made it in.

@ppenguin Any chance you can get a podman info off the working version, 3.1.2? I want to see if any configuration changes happened between the releases, specifically to the event logger.

ppenguin commented 3 years ago

@mheon That's the crazy part: actually I found that there's no obvious difference in this behaviour between the versions 3.1.2, 3.2.3, master (the latter presumed from Manjaro podman-git), see my additional remark:

> EDIT: It has just gotten weirder: if I log in to the gitlab runner and execute the failing make command manually multiple times, it sometimes works and sometimes doesn't (then it gives the same error). This is with podman-3.1.2.

I isolated the issue further: it appears to happen only for my gitlab-runner user?!

Following test:

% export DOCKERCMD="$(which podman)"
export IMG="docker.io/bash"
for N in $(seq 1 10); do
${DOCKERCMD} run --rm ${IMG} echo "Haha ${N}" \
        && echo "Try ${N}: OK" || echo "Try ${N}: container error occurred, ignoring... (TODO: remove this workaround)"
done
Trying to pull docker.io/library/bash:latest...
Getting image source signatures
Copying blob ec83969a912d done
Copying blob 339de151aab4 done
Copying blob f0512d9ab85b done
Copying config d057f4d6e5 done
Writing manifest to image destination
Storing signatures
Haha 1
Try 1: OK
Haha 2
Try 2: OK
Haha 3
ERRO[0000] Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address
Try 3: container error occurred, ignoring... (TODO: remove this workaround)
Haha 4
Try 4: OK
Haha 5
ERRO[0000] Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address
Try 5: container error occurred, ignoring... (TODO: remove this workaround)
Haha 6
ERRO[0000] Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address
Try 6: container error occurred, ignoring... (TODO: remove this workaround)
Haha 7
Try 7: OK
Haha 8
Try 8: OK
Haha 9
ERRO[0000] Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address
Try 9: container error occurred, ignoring... (TODO: remove this workaround)
Haha 10
Try 10: OK

As my main (desktop) user:

❯ export DOCKERCMD="$(which podman)"
export IMG="docker.io/bash"
for N in $(seq 1 10); do
${DOCKERCMD} run --rm ${IMG} echo "Haha ${N}" \
        && echo "Try ${N}: OK" || echo "Try ${N}: container error occurred, ignoring... (TODO: remove this workaround)"
done
Trying to pull docker.io/library/bash:latest...
Getting image source signatures
Copying blob ec83969a912d done
Copying blob 339de151aab4 done
Copying blob f0512d9ab85b done
Copying config d057f4d6e5 done
Writing manifest to image destination
Storing signatures
Haha 1
Try 1: OK
Haha 2
Try 2: OK
Haha 3
Try 3: OK
Haha 4
Try 4: OK
Haha 5
Try 5: OK
Haha 6
Try 6: OK
Haha 7
Try 7: OK
Haha 8
Try 8: OK
Haha 9
Try 9: OK
Haha 10
Try 10: OK

Both users are defined in subuid and subgid, and this issue only recently started occurring. I have not (yet) rebooted the system, which I figure might solve this issue, but that would destroy the unique test environment we have here, I suppose... (Since this looks like a bug in how a unique/rare constellation is handled?)

This happens for both versions 3.1.2 and 3.2.3.
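
To put a number on how often the error occurs for a given user, the loop above could be wrapped in a small helper. This is only a sketch: the function name count_cursor_failures is hypothetical, and it simply looks for the "failed to get journal cursor" text from the error message above on stderr.

```shell
# count_cursor_failures RUNS CMD [ARGS...]
# Runs CMD the given number of times and prints how many of the runs wrote
# the "failed to get journal cursor" message to stderr.
# (Hypothetical helper; the body runs in a subshell so no variables leak.)
count_cursor_failures() (
    runs=$1
    shift
    failures=0
    i=1
    while [ "$i" -le "$runs" ]; do
        # Capture stderr only; the command's stdout is discarded.
        err=$("$@" 2>&1 >/dev/null) || true
        case $err in
            *"failed to get journal cursor"*) failures=$((failures + 1)) ;;
        esac
        i=$((i + 1))
    done
    echo "${failures}/${runs} runs hit the journal cursor error"
)

# Example (assumes podman is installed and the image can be pulled):
#   count_cursor_failures 10 podman run --rm docker.io/bash echo hi
```

Running the same call as both users gives two directly comparable counts.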

mheon commented 3 years ago

Is your desktop user in wheel (or whatever group gives you sudo access)? Journald has different access restrictions for users in wheel vs not in wheel.
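
A quick way to compare the two users is to list which of the journal-related groups each one belongs to. A minimal sketch, assuming the groups that the journalctl man page names as having read access to the system journal (systemd-journal, adm, wheel); the function names are hypothetical.

```shell
# filter_journal_groups: reads a space-separated group list on stdin and
# prints, one per line, only the groups that commonly grant read access to
# the system journal. Prints nothing if none match.
filter_journal_groups() {
    tr ' ' '\n' | grep -E -x 'systemd-journal|adm|wheel' || true
}

# journal_groups_of USER: applies the filter to USER's group memberships.
journal_groups_of() {
    id -nG "$1" | filter_journal_groups
}

# Example (the user name is taken from the thread; adjust as needed):
#   journal_groups_of gitlab-runner
#   journal_groups_of "$USER"
```

Note that group changes normally only take effect for new login sessions of that user.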

mheon commented 3 years ago

You may have to hardcode the use of the file events driver in containers.conf for the gitlab-runner user.

ppenguin commented 3 years ago

> You may have to hardcode the use of the file events driver in containers.conf for the gitlab-runner user.

That would be doable, I guess (could you give me a hint what that would look like, or where to find documentation on it?). (BTW: can a user have their own ~/.local/share/containers/containers.conf, or should the user somehow be referred to in the global conf?)

I tried adding gitlab-runner to wheel with no effect, but I can't be sure yet because long running tests are now executing under that user (so I couldn't completely log it out yet)...

Would you have any idea why this is only recently occurring?

rhatdan commented 3 years ago

cp /usr/share/containers/containers.conf /etc/containers/containers.conf
or
cp /usr/share/containers/containers.conf $HOME/.config/containers/containers.conf

$ grep events /usr/share/containers/containers.conf 
# Selects which logging mechanism to use for container engine events.
# events_logger = "journald"

Uncomment the events_logger line and change it to "file".

man containers.conf
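
The steps above can be sketched as a small shell helper for the per-user case. This is only a sketch under stated assumptions: the function name write_events_logger_file is hypothetical, the default path is the $HOME/.config/containers location from the cp command above, and events_logger is assumed to live in the [engine] table as described in man containers.conf. Instead of copying the whole default file it writes only the one override, and it refuses to touch an existing file.

```shell
# write_events_logger_file [CONF_DIR]
# Creates CONF_DIR/containers.conf containing only the events_logger
# override. CONF_DIR defaults to the per-user location.
# Returns non-zero without writing if the file already exists.
write_events_logger_file() {
    conf_dir=${1:-"$HOME/.config/containers"}
    conf_file="$conf_dir/containers.conf"
    if [ -e "$conf_file" ]; then
        echo "refusing to overwrite existing $conf_file" >&2
        return 1
    fi
    mkdir -p "$conf_dir"
    cat > "$conf_file" <<'EOF'
[engine]
events_logger = "file"
EOF
}

# Example, run as the affected user (e.g. gitlab-runner):
#   write_events_logger_file
#   podman info | grep eventLogger   # compare with the earlier podman info output
```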

github-actions[bot] commented 3 years ago

A friendly reminder that this issue had no activity for 30 days.

rhatdan commented 3 years ago

Since I gave a solution to this and did not hear back, I am going to assume it worked. Reopen if I am mistaken.

Procsiab commented 2 years ago

Hello there, I experienced the same error in a different situation (upgrading Fedora IoT from 34 to 35): suddenly, I got the same error that @ppenguin reported in this issue. However, after creating a containers.conf file under the home directory of the user I run the containers with, setting log_driver = "k8s-file", and uncommenting events_logger = "journald", I could get the logs again with the podman logs command, without putting my unprivileged user into the wheel group. Notably, none of the other 3 combinations of the 2 options mentioned above let the unprivileged user read the logs.
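
For reference, the combination described above might look like the fragment printed by the helper below. This is a sketch, not the exact file from this setup: the function name print_k8s_file_conf is hypothetical, and the placement of log_driver under [containers] and events_logger under [engine] is an assumption based on man containers.conf.

```shell
# print_k8s_file_conf: prints a containers.conf fragment that keeps
# container logs in k8s-file format while events still go to journald.
print_k8s_file_conf() {
    cat <<'EOF'
[containers]
log_driver = "k8s-file"

[engine]
events_logger = "journald"
EOF
}

# Example: review the fragment, then merge it by hand into the per-user file
# (the path is the same assumption as in the earlier suggestion):
#   print_k8s_file_conf
#   "$EDITOR" "$HOME/.config/containers/containers.conf"
```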

containers / podman

new error (used to work before) on podman run invocation: Cannot get exit code: failed to get journal cursor: failed to get cursor: cannot assign requested address #10987