
Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Error: initial journal cursor: failed to get cursor: cannot assign requested address #12029

Closed: cevich closed this issue 2 years ago

cevich commented 2 years ago

Is this a BUG REPORT or FEATURE REQUEST? (leave only one on its own line)

/kind bug

Description

While testing new F35-beta VM images in https://github.com/containers/podman/pull/11795, multiple containerized tests are failing with this error message. All of them involve the 'podman logs' command.

Steps to reproduce the issue:

  1. git checkout upstream/pr/11795
  2. hack/get_ci_vm.sh int remote fedora-35 root container
  3. podman run -it --rm --hostname test --privileged --net=host --cgroupns=host -v /dev/fuse:/dev/fuse -v "$GOPATH:$GOPATH:Z" --workdir "$GOSRC" $CTR_FQIN
  4. mkdir /tmp/test
  5. bin/podman --events-backend file create --name test --mount type=bind,src=/tmp/test,target=/myvol,z quay.io/libpod/alpine:latest touch /myvol/foo.txt
  6. bin/podman start test
  7. bin/podman logs test

Edit/Clarification: Reproducer steps 3-6 are done inside the container created in step 2.

Describe the results you received:

Error: initial journal cursor: failed to get cursor: cannot assign requested address

Describe the results you expected:

No output

Additional information you deem important (e.g. issue happens only occasionally):

Output of podman version:

Version:      4.0.0-dev
API Version:  4.0.0-dev
Go Version:   go1.16.8
Git Commit:   dd6de5b1f2f32c0b86b190886441d1ac7390d0c1
Built:        Mon Oct 18 20:46:19 2021
OS/Arch:      linux/amd64

Output of podman info --debug:

host:
  arch: amd64
  buildahVersion: 1.23.1
  cgroupControllers:
  - cpuset
  - cpu
  - io
  - memory
  - hugetlb
  - pids
  - misc
  cgroupManager: cgroupfs
  cgroupVersion: v2
  conmon:
    package: conmon-2.0.30-2.fc35.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.0.30, commit: '
  cpus: 2
  distribution:
    distribution: fedora
    variant: container
    version: "35"
  eventLogger: file
  hostname: test
  idMappings:
    gidmap: null
    uidmap: null
  kernel: 5.14.12-300.fc35.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 839319552
  memTotal: 4104306688
  ociRuntime:
    name: crun
    package: crun-1.2-1.fc35.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.2
      commit: 4f6c8e0583c679bfee6a899c05ac6b916022561b
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +YAJL
  os: linux
  remoteSocket:
    path: /run/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: false
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: false
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.1.12-2.fc35.x86_64
    version: |-
      slirp4netns version 1.1.12
      commit: 7a104a101aa3278a2152351a082a6df71f57c9a3
      libslirp: 4.6.1
      SLIRP_CONFIG_VERSION_MAX: 3
      libseccomp: 2.5.0
  swapFree: 4104122368
  swapTotal: 4104122368
  uptime: 25m 15.07s
plugins:
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /etc/containers/storage.conf
  containerStore:
    number: 1
    paused: 0
    running: 0
    stopped: 1
  graphDriverName: vfs
  graphOptions: {}
  graphRoot: /var/lib/containers/storage
  graphStatus: {}
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 1
  runRoot: /run/containers/storage
  volumePath: /var/lib/containers/storage/volumes
version:
  APIVersion: 4.0.0-dev
  Built: 1634589979
  BuiltTime: Mon Oct 18 20:46:19 2021
  GitCommit: dd6de5b1f2f32c0b86b190886441d1ac7390d0c1
  GoVersion: go1.16.8
  OsArch: linux/amd64
  Version: 4.0.0-dev

Package info (e.g. output of rpm -q podman or apt list podman):

Compiled from source

Have you tested with the latest version of Podman and have you checked the Podman Troubleshooting Guide? (https://github.com/containers/podman/blob/master/troubleshooting.md)

Yes

Additional environment details (AWS, VirtualBox, physical, etc.):

GCP VM Image fedora-c5160336597712896

mheon commented 2 years ago

--events-backend file + Journald log driver - I thought this configuration was unsupported? @vrothberg Do you know?

cevich commented 2 years ago

I thought this configuration was unsupported

Updated description: Steps 3-6 are done inside the container.

mheon commented 2 years ago

It's still autodetecting the journald log driver, even inside the container, which also seems like a potential cause.

cevich commented 2 years ago

Ahh, okay, that would make sense then. Curious why/how this didn't break on F34 though...does the autodetect require a certain systemd/journald version?

vrothberg commented 2 years ago

--events-backend file + Journald log driver - I thought this configuration was unsupported? @vrothberg Do you know?

It's a runtime check in podman logs: https://github.com/containers/podman/blob/main/libpod/container_log_linux.go#L111

We don't reject this configuration at creation time.
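
Until that changes, one way to sidestep the mismatch is to make both settings explicit and consistent at create time. This is only a sketch built from existing podman flags, following the reproducer above:

bin/podman --events-backend file create --log-driver k8s-file \
    --name test quay.io/libpod/alpine:latest touch /tmp/foo.txt
bin/podman start test
bin/podman logs test

With both the events backend and the per-container log driver pointed at file-based backends, podman logs never needs to open the journal.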

cevich commented 2 years ago

What's needed to fix this then? It's affecting a great number of tests in https://github.com/containers/podman/pull/11795 (update CI to F35)

vrothberg commented 2 years ago

The root cause of the Error: initial journal cursor: failed to get cursor: cannot assign requested address error remains unclear to me at the moment.

vrothberg commented 2 years ago

hack/get_ci_vm.sh int remote fedora-35 root host

The image doesn't exist when I try to reproduce.

cevich commented 2 years ago

The image doesn't exist when I try to reproduce.

The images are changing almost daily; make sure you pull down the latest version of #11795.

cevich commented 2 years ago

Woopsie-doodle...I pasted in the wrong test name; that's why it can't resolve $CTR_FQIN. I edited the description, it should be:

hack/get_ci_vm.sh int remote fedora-35 root container (not host)

edsantiago commented 2 years ago

I can't reproduce this on a 1minutetip f35 box. I can't figure out what's different between it and our CI environment.

cevich commented 2 years ago

That's odd, you ran it from inside a container, ya?

edsantiago commented 2 years ago

Ohhhh.... no, I missed that part. Thank you!

edsantiago commented 2 years ago

Okay, this is trivial to reproduce:

# bin/podman run -it  --rm --privileged --net=host --cgroupns=host  \
    -v /var/tmp/tmp.vgOXZue96r:/tmp:Z \
    -v /dev/fuse:/dev/fuse \
    -v "$GOPATH:$GOPATH:Z" \
    -v /root/podman:/root/podman:Z \
    --workdir "/root/podman" \
    -e CONTAINER=1 \
    quay.io/libpod/fedora_podman:c5920205063847936 bash
[hash prompts below are in the container. Is there a way to tell github console formatting to allow a prefix before hash?]

# bin/podman --storage-driver vfs run -d --name foo quay.io/libpod/testimage:20210610 date
Trying to pull quay.io/libpod/testimage:20210610...
Getting image source signatures
Copying blob 9afcdfe780b4 done
Copying config 9f9ec7f2fd done
Writing manifest to image destination
Storing signatures
0dfa6f0753bd37fe6b41879df46f1a119ce1d7660b65bd8437db72ef6a04cc8a
# bin/podman --storage-driver vfs logs foo
Error: initial journal cursor: failed to get cursor: cannot assign requested address

One solution: when running podman in a container, create /etc/containers/containers.conf with:

[containers]
log_driver = "k8s-file"

E.g. by creating that file and bind-mounting it, or by creating it in setup.sh if $CONTAINER == 1.
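
A minimal sketch of the bind-mount variant (the paths are illustrative; the image tag is the one used above):

cat > /tmp/containers-ci.conf <<'EOF'
[containers]
log_driver = "k8s-file"
EOF

podman run -it --rm --privileged \
    -v /tmp/containers-ci.conf:/etc/containers/containers.conf:ro,Z \
    quay.io/libpod/fedora_podman:c5920205063847936 bash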

There are probably many other ways to force file logging. (And there might even be a better alternative, e.g. making journalctl work in the container somehow?). But I'm done for the day.

nalind commented 2 years ago

Thanks @edsantiago, that clears up a lot. journalctl goes and looks directly at journal files, and so does the API we use to read the journal. When the files, and the directories that contain them, are absent, the initialization check at https://github.com/containers/common/blob/2dd5d30196f0dc7e15d1e2d53965c91821fd7e10/pkg/config/systemd.go#L70 succeeds anyway, since it skips over content that is missing or not readable due to permissions.

We probably need to improve that logic. I don't think it's reasonable to expect that people will bind mount both the journal files and the sockets that the host's systemd-journald is listening on into a container, and we know our test cases, at least, don't run their own systemd-journald in the test container.
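
For reference, a quick way to confirm that nothing journal-related is visible inside such a test container (these are the standard journald locations, nothing podman-specific):

ls /run/log/journal /var/log/journal 2>/dev/null   # journal files (volatile/persistent)
ls /run/systemd/journal 2>/dev/null                # journald sockets
journalctl -n 1 --no-pager                         # journalctl has nothing to read either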

cevich commented 2 years ago

Yes, thanks a lot @edsantiago, this one has had us stumped for a while. I'm a little hesitant to work around the problem as you suggest (but thanks for the ideas) simply because I don't want to hide this or related problems in the future. Though I suppose I could put in a temporary band-aid if I must.

@nalind do you think it will take weeks/months to fix the logic? Do you know this code, or who does? I'm trying to decide what to do, since this is blocking us from using F35.

nalind commented 2 years ago

Hmm, actually, that shouldn't be the problem, though I can submit a PR to check that there's at least one entry in the journal as a last-ditch thing. All of that defaults logic gets short-circuited if we look in /proc/1/comm and don't find "systemd", so it shouldn't even be getting that far. Are we still explicitly setting the log driver to "journald" in /usr/share/containers/containers.conf?
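
Two quick ways to check that, assuming standard paths:

cat /proc/1/comm
# "systemd" on a normal host; something else (e.g. bash) inside the test container

grep -n '^log_driver' /usr/share/containers/containers.conf
# any match means the packaged config pins a driver instead of leaving it to auto-detection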

cevich commented 2 years ago

I just checked in quay.io/libpod/fedora_podman:c5920205063847936 (from my F35 PR). The file from containers-common-1-32.fc35.noarch is attached.

containers.conf.txt

Answer: Yes, it's set to journald

nalind commented 2 years ago

I think we want to stop forcing that, then.

cevich commented 2 years ago

Should the line just be commented out, or default to something explicitly?

nalind commented 2 years ago

It should be left commented out, so that instead of simply accepting the configured value, the library will select the most appropriate setting to use based on a quick examination of the environment it finds itself in.
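
For illustration (the exact wording in the packaged file may differ), the difference is between pinning the value:

[containers]
log_driver = "journald"

and shipping it commented out, which leaves the choice to the auto-detection:

[containers]
#log_driver = "journald"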

cevich commented 2 years ago

Ugh :vomiting_face: looks like upstream does have this commented out (code). So that begs the question: How/why is it not commented out in the packages we install in VMs?

cevich commented 2 years ago

I'll send a note to the team, I think this might be beyond a "We need a new RPM" question...

edsantiago commented 2 years ago

https://src.fedoraproject.org/rpms/containers-common/blob/f35/f/containers.conf#_152-154

cevich commented 2 years ago

Bingo, thanks @edsantiago. I sent a note to the team, assuming it's not just a simple fix (maybe there's a reason it's set that way).

I'll add a temporary workaround for this in our image-build, so we don't have to wait weeks for RPMs to roll (if that's what's needed).

cevich commented 2 years ago

Update: I've moved the workaround into the podman CI scripts, since that will be easier to undo later. Ref: https://github.com/containers/podman/pull/11795/commits

godmar commented 2 years ago

I'm using podman on a native system where I don't have root. Our sysadmin set up podman for us to use in rootless mode, using /var/lib/user/$USER as rootless storage path. I can build containers (and start them), but any attempt to look at the logs of a detached container fails with:

Error: initial journal cursor: failed to get cursor: cannot assign requested address

This is podman version 3.4.1-dev on CentOS Stream release 8. /etc/containers/containers.conf does not exist, and grep -rl log_driver /etc/containers comes up empty.

What should I tell our sysadmin to change so we can see our containers' log output?

vrothberg commented 2 years ago

@godmar, you can create the following file at /etc/containers/containers.conf:

[engine]
log_driver="k8s-file"

Or edit /usr/share/containers/containers.conf (or copy that over to /etc/ and edit it there).
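
To confirm the change took effect, the effective driver can be checked with standard format templates (shown as an illustration; note that containers created before the change keep the driver they were created with, so they may need to be recreated):

podman info --format '{{.Host.LogDriver}}'
podman inspect --format '{{.HostConfig.LogConfig.Type}}' <container-name>

Both should report k8s-file for newly created containers once the config is in place.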

edsantiago commented 2 years ago

Closing because this issue was specific to #11795, which has merged. (In particular, this was specific to containerized environments). @godmar if @vrothberg's suggestion doesn't resolve your needs, could you please open a new issue?

godmar commented 2 years ago

@godmar, you can create the following file at /etc/containers/containers.conf:

[engine]
log_driver="k8s-file"

Or edit /usr/share/containers/containers.conf (or copy that over to /etc/ and edit it there).

Thanks, setting log_driver="k8s-file" in /usr/share/containers/containers.conf resolved the issue, with the following caveats:

edsantiago commented 2 years ago

Uh-oh. Seen just now in f35 root:

[+0993s] not ok 226 podman pod top - containers in different PID namespaces
         # (from function `die' in file test/system/[helpers.bash, line 500](https://github.com/containers/podman/blob/88d4db009eea8ab49788ecc6625e7117f0c96165/test/system/helpers.bash#L500),
         #  from function `run_podman' in file test/system/[helpers.bash, line 219](https://github.com/containers/podman/blob/88d4db009eea8ab49788ecc6625e7117f0c96165/test/system/helpers.bash#L219),
         #  from function `wait_for_output' in file test/system/[helpers.bash, line 254](https://github.com/containers/podman/blob/88d4db009eea8ab49788ecc6625e7117f0c96165/test/system/helpers.bash#L254),
         #  in test file test/system/[200-pod.bats, line 39](https://github.com/containers/podman/blob/88d4db009eea8ab49788ecc6625e7117f0c96165/test/system/200-pod.bats#L39))
         #   `wait_for_output "PID \+PPID \+USER " $cid2' failed with status 125
         # # podman rm -t 0 --all --force --ignore
         # # podman ps --all --external --format {{.ID}} {{.Names}}
         # # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
         # quay.io/libpod/testimage:20210610 9f9ec7f2fdef
         # # podman pod create --infra=false
         # f62ebb46dd3bfa973229a72696554424961e796d94b757b9f58fabb9dcf16ec9
         # # podman run -d --pod f62ebb46dd3bfa973229a72696554424961e796d94b757b9f58fabb9dcf16ec9 quay.io/libpod/testimage:20210610 top -d 2
         # 92ea1e2da9dce47462289f98e99494661813db3884090a74763bf8b36d389f28
         # # podman run -d --pod f62ebb46dd3bfa973229a72696554424961e796d94b757b9f58fabb9dcf16ec9 quay.io/libpod/testimage:20210610 top -d 2
         # ab38eb9793f9ef02b891a33f5877014f195cc663b7caf9b723d8fc4eb263f4ba
         # # podman logs 92ea1e2da9dce47462289f98e99494661813db3884090a74763bf8b36d389f28
         # Mem: 1353336K used, 2654240K free, 924K shrd, 24K buff, 993888K cached
         # CPU:   5% usr   0% sys   0% nic  90% idle   0% io   5% irq   0% sirq
         # Load average: 1.90 2.17 1.51 3/271 1
         #   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
         # # podman logs ab38eb9793f9ef02b891a33f5877014f195cc663b7caf9b723d8fc4eb263f4ba
         # Error: initial journal cursor: failed to get cursor: cannot assign requested address
         # [ rc=125 (** EXPECTED 0 **) ]

edsantiago commented 2 years ago

And again in f35 root netavark, same test:

[+0957s] not ok 227 podman pod top - containers in different PID namespaces
         # (from function `die' in file test/system/[helpers.bash, line 500](https://github.com/containers/podman/blob/164b64ea3baa8502a23fc0c7674f4a7e60507aa0/test/system/helpers.bash#L500),
         #  from function `run_podman' in file test/system/[helpers.bash, line 219](https://github.com/containers/podman/blob/164b64ea3baa8502a23fc0c7674f4a7e60507aa0/test/system/helpers.bash#L219),
         #  from function `wait_for_output' in file test/system/[helpers.bash, line 254](https://github.com/containers/podman/blob/164b64ea3baa8502a23fc0c7674f4a7e60507aa0/test/system/helpers.bash#L254),
         #  in test file test/system/[200-pod.bats, line 39](https://github.com/containers/podman/blob/164b64ea3baa8502a23fc0c7674f4a7e60507aa0/test/system/200-pod.bats#L39))
         #   `wait_for_output "PID \+PPID \+USER " $cid2' failed with status 125
         # # podman rm -t 0 --all --force --ignore
         # # podman ps --all --external --format {{.ID}} {{.Names}}
         # # podman images --all --format {{.Repository}}:{{.Tag}} {{.ID}}
         # quay.io/libpod/testimage:20210610 9f9ec7f2fdef
         # # podman pod create --infra=false
         # 697a51d00454c22ab7f01863a4d6f1c2b631715c9ccf27a21b19eaccaf538ac0
         # # podman run -d --pod 697a51d00454c22ab7f01863a4d6f1c2b631715c9ccf27a21b19eaccaf538ac0 quay.io/libpod/testimage:20210610 top -d 2
         # b2df58fdfe04b4d08359d4892a2e6458a6064dc10f6d3e39f83e13f6cb7ef212
         # # podman run -d --pod 697a51d00454c22ab7f01863a4d6f1c2b631715c9ccf27a21b19eaccaf538ac0 quay.io/libpod/testimage:20210610 top -d 2
         # 528283e9b07c7420643e2fd9bfe03088b4eb4202c7c0fccd5380fd8f45796c58
         # # podman logs b2df58fdfe04b4d08359d4892a2e6458a6064dc10f6d3e39f83e13f6cb7ef212
         # Mem: 1348336K used, 2659248K free, 972K shrd, 24K buff, 994456K cached
         # CPU:  14% usr   9% sys   0% nic  71% idle   4% io   0% irq   0% sirq
         # Load average: 1.26 2.11 1.51 5/251 1
         #   PID  PPID USER     STAT   VSZ %VSZ CPU %CPU COMMAND
         # # podman logs 528283e9b07c7420643e2fd9bfe03088b4eb4202c7c0fccd5380fd8f45796c58
         # Error: initial journal cursor: failed to get cursor: cannot assign requested address
         # [ rc=125 (** EXPECTED 0 **) ]

github-actions[bot] commented 2 years ago

A friendly reminder that this issue had no activity for 30 days.

categulario commented 2 years ago

I just experienced this issue on Ubuntu 22.04 with podman 3.4.4.

log_driver="k8s-file" works fine for seeing the logs, but --follow doesn't seem to work.

DevDorrejo commented 2 years ago

Hello, I have this issue too.

machine:

  • podman 3.4.4
  • openSUSE Leap 15.4

$HOME/.config/containers/containers.conf

log_driver = "k8s-file"
events_logger = "file"

Podman Command:

podman run --label io.containers.autoupdate=registry --detach \
  --publish 8443:443 --publish 8080:80 --publish 2222:22 \
  --name gitlab \
  --restart always \
  --volume $GITLAB_HOME/config:/etc/gitlab \
  --volume $GITLAB_HOME/logs:/var/log/gitlab \
  --volume $GITLAB_HOME/data:/var/opt/gitlab \
  --shm-size 256m \
gitlab/gitlab-ee:latest

podman logs gitlab
Error: initial journal cursor: failed to get cursor: cannot assign requested address

I had to prune the system; it was failing from /tmp/podman-id on a fresh install. It is working now with the change, but why is this happening now? This kind of error did not occur before.

vrothberg commented 2 years ago

Please try with newer versions of Podman.

umohnani8 commented 2 years ago

@edsantiago have you seen this recently in CI? We fixed a bunch of things in December which should have made it into the newer podman versions since then.

edsantiago commented 2 years ago

Not found in my logs since 04-04. I can no longer reproduce, because podman-in-podman no longer works for me:

# bin/podman --storage-driver vfs run -d --name foo quay.io/libpod/testimage:20210610 date
Error: failed to initialize journal: write unixgram @c0acf->/run/systemd/journal/socket: sendmsg: no such file or directory

(That's an error I've never seen before, and don't have time to track down today)

rhatdan commented 2 years ago

That looks like a bug in the containers.conf inside of the container. Podman inside of a container should be using file for logging and events. Did you do this with quay.io/podman/stable?
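
For reference, the kind of override such a containerized-podman image would carry in its containers.conf is roughly this (a sketch, with section placement as in the examples earlier in the thread):

[containers]
log_driver = "k8s-file"

[engine]
events_logger = "file"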

edsantiago commented 2 years ago

With quay.io/podman/stable everything works fine.

cevich commented 2 years ago

IIRC, the original context for this was using quay.io/libpod/fedora_podman:<image id> in CI. So podman would be compiled and installed from source, though the other packages would come from RPMs. That container image essentially mirrors the environment set up in the CI VMs. Ssssooo, if the error hasn't been observed in a few months, we could probably just close this as fixed.