containers / common

Location for shared common files in github.com/containers repos.
Apache License 2.0

Running without a uid 0 : write to `/proc/sys/net/ipv4/ping_group_range`: Invalid argument #1802

Open grooverdan opened 10 months ago

grooverdan commented 10 months ago
$ podman run --uidmap=999:0  --user 999  --rm   mariadb:10.11
Error: OCI runtime error: crun: write to `/proc/sys/net/ipv4/ping_group_range`: Invalid argument

From strace:

[pid 24012] openat(AT_FDCWD, "/proc/sys", O_RDONLY|O_DIRECTORY) = 15
[pid 24012] openat(15, "net/ipv4/ping_group_range", O_WRONLY) = 16
[pid 24012] write(16, "0 0", 3)         = -1 EINVAL (Invalid argument)

Can we just ignore this?

$ podman info
host:
  arch: amd64
  buildahVersion: 1.33.2
  cgroupControllers:
  - cpu
  - io
  - memory
  - pids
  cgroupManager: systemd
  cgroupVersion: v2
  conmon:
    package: conmon-2.1.8-2.fc39.x86_64
    path: /usr/bin/conmon
    version: 'conmon version 2.1.8, commit: '
  cpuUtilization:
    idlePercent: 98.47
    systemPercent: 0.35
    userPercent: 1.19
  cpus: 16
  databaseBackend: boltdb
  distribution:
    distribution: fedora
    variant: workstation
    version: "39"
  eventLogger: journald
  freeLocks: 1976
  hostname: bark
  idMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  kernel: 6.6.9-200.fc39.x86_64
  linkmode: dynamic
  logDriver: journald
  memFree: 16979173376
  memTotal: 33418534912
  networkBackend: netavark
  networkBackendInfo:
    backend: netavark
    dns:
      package: aardvark-dns-1.9.0-1.fc39.x86_64
      path: /usr/libexec/podman/aardvark-dns
      version: aardvark-dns 1.9.0
    package: netavark-1.9.0-1.fc39.x86_64
    path: /usr/libexec/podman/netavark
    version: netavark 1.9.0
  ociRuntime:
    name: crun
    package: crun-1.12-1.fc39.x86_64
    path: /usr/bin/crun
    version: |-
      crun version 1.12
      commit: ce429cb2e277d001c2179df1ac66a470f00802ae
      rundir: /run/user/1000/crun
      spec: 1.0.0
      +SYSTEMD +SELINUX +APPARMOR +CAP +SECCOMP +EBPF +CRIU +LIBKRUN +WASM:wasmedge +YAJL
  os: linux
  pasta:
    executable: /usr/bin/pasta
    package: passt-0^20231204.gb86afe3-1.fc39.x86_64
    version: |
      pasta 0^20231204.gb86afe3-1.fc39.x86_64
      Copyright Red Hat
      GNU General Public License, version 2 or later
        <https://www.gnu.org/licenses/old-licenses/gpl-2.0.html>
      This is free software: you are free to change and redistribute it.
      There is NO WARRANTY, to the extent permitted by law.
  remoteSocket:
    exists: false
    path: /run/user/1000/podman/podman.sock
  security:
    apparmorEnabled: false
    capabilities: CAP_CHOWN,CAP_DAC_OVERRIDE,CAP_FOWNER,CAP_FSETID,CAP_KILL,CAP_NET_BIND_SERVICE,CAP_SETFCAP,CAP_SETGID,CAP_SETPCAP,CAP_SETUID,CAP_SYS_CHROOT
    rootless: true
    seccompEnabled: true
    seccompProfilePath: /usr/share/containers/seccomp.json
    selinuxEnabled: true
  serviceIsRemote: false
  slirp4netns:
    executable: /usr/bin/slirp4netns
    package: slirp4netns-1.2.2-1.fc39.x86_64
    version: |-
      slirp4netns version 1.2.2
      commit: 0ee2d87523e906518d34a6b423271e4826f71faf
      libslirp: 4.7.0
      SLIRP_CONFIG_VERSION_MAX: 4
      libseccomp: 2.5.3
  swapFree: 16898842624
  swapTotal: 16898842624
  uptime: 4h 44m 44.00s (Approximately 0.17 days)
  variant: ""
plugins:
  authorization: null
  log:
  - k8s-file
  - none
  - passthrough
  - journald
  network:
  - bridge
  - macvlan
  - ipvlan
  volume:
  - local
registries:
  search:
  - registry.fedoraproject.org
  - registry.access.redhat.com
  - docker.io
  - quay.io
store:
  configFile: /home/dan/.config/containers/storage.conf
  containerStore:
    number: 35
    paused: 0
    running: 0
    stopped: 35
  graphDriverName: overlay
  graphOptions: {}
  graphRoot: /home/dan/.local/share/containers/storage
  graphRootAllocated: 947524849664
  graphRootUsed: 573377179648
  graphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "true"
    Supports d_type: "true"
    Supports shifting: "false"
    Supports volatile: "true"
    Using metacopy: "false"
  imageCopyTmpDir: /var/tmp
  imageStore:
    number: 154
  runRoot: /run/user/1000/containers
  transientStore: false
  volumePath: /home/dan/.local/share/containers/storage/volumes
version:
  APIVersion: 4.8.3
  Built: 1704291100
  BuiltTime: Thu Jan  4 01:11:40 2024
  GitCommit: ""
  GoVersion: go1.21.5
  Os: linux
  OsArch: linux/amd64
  Version: 4.8.3
$  uname -a
Linux bark 6.6.9-200.fc39.x86_64 #1 SMP PREEMPT_DYNAMIC Mon Jan  1 20:05:54 UTC 2024 x86_64 GNU/Linux

rhatdan commented 10 months ago

You can modify containers.conf so that it does not set this sysctl.

grooverdan commented 10 months ago

Ack, like #345.

rhatdan commented 10 months ago

Yes, except set default_sysctls to [].
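
A minimal sketch of that setting, assuming the per-user file ~/.config/containers/containers.conf is the one in effect (the path is an assumption; a system-wide containers.conf would use the same key):

```toml
# ~/.config/containers/containers.conf (assumed location)
[containers]
# An empty list means no default sysctls are passed to the OCI runtime,
# so net.ipv4.ping_group_range is never written at container start.
default_sysctls = []
```

With this in place the container itself no longer gets the unprivileged-ping range, so it only suits cases where that is acceptable.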

rhatdan commented 10 months ago

@giuseppe PTAL

giuseppe commented 10 months ago

You can also override it with --sysctl. In your case that would be: --sysctl="net.ipv4.ping_group_range=999 999"

giuseppe commented 10 months ago

Also keep in mind that running without root mapped in the user namespace is not generally supported. There is some code in crun to allow it, but it fails with runc, for example:

$ podman --runtime crun run --sysctl="net.ipv4.ping_group_range=999 999" --uidmap=999:0  --user 999:999  --rm   busybox echo hi
hi
$ podman --runtime runc run --sysctl="net.ipv4.ping_group_range=999 999" --uidmap=999:0  --user 999:999  --rm   busybox echo hi
Error: OCI runtime error: runc: runc create failed: User namespaces enabled, but no user mapping found.

grooverdan commented 9 months ago

Just clarifying "Can we just ignore this?": I meant treating a write to /proc/sys/net/ipv4/ping_group_range that returns EINVAL as a non-fatal error/warning/notice. Are the sysctls set by this necessary for any part of podman operations?

giuseppe commented 9 months ago

The OCI runtime sets it, so it is out of our control. We'd need to extend the OCI runtime spec to support "optional" sysctls.

It would be easier to teach Podman not to set it if there is no root user mapped, or even better, to support some templating mechanism like --sysctl="net.ipv4.ping_group_range=$FIRST_UID $NUMBER_UIDS". However, this seems like overkill for something that only affects ping_group_range when running without root in the user namespace.
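
Until something like that exists, the value can be computed by hand. The following is only a sketch of how such a template might expand, not anything Podman implements: `ping_range_from_map` is a hypothetical helper that takes a mapping in the `container_id:host_id[:size]` form used by --uidmap and prints the first and last container ID it covers. It assumes the size defaults to 1 when omitted and that the group mapping mirrors the user mapping, as in the `999 999` example above.

```shell
# Hypothetical helper (not part of Podman): derive a ping_group_range value
# from a mapping written as container_id:host_id[:size].
# The body runs in a subshell so its variables do not leak into the caller.
ping_range_from_map() (
    spec=$1
    first=${spec%%:*}      # container_id: first ID inside the namespace
    rest=${spec#*:}        # host_id, optionally followed by :size
    case $rest in
        *:*) size=${rest#*:} ;;   # explicit size
        *)   size=1 ;;            # assumed default when size is omitted
    esac
    # ping_group_range expects "min max", so print the first and last mapped ID
    echo "$first $((first + size - 1))"
)

# Example use (commented out because it needs podman and an image):
# podman run --sysctl="net.ipv4.ping_group_range=$(ping_range_from_map 999:0)" \
#     --uidmap=999:0 --user 999:999 --rm busybox echo hi
```

For the mapping in this issue, `ping_range_from_map 999:0` yields the same range that was passed manually with --sysctl.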

rhatdan commented 9 months ago

You can tell Podman not to set sysctls in this situation, or modify containers.conf so that it is not set. This sysctl just allows root processes within the container to ping without requiring CAP_NET_RAW.