containers / podman

Podman: A tool for managing OCI containers and pods.
https://podman.io
Apache License 2.0

Spike of disk I/O with podman run #5464

Closed miquecg closed 4 years ago

miquecg commented 4 years ago

/kind bug

Description: Shortly after starting any new container with podman run ... I notice a big spike in hard drive activity that doesn't stop, even after several minutes.

The moment I exit the container, disk usage goes back to normal.

Steps to reproduce the issue:

  1. podman run -it alpine /bin/sh
  2. wait a few seconds or execute a harmless ls -la /
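The write activity during step 2 can be roughly quantified by diffing the kernel's per-device counters around a short sleep; a minimal sketch, where the device name sda is an assumption (substitute your disk):

```shell
# Sketch: sample sectors written from /proc/diskstats over 5 seconds.
# In /proc/diskstats, field 3 is the device name and field 10 is the
# cumulative count of sectors written to that device.
DEV=sda   # assumption: adjust to your disk device
before=$(awk -v d="$DEV" '$3 == d {print $10}' /proc/diskstats)
sleep 5
after=$(awk -v d="$DEV" '$3 == d {print $10}' /proc/diskstats)
echo "sectors written in 5s: $((after - before))"
```

A steadily growing count while the container merely idles at a shell would confirm the reported spike.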

Describe the results you received: A lot of abnormal disk usage (the LED is constantly blinking).

Additional information you deem important (e.g. issue happens only occasionally): However, if I execute a shell on a detached container, nothing strange happens. This works fine:

$ podman run --detach alpine sh -c 'while true ; do sleep 100000 ; done'
$ podman exec -it whatever_container /bin/sh

I've recently changed storage driver to overlay and installed fuse-overlayfs on my system. Moving back to vfs makes no difference.

Output of podman version:

Version:            1.8.0
RemoteAPI Version:  1
Go Version:         go1.14
Git Commit:         2ced9094d4728dd09f60a177faa32339a8d0f721
Built:              Sat Feb 29 14:20:42 2020
OS/Arch:            linux/amd64

Output of podman info --debug:

debug:
  compiler: gc
  git commit: 2ced9094d4728dd09f60a177faa32339a8d0f721
  go version: go1.14
  podman version: 1.8.0
host:
  BuildahVersion: 1.13.1
  CgroupVersion: v1
  Conmon:
    package: Unknown
    path: /usr/bin/conmon
    version: 'conmon version 2.0.11, commit: ff9d97a08d7a4b58267ac03719786e4e7258cecf'
  Distribution:
    distribution: arch
    version: unknown
  IDMappings:
    gidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
    uidmap:
    - container_id: 0
      host_id: 1000
      size: 1
    - container_id: 1
      host_id: 100000
      size: 65536
  MemFree: 3655389184
  MemTotal: 8312897536
  OCIRuntime:
    name: runc
    package: Unknown
    path: /usr/bin/runc
    version: |-
      runc version 1.0.0-rc10
      commit: dc9208a3303feef5b3839f4323d9beb36df0a9dd
      spec: 1.0.1-dev
  SwapFree: 4294426624
  SwapTotal: 4294963200
  arch: amd64
  cpus: 4
  eventlogger: journald
  hostname: myhost
  kernel: 5.5.8-arch1-1
  os: linux
  rootless: true
  slirp4netns:
    Executable: /usr/bin/slirp4netns
    Package: Unknown
    Version: |-
      slirp4netns version 0.4.3
      commit: 2244b9b6461afeccad1678fac3d6e478c28b4ad6
  uptime: 7h 42m 0.54s (Approximately 0.29 days)
registries:
  search:
  - docker.io
  - registry.fedoraproject.org
  - quay.io
  - registry.access.redhat.com
  - registry.centos.org
store:
  ConfigFile: /home/myuser/.config/containers/storage.conf
  ContainerStore:
    number: 0
  GraphDriverName: overlay
  GraphOptions:
    overlay.mount_program:
      Executable: /usr/bin/fuse-overlayfs
      Package: Unknown
      Version: |-
        fusermount3 version: 3.9.0
        fuse-overlayfs: version 0.7.7
        FUSE library version 3.9.0
        using FUSE kernel interface version 7.31
  GraphRoot: /home/myuser/.local/share/containers/storage
  GraphStatus:
    Backing Filesystem: extfs
    Native Overlay Diff: "false"
    Supports d_type: "true"
    Using metacopy: "false"
  ImageStore:
    number: 1
  RunRoot: /run/user/1000/containers
  VolumePath: /home/myuser/.local/share/containers/storage/volumes

Configuration file (~/.config/containers/storage.conf): I've tried two different configurations but neither works.

Config 1

[storage]
  driver = "overlay"
  runroot = "/run/user/1000/containers"
  graphroot = "/home/myuser/.local/share/containers/storage"
  [storage.options]
    additionalimagestores = ["/home/myuser/.local/share/containers2/storage"]
    size = ""
    remap-uids = ""
    remap-gids = ""
    ignore_chown_errors = ""
    remap-user = ""
    remap-group = ""
    skip_mount_home = ""
    mount_program = "/usr/bin/fuse-overlayfs"
    mountopt = ""
    [storage.options.aufs]
      mountopt = ""
    [storage.options.btrfs]
      min_space = ""
      size = ""
    [storage.options.thinpool]
      autoextend_percent = ""
      autoextend_threshold = ""
      basesize = ""
      blocksize = ""
      directlvm_device = ""
      directlvm_device_force = ""
      fs = ""
      log_level = ""
      min_free_space = ""
      mkfsarg = ""
      mountopt = ""
      size = ""
      use_deferred_deletion = ""
      use_deferred_removal = ""
      xfs_nospace_max_retries = ""
    [storage.options.overlay]
      ignore_chown_errors = ""
      mountopt = ""
      mount_program = ""
      size = ""
      skip_mount_home = ""
    [storage.options.vfs]
      ignore_chown_errors = ""
    [storage.options.zfs]
      mountopt = ""
      fsname = ""
      size = ""

Config 2

[storage]
  driver = "vfs"
  runroot = "/run/user/1000/containers"
  graphroot = "/home/myuser/.local/share/containers/storage"
  [storage.options]
    size = ""
    remap-uids = ""
    remap-gids = ""
    ignore_chown_errors = ""
    remap-user = ""
    remap-group = ""
    skip_mount_home = ""
    mount_program = ""
    mountopt = ""
    [storage.options.aufs]
      mountopt = ""
    [storage.options.btrfs]
      min_space = ""
      size = ""
    [storage.options.thinpool]
      autoextend_percent = ""
      autoextend_threshold = ""
      basesize = ""
      blocksize = ""
      directlvm_device = ""
      directlvm_device_force = ""
      fs = ""
      log_level = ""
      min_free_space = ""
      mkfsarg = ""
      mountopt = ""
      size = ""
      use_deferred_deletion = ""
      use_deferred_removal = ""
      xfs_nospace_max_retries = ""
    [storage.options.overlay]
      ignore_chown_errors = ""
      mountopt = ""
      mount_program = ""
      size = ""
      skip_mount_home = ""
    [storage.options.vfs]
      ignore_chown_errors = ""
    [storage.options.zfs]
      mountopt = ""
      fsname = ""
      size = ""

Package info (e.g. output of rpm -q podman or apt list podman):

$ pacman -Qi podman
Name                      : podman
Version                   : 1.8.0-2
Description               : Tool and library for running OCI-based containers in pods
Architecture              : x86_64
URL                       : https://github.com/containers/libpod
Licenses                  : Apache
Groups                    : None
Provides                  : None
Depends On                : cni-plugins  conmon  device-mapper  iptables  libseccomp  runc  skopeo
                            btrfs-progs  slirp4netns  libsystemd
Optional Deps             : podman-docker: for Docker-compatible CLI
Required By               : None
Optional For              : None
Conflicts With            : None
Replaces                  : None
Installed Size            : 99.67 MiB
Packager                  : Morten Linderud <foxboron@archlinux.org>
Build Date                : Sat 29 Feb 2020 14:20:42
Install Date              : Tue 03 Mar 2020 21:05:01
Install Reason            : Explicitly installed
Install Script            : No
Validated By              : Signature

Additional environment details (AWS, VirtualBox, physical, etc.): Physical host.

miquecg commented 4 years ago

Today I've received a Podman update on Arch Linux:

Version:            1.8.1
RemoteAPI Version:  1
Go Version:         go1.14
Git Commit:         444a19cdd2e6108c75f6c1aadc1a2a9138a8bd73
Built:              Wed Mar 11 22:49:18 2020
OS/Arch:            linux/amd64

But the issue is still present.

mheon commented 4 years ago

Do you have any idea what's being written? Can you use iotop to see what's generating the I/O?

miquecg commented 4 years ago

Well, iotop doesn't show any extra information besides command and I/O metrics.

Command: podman run -it /bin/sh
IO: between 60-70 %

This time I got a normal run without issues, but on subsequent runs it happened again.

mheon commented 4 years ago

So it is Podman itself, and not Conmon.

@baude Any thoughts here?

mheon commented 4 years ago

Oh - @miquecg Can you run a podman run command as root, and see if that triggers it? Would be useful to know if this is rootless-only

miquecg commented 4 years ago

It happens also when running podman as root.

baude commented 4 years ago

filesystems aren't my bag ... @giuseppe any ideas?

giuseppe commented 4 years ago

I am not able to reproduce locally.

@miquecg could you strace the podman process?

Once you are able to reproduce the issue, please attach strace to it with: strace -f -p $PID_PODMAN
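A sketch of the attach step, assuming a single active podman run whose PID is found with pgrep (the pattern and the 10-second sampling window are assumptions, not part of the original request):

```shell
# Sketch: locate the podman process and trace it for ~10 seconds.
# -f follows child threads/processes; output goes to a log file.
PID_PODMAN=$(pgrep -o -f 'podman run')   # -o picks the oldest match
strace -f -p "$PID_PODMAN" -o /tmp/podman-strace.log &
TRACER=$!
sleep 10
kill "$TRACER"    # detach strace; the log remains for inspection
```

Writing to a file with -o keeps the trace intact even when the traced process produces output on the same terminal.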

miquecg commented 4 years ago

Command: podman run -it /bin/sh
IO: between 60-70 %

I measured I/O again only showing processes this time, not threads. Numbers are one order of magnitude smaller (3-6 %).

I left strace running for less than 10 seconds and the dump is huge, 15 MB! I'm attaching a trimmed down version of it:

strace.log

Update: Spotify on my machine does a lot of I/O on cold start but it doesn't even reach 3 %

giuseppe commented 4 years ago

thanks, I don't see anything strange in the strace log.

Have you attached the first part of it?

Could you attach the part that is generated during the I/O spike?

miquecg commented 4 years ago

Once this strange behaviour begins, I/O activity is constantly at 3-6 %.

Full strace dump gzipped

miquecg commented 4 years ago

@giuseppe Have you had time to check the full log?

These days I don't have access to the machine where this is happening so I cannot check again with newer Podman versions. Sorry.

tylarb commented 4 years ago

I'm running podman 1.8.2 on Manjaro, but don't see any meaningful I/O. iotop doesn't show much for podman. @miquecg, can you provide a copy of your iotop output for clarity?

I took a look at the strace. It would be great if you could get a new strace with more details - use the following flags: strace -fvttTyyx -s 4096 -o /tmp/strace.txt command, or with a PID: strace -fvttTyyx -s 4096 -o /tmp/strace.txt -p PID

The first thing to observe is which files we're opening - we can't produce I/O without opening files and then reading from or writing to them.

~2000 instances of open in 200,000 syscalls:

libpod$ grep -e 'open(' -e 'openat(' -c strace.dump 
2048
libpod$ wc -l strace.dump 
200816 strace.dump

Again, we don't have timing flags, which would help us analyze how long each syscall is taking, but we can at least see which files are being opened - and whether any of them are on a physical disk versus a nodev filesystem like /proc.

libpod$ grep -e 'open(' -e 'openat(' strace.dump | cut -d " " -f 3- | awk '{print $2}'  | sort | uniq -c
    128 "/etc/ld.so.cache",
    256 "/home/myuser/.local/share/containers/storage/libpod/bolt_state.db",
    256 "/proc/52785/stat",
    128 "/proc/cpuinfo",
    128 "/proc/self/mountinfo",
    256 "/proc/self/uid_map",
    128 "/proc/sys/kernel/cap_last_cap",
    128 "/run/user/1000/runc/66bf94e5ff853ca511e9e1fd1759e5a453e46501cbedb867eb6c5f795dbb149c/state.json",
    128 "/sys/kernel/mm/hugepages",
    128 "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size",
    128 "/usr/lib/libc.so.6",
    128 "/usr/lib/libpthread.so.0",
    128 "/usr/lib/libseccomp.so.2",

So, on physical disks, three files in /usr/lib, /etc/ld.so.cache, and /home/myuser/.local/share/containers/storage/libpod/bolt_state.db... Again, we don't know if the open or read of these files is taking a long time.
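With a trace taken using the timing flag -T, the time spent in those opens can be summed per file; a rough sketch, assuming the log path /tmp/strace.txt from the flags suggested above:

```shell
# Sketch: sum open/openat durations (the trailing <seconds> added by -T)
# per opened path, then show the most expensive files first.
grep -E 'open(at)?\(' /tmp/strace.txt \
  | sed -n 's/.*"\([^"]*\)".*<\([0-9.]*\)>$/\1 \2/p' \
  | awk '{t[$1] += $2} END {for (f in t) printf "%.6fs %s\n", t[f], f}' \
  | sort -rn | head
```

If bolt_state.db or anything else under the graph root dominates this ranking, the slow path is on the physical disk rather than in /proc or /sys.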

From man iotop:

iotop displays columns for the I/O bandwidth read and written by each process/thread during the sampling  period.  It also displays the percentage of time the thread/process spent while swapping in and while waiting on I/O. For each process, its I/O priority (class/level) is shown.

So, I believe the "%" column of iotop is similar to sar's IOwait, which is the time the process spent waiting on I/O rather than doing something else. In other words, a high I/O % might partially mean that the process isn't doing much compute - so instead of idle, it's "blocked" on I/O.

For reference - was the machine you were running this on using some slow disk technology? I've got an NVMe drive in my machine, so I/O calls are generally very quick.
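One way to distinguish "waiting on I/O" from "actually writing" is the kernel's per-process I/O accounting; a minimal sketch, assuming Linux task I/O accounting is enabled and that the pgrep pattern finds the right process:

```shell
# Sketch: diff write_bytes from /proc/<pid>/io over 5 seconds.
# write_bytes counts bytes the process caused to be sent to the block
# layer, so a growing value means real writes, not just iowait.
PID=$(pgrep -o podman)   # assumption: a single podman process
w1=$(awk '/^write_bytes:/ {print $2}' "/proc/$PID/io")
sleep 5
w2=$(awk '/^write_bytes:/ {print $2}' "/proc/$PID/io")
echo "bytes written in 5s: $((w2 - w1))"
```

A near-zero delta alongside a high iotop "%" would point at waiting rather than writing.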

miquecg commented 4 years ago

@tylarb Thanks for taking a look at this. I'll let you know whenever I have access to the machine where I had the issue.

The hard drive is SATA with LVM and several partitions, but I cannot provide more info without looking at it.

It's probably what you suggest: some system configuration or hardware condition is causing the issue.

reagentoo commented 4 years ago

I've got the same problem on Gentoo, on a SATA HDD with LVM+LUKS+Ext4 (libpod-1.8.0). I can provide any info you need.

reagentoo commented 4 years ago

@tylarb Sorry if I misunderstood your tips. I used strace ... -p PID, where PID is podman's PID on the host system:

$ grep -e 'open(' -e 'openat(' -c strace1.txt
13393
$ wc -l strace1.txt
1013744 strace1.txt
$ grep -e 'open(' -e 'openat(' strace1.txt | cut -d " " -f 3- | awk '{print $2}' | grep AT_FDCWD | wc -l
4378

4378 of the matched lines are openat(AT_FDCWD, calls. More fragments from the last grep:

openat(AT_FDCWD,
openat(AT_FDCWD,
openat(AT_FDCWD,
openat(AT_FDCWD,
"/etc/ld.so.preload",
"/etc/ld.so.cache",
"/lib64/libpthread.so.0",
"/usr/lib64/libseccomp.so.2",
"/lib64/libc.so.6",
"/sys/kernel/mm/transparent_hugepage/hpage_pmd_size",
"/proc/sys/kernel/cap_last_cap",
"/proc/cpuinfo",
"/proc/self/mountinfo",
"/sys/kernel/mm/hugepages",
"/proc/self/uid_map",
"/proc/self/uid_map",
"/run/user/1000/runc/a635f25315ef3ddfcbd5791e92e6abb0ef8f591c80186d265c48b967b92e7872/state.json",
"/proc/6328/stat",
"/proc/6328/stat",
tylarb commented 4 years ago

@reagentoo let's get strace with timings strace -fvttTyyx -s 4096 -o /tmp/strace.txt -p PID

This way, we can see whether the open, read, or write calls are taking a long time, indicating the syscalls themselves are slow, which would mean the problem is underneath Podman - at the OS level.

If you provide the full strace file I can take a look, and we can check whether, for some reason, we're reading more bytes than expected or reasonable.
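Since the suggested flags include -yy (which annotates file descriptors with their paths), bytes actually read per file can be estimated straight from the trace; a sketch, assuming the trace file is named strace.txt:

```shell
# Sketch: sum the return values of successful read() calls per fd path.
# With -yy, strace prints e.g. read(3</etc/ld.so.cache>, ...) = 832.
grep -E '(^| )read\(' strace.txt \
  | sed -n 's/.*read(\([0-9]*\)<\([^>]*\)>.*= \([0-9]\{1,\}\).*/\2 \3/p' \
  | awk '{b[$1] += $2} END {for (f in b) print b[f], f}' \
  | sort -rn | head
```

Failed reads (= -1 ...) don't match the numeric return pattern and are skipped, so the totals count only bytes actually transferred.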

rhatdan commented 4 years ago

@giuseppe FYI

miquecg commented 4 years ago

I've got the same problem on Gentoo, SATA HDD drive with LVM+LUKS+Ext4 (libpod-1.8.0) Can provide any info.

Exactly the same setup.

IIRC, I don't have this problem when running podman as root. The main difference could be the encryption of the /home partition, but I'd need to double-check it.

Sorry for not being very helpful here, but due to the quarantine everybody must stay home and I cannot access that computer.

reagentoo commented 4 years ago

@miquecg I didn't try running it as root. But I've now updated libpod from 1.8.0 to 1.8.2, and so far I no longer observe this issue.

miquecg commented 4 years ago

Since you can confirm the problem doesn't happen anymore, let's close the issue report. At least you were able to test it. Thanks.