Closed: miquecg closed this issue 4 years ago.
Today I've received a Podman update on Arch Linux:
Version: 1.8.1
RemoteAPI Version: 1
Go Version: go1.14
Git Commit: 444a19cdd2e6108c75f6c1aadc1a2a9138a8bd73
Built: Wed Mar 11 22:49:18 2020
OS/Arch: linux/amd64
But the issue is still present.
Do you have any idea what's being written? Can you use iotop to see what's generating the I/O?
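Something along these lines should give cumulative per-process totals; the flags are taken from iotop's manual and are only a suggestion:
$ sudo iotop -oPa    # -o: only tasks doing I/O, -P: processes instead of threads, -a: accumulated totals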
Well, iotop doesn't show any extra information besides the command and I/O metrics.
Command: podman run -it /bin/sh
IO: between 60-70 %
This time I had a normal run without issues, but on the next ones it happened again.
So it is Podman itself, and not Conmon.
@baude Any thoughts here?
Oh, @miquecg, can you run a podman run command as root and see if that triggers it? It would be useful to know if this is rootless-only.
It also happens when running podman as root.
Filesystems aren't my bag... @giuseppe any ideas?
I am not able to reproduce locally.
@miquecg could you strace the podman process?
Once you are able to reproduce the issue, please attach strace to it with: strace -f -p $PID_PODMAN
Command:
podman run -it /bin/sh
IO: between 60-70 %
I measured I/O again only showing processes this time, not threads. Numbers are one order of magnitude smaller (3-6 %).
I left strace running for less than 10 seconds and the dump is huge, 15 MB! I'm attaching a trimmed down version of it:
Update: Spotify on my machine does a lot of I/O on cold start but it doesn't even reach 3 %
Thanks, I don't see anything strange in the strace log.
Have you attached the first part of it?
Could you attach the part that is generated during the I/O spike?
Once this strange behaviour begins, I/O activity is constantly at 3-6 %.
@giuseppe Have you had time to check the full log?
These days I don't have access to the machine where this is happening so I cannot check again with newer Podman versions. Sorry.
I'm running podman 1.8.2 on Manjaro, but don't see any meaningful IO. iotop doesn't show much for podman. @miquecg can you provide a copy of your iotop output for clarity?
I took a look at the strace. It would be great if you could get a new strace with more details - use the following flags: strace -fvttTyyx -s 4096 -o /tmp/strace.txt command
or with PID: strace -fvttTyyx -s 4096 -o /tmp/strace.txt -p PID
The first thing to observe is perhaps what files we're opening: we can't produce I/O without opening files and then reading from or writing to them.
~2000 instances of open in 200,000 syscalls:
libpod$ grep -e 'open(' -e 'openat(' -c strace.dump
2048
libpod$ wc -l strace.dump
200816 strace.dump
Again, we don't have timing flags which could help us analyze how long any syscall might be taking, but we can at least see what files are being opened, and if any of them might be on a physical disk versus a nodev device like /proc.
libpod$ grep -e 'open(' -e 'openat(' strace.dump | cut -d " " -f 3- | awk '{print $2}' | sort | uniq -c
128 "/etc/ld.so.cache",
256 "/home/myuser/.local/share/containers/storage/libpod/bolt_state.db",
256 "/proc/52785/stat",
128 "/proc/cpuinfo",
128 "/proc/self/mountinfo",
256 "/proc/self/uid_map",
128 "/proc/sys/kernel/cap_last_cap",
128 "/run/user/1000/runc/66bf94e5ff853ca511e9e1fd1759e5a453e46501cbedb867eb6c5f795dbb149c/state.json",
128 "/sys/kernel/mm/hugepages",
128 "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size",
128 "/usr/lib/libc.so.6",
128 "/usr/lib/libpthread.so.0",
128 "/usr/lib/libseccomp.so.2",
So, on physical disks: three files in /usr/lib, /etc/ld.so.cache, and /home/myuser/.local/share/containers/storage/libpod/bolt_state.db... Again, we don't know if the open or read of these files is taking a long time.
From man iotop:
iotop displays columns for the I/O bandwidth read and written by each process/thread during the sampling period. It also displays the percentage of time the thread/process spent while swapping in and while waiting on I/O. For each process, its I/O priority (class/level) is shown.
So, I believe the "%" column of iotop is similar to sar's iowait: the time the process spent waiting on I/O rather than doing something else. In other words, a high I/O percentage might partly mean that the process isn't doing much compute, so instead of being idle, it's "blocked" on I/O...
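One way to tell whether that percentage reflects actual bytes moved or mostly time spent blocked is to compare it against per-process I/O counters. A rough sketch, assuming the sysstat package is installed and PID is podman's PID:
$ pidstat -d -p PID 1    # per-second kB read/written plus an iodelay column for that process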
For reference - was the machine you were running this on using some slow disk technology? I've got an NVMe card in my machine, so I/O calls are generally very quick.
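If the disk itself is suspect, a quick way to gauge raw sequential read speed is hdparm; /dev/sda below is just a placeholder for the actual device:
$ sudo hdparm -t /dev/sda    # buffered (non-cached) read timing of the device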
@tylarb Thanks for taking a look at this. I'll let you know whenever I have access to the machine where I had the issue.
The hard drive is SATA with LVM and some partitions, but I cannot provide more info without looking at it.
It's probably what you suggest: some system configuration or hardware condition is causing the issue.
I've got the same problem on Gentoo, SATA HDD drive with LVM+LUKS+Ext4 (libpod-1.8.0). Can provide any info.
@tylarb Sorry if I misunderstood your tips.
I used strace ... -p PID, where PID is podman's PID on the host system:
$ grep -e 'open(' -e 'openat(' -c strace1.txt
13393
$ wc -l strace1.txt
1013744 strace1.txt
$ grep -e 'open(' -e 'openat(' strace1.txt | cut -d " " -f 3- | awk '{print $2}' | grep AT_FDCWD | wc -l
4378
4378 of those strings are openat(AT_FDCWD,
Other fragments from the last grep:
openat(AT_FDCWD,
openat(AT_FDCWD,
openat(AT_FDCWD,
openat(AT_FDCWD,
"/etc/ld.so.preload",
"/etc/ld.so.cache",
"/lib64/libpthread.so.0",
"/usr/lib64/libseccomp.so.2",
"/lib64/libc.so.6",
"/sys/kernel/mm/transparent_hugepage/hpage_pmd_size",
"/proc/sys/kernel/cap_last_cap",
"/proc/cpuinfo",
"/proc/self/mountinfo",
"/sys/kernel/mm/hugepages",
"/proc/self/uid_map",
"/proc/self/uid_map",
"/run/user/1000/runc/a635f25315ef3ddfcbd5791e92e6abb0ef8f591c80186d265c48b967b92e7872/state.json",
"/proc/6328/stat",
"/proc/6328/stat",
@reagentoo let's get an strace with timings: strace -fvttTyyx -s 4096 -o /tmp/strace.txt -p PID
This way, we can see if the open, read, or write is taking a long time, indicating the syscall itself is slow, which would mean the problem is underneath Podman, at the OS level.
If you provide the full strace file I can take a look, and we can check whether we're reading more bytes than is reasonable or expected for some reason.
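As a first pass over such a dump, something like the following should surface the slowest individual calls; /tmp/strace.txt is the path suggested above and the trailing <seconds> field comes from the -T flag:
$ sed -n 's/.*<\([0-9.]*\)>$/\1 &/p' /tmp/strace.txt | sort -rn | head -20    # 20 slowest syscalls by duration
$ strace -c -f -p PID    # alternatively, an aggregate time/count summary per syscall, printed on detach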
@giuseppe FYI
I've got the same problem on Gentoo, SATA HDD drive with LVM+LUKS+Ext4 (libpod-1.8.0). Can provide any info.
Exactly same setup.
IIRC, I don't have this problem when running podman as root. The main difference could be the encryption of the /home partition, but I'd need to double-check it.
Sorry for not being really helpful here, but due to the quarantine we all have to stay home and I cannot access that computer.
@miquecg I didn't try running it as root. But I've now updated libpod from 1.8.0 to 1.8.2, and so far I haven't observed the issue.
Since you can confirm the problem doesn't happen anymore, let's close the issue report. At least you were able to test it. Thanks.
/kind bug
Description
Briefly after starting any new container with
podman run ...
I notice a big spike in hard drive activity that doesn't stop after minutes. The moment I exit the container, disk usage goes back to normal.
Steps to reproduce the issue:
podman run -it alpine /bin/sh
ls -la /
Describe the results you received: A lot of abnormal disk usage (LED is constantly blinking).
Additional information you deem important (e.g. issue happens only occasionally): However, if I execute a shell on a detached container, nothing strange happens. This works fine:
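The working sequence isn't shown here, but presumably it was something along these lines (the container name and the sleep command are placeholders):
$ podman run -d --name test alpine sleep 1000
$ podman exec -it test /bin/sh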
I've recently changed the storage driver to overlay and installed fuse-overlayfs on my system. Moving back to vfs makes no difference.
Output of podman version:
Output of podman info --debug:
Configuration file (~/.config/containers/storage.conf): I've tried two different configurations but neither works.
Config 1
Config 2
Package info (e.g. output of rpm -q podman or apt list podman):
Additional environment details (AWS, VirtualBox, physical, etc.): Physical host.