Closed wxx213 closed 2 years ago
Good question...
I think you need to focus on the following points:
- overlaybd service health:
systemctl is-active "overlaybd-tcmu"
- I/O hang detection:
iostat -x 1 /dev/sdX
## watch the '%util' of the overlaybd device: a hang is likely if it stays at 100% for a long time without any r/w requests
- I/O latency: we record all I/O requests of overlaybd in '/var/log/overlaybd-audit.log', like this:
2021/12/01 17:09:38|AUDIT|th=00007F91D406D880|file:pread[pathname=/var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/1/fs/overlaybd.commit][offset=528384][size=4096][latency=18253]
2021/12/01 17:20:35|AUDIT|th=00007F4A57FFF040|file:read[pathname=][offset=12478092][size=65536][latency=11742]
2021/12/01 17:20:35|AUDIT|th=00007F4A56FF3040|file:read[pathname=][offset=6818592][size=547][latency=11602]
2021/11/03 20:11:35|AUDIT|th=00007F312965EC80|download[pathname=https://registry.hub.docker.com/v2/overlaybd/redis/blobs/sha256:f2d33f598db59a8a4fcb490764cdfca3157ec6a742870378154cbef93acefce9][offset=17300874][size=262026][latency=29957]
2022/06/06 11:13:34|AUDIT|th=00007FD153FD3440|file:write[pathname=][offset=9175040][size=524288][latency=10034]
...
[download]: on-demand read from remote storage (registry)
[pread]: on-demand read from cache (/opt/overlaybd/registry_cache)
[file:read]: read from a local file
[file:write]: write to a local file
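The audit log format above lends itself to a quick per-operation latency summary. Here is a minimal sketch, assuming every line follows the layout shown in the samples. `audit_latency_summary` is a hypothetical helper name, not something shipped with overlaybd, and the latency values are reported in whatever unit the log itself uses:

```shell
#!/bin/sh
# Summarize overlaybd audit-log latency per operation type.
# Assumed line layout (taken from the samples above):
#   date time|AUDIT|th=...|<op>[pathname=...][offset=N][size=N][latency=N]
# audit_latency_summary is a hypothetical helper, not part of overlaybd.
audit_latency_summary() {
    # $1: path to the audit log; reads stdin when omitted
    awk -F'|' '
        $2 == "AUDIT" {
            # The operation name is field 4 up to the first "[".
            op = $4
            sub(/\[.*/, "", op)
            # Extract N from the [latency=N] tag.
            if (match($0, /\[latency=[0-9]+\]/)) {
                lat = substr($0, RSTART + 9, RLENGTH - 10) + 0
                count[op]++
                sum[op] += lat
                if (lat > max[op]) max[op] = lat
            }
        }
        END {
            # avg is truncated to an integer by %d
            for (op in count)
                printf "%s count=%d avg=%d max=%d\n", op, count[op], sum[op] / count[op], max[op]
        }
    ' "${1:--}" | sort
}
```

For example, `audit_latency_summary /var/log/overlaybd-audit.log` prints one line per operation type. Fed only the two `file:read` samples above, it would print `file:read count=2 avg=11672 max=11742`. A sustained rise in `avg` or `max` for `download` or `pread` is the kind of signal worth alerting on; any threshold is up to your environment.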
Thanks. What's more, could the kernel module (e.g. files in /proc or /sys) help with this?
I have no idea about that. In our production environment, we usually focus on I/O status (latency or hangs) and RSS memory.
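For the RSS side, one way to sample it is to read `VmRSS` from the daemon's `/proc/<pid>/status` file. This is a rough sketch, assuming a Linux host. `rss_kb` is a hypothetical helper, and looking up the PID through the `overlaybd-tcmu` systemd unit is an assumption based on the service name used in the health check earlier in this thread:

```shell
#!/bin/sh
# Print the resident set size (VmRSS, in kB) recorded in a
# /proc/<pid>/status file. rss_kb is a hypothetical helper name.
rss_kb() {
    # $1: path to a status file, e.g. /proc/1234/status
    awk '$1 == "VmRSS:" { print $2 }' "$1"
}

# Example (assumes overlaybd runs as the overlaybd-tcmu systemd unit):
#   pid=$(systemctl show -p MainPID --value overlaybd-tcmu)
#   rss_kb "/proc/$pid/status"
```

Sampling this periodically and comparing it against a limit that suits your hosts gives a simple memory-growth check.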
Okay, it helps a lot, thanks.
For synchronous communication, catch us in the #overlaybd channel on the Cloud Native Computing Foundation (CNCF) Slack at cloud-native.slack.com. Everyone is welcome to join and chat.
Welcome~
We need to monitor whether overlaybd is running well. Are there any suggestions for this?