containerd / overlaybd

Overlaybd: a block based remote image format. The storage backend of containerd/accelerated-container-image.
Apache License 2.0

how to monitor overlaybd in production env #101

Closed wxx213 closed 2 years ago

wxx213 commented 2 years ago

We need to monitor whether overlaybd is running well. Are there any suggestions for this?

BigVan commented 2 years ago

Good question...

I think you need to focus on the following points:

- overlaybd service health:

 systemctl is-active "overlaybd-tcmu"

- I/O hang detection:

iostat -x 1 /dev/sdX
## watch the '%util' column of the overlaybd device: a hang shows up as 100% for a long time without any r/w requests

- I/O latency: we record all overlaybd I/O requests in '/var/log/overlaybd-audit.log', like this:

2021/12/01 17:09:38|AUDIT|th=00007F91D406D880|file:pread[pathname=/var/lib/containerd/io.containerd.snapshotter.v1.overlaybd/snapshots/1/fs/overlaybd.commit][offset=528384][size=4096][latency=18253]
2021/12/01 17:20:35|AUDIT|th=00007F4A57FFF040|file:read[pathname=][offset=12478092][size=65536][latency=11742]
2021/12/01 17:20:35|AUDIT|th=00007F4A56FF3040|file:read[pathname=][offset=6818592][size=547][latency=11602]
2021/11/03 20:11:35|AUDIT|th=00007F312965EC80|download[pathname=https://registry.hub.docker.com/v2/overlaybd/redis/blobs/sha256:f2d33f598db59a8a4fcb490764cdfca3157ec6a742870378154cbef93acefce9][offset=17300874][size=262026][latency=29957]
2022/06/06 11:13:34|AUDIT|th=00007FD153FD3440|file:write[pathname=][offset=9175040][size=524288][latency=10034]
...

[download]: on-demand read from remote storage (registry)
[pread]: on-demand read from cache (/opt/overlaybd/registry_cache)
[file:read]: read from local file
[file:write]: write into local file

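
To turn the audit log into a latency signal, the lines can be parsed and grouped by operation. The following is a minimal sketch that assumes every entry has the same shape as the samples above; `parse_audit_line` and `summarize` are hypothetical helper names, and the latency values are reported in whatever unit the log uses, since the thread does not state it.

```python
import re
from collections import defaultdict
from statistics import median

# Shape assumed from the sample lines in this thread:
#   <timestamp>|AUDIT|th=<thread id>|<operation>[key=value][key=value]...
LINE_RE = re.compile(
    r"^(?P<ts>[^|]+)\|AUDIT\|th=[0-9A-Fa-f]+\|(?P<op>[^\[]+)(?P<fields>(?:\[[^\]]*\])+)\s*$"
)
FIELD_RE = re.compile(r"\[([^=\]]+)=([^\]]*)\]")


def parse_audit_line(line):
    """Return a dict for one audit entry, or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    fields = dict(FIELD_RE.findall(m.group("fields")))
    try:
        return {
            "ts": m.group("ts"),
            "op": m.group("op"),
            "pathname": fields.get("pathname", ""),
            "offset": int(fields["offset"]),
            "size": int(fields["size"]),
            "latency": int(fields["latency"]),  # unit as logged (not stated in the thread)
        }
    except (KeyError, ValueError):
        return None


def summarize(lines):
    """Group latencies by operation and report count, median and max."""
    by_op = defaultdict(list)
    for line in lines:
        rec = parse_audit_line(line)
        if rec is not None:
            by_op[rec["op"]].append(rec["latency"])
    return {
        op: {"count": len(vals), "median": median(vals), "max": max(vals)}
        for op, vals in by_op.items()
    }
```

A monitoring job could call `summarize(open("/var/log/overlaybd-audit.log"))` periodically and alert when the median or max for `download` or `file:pread` exceeds a threshold chosen for the environment.
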
wxx213 commented 2 years ago

Thanks. In addition, could anything from the kernel module (such as files in /proc or /sys) help with this?

BigVan commented 2 years ago

I have no idea about that. In our production environment, we usually focus on I/O status (latency or hangs) and RSS memory.
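
The two signals named here, I/O hangs and RSS memory, can both be sampled from generic Linux `/proc` files rather than anything specific to overlaybd. The sketch below is one possible approach under stated assumptions: the function names are hypothetical, the hang check is a heuristic that mirrors the "100% utilization with no completed r/w" condition from the `iostat` suggestion earlier in the thread, and the caller is expected to supply the process ID and the block device name.

```python
def parse_vmrss_kb(status_text):
    """Return VmRSS in kB from the text of /proc/<pid>/status, or None if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            return int(line.split()[1])
    return None


def parse_diskstats(text, device):
    """Extract counters for one device from the text of /proc/diskstats.

    Columns used (after major, minor, name): reads completed, writes
    completed, I/Os currently in progress, and milliseconds spent doing I/O.
    """
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14 and parts[2] == device:
            return {
                "completed": int(parts[3]) + int(parts[7]),
                "in_flight": int(parts[11]),
                "io_ticks_ms": int(parts[12]),
            }
    return None


def looks_hung(prev, curr, interval_ms, busy_ratio=0.95):
    """Heuristic: device busy for nearly the whole interval, requests still
    in flight, yet no read or write completed between the two samples.
    busy_ratio is an assumed threshold, not a value from the thread."""
    busy_ms = curr["io_ticks_ms"] - prev["io_ticks_ms"]
    done = curr["completed"] - prev["completed"]
    return done == 0 and curr["in_flight"] > 0 and busy_ms >= busy_ratio * interval_ms
```

In use, a collector would read `/proc/<pid>/status` for the overlaybd process and take two `/proc/diskstats` samples a known interval apart for the overlaybd device, then alert only if `looks_hung` stays true across several consecutive intervals.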

wxx213 commented 2 years ago

Okay, it helps a lot, thanks.

BigVan commented 2 years ago

For synchronous communication, catch us in the #overlaybd channel on the Cloud Native Computing Foundation (CNCF) Slack at cloud-native.slack.com. Everyone is welcome to join and chat.

Welcome~