Closed tianouya-db closed 11 months ago
It's indeed a defect. It happens when loading the overlaybd lsmt metas hits a network failure. Overlaybd fails to detect the tar/zfile header, so the opened file is treated as a raw lsmt file (a 4K-aligned format), even though the file is smaller than 4K. The lsmt file read has logic to make sure it reads a full 4K of data, so it falls into an infinite loop.
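To illustrate the failure mode described above, here is a minimal sketch of a "read exactly N bytes" loop. It is not the overlaybd code or the actual fix: `MemFile` and `read_fully` are hypothetical names made up for this example, and the in-memory file only mimics `pread(2)` semantics. The point is that a loop which retries until it has 4K will spin forever on a shorter file unless a zero-byte read is treated as end of file.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstring>
#include <sys/types.h>
#include <vector>

// Hypothetical in-memory stand-in for a blob file; not an overlaybd type.
struct MemFile {
    std::vector<char> data;

    // Mimics pread(2): returns the number of bytes copied, 0 at or past EOF.
    ssize_t pread(void* buf, size_t count, off_t offset) const {
        if (offset < 0) return -1;
        size_t off = static_cast<size_t>(offset);
        if (off >= data.size()) return 0;
        size_t n = std::min(count, data.size() - off);
        std::memcpy(buf, data.data() + off, n);
        return static_cast<ssize_t>(n);
    }
};

// Reads exactly `count` bytes or returns -1.
// Without the `n == 0` check, a file shorter than `count` makes this loop
// spin forever, because pread keeps returning 0 and `done` never advances.
ssize_t read_fully(const MemFile& f, void* buf, size_t count, off_t offset) {
    size_t done = 0;
    while (done < count) {
        ssize_t n = f.pread(static_cast<char*>(buf) + done, count - done,
                            offset + static_cast<off_t>(done));
        if (n < 0) return -1;   // I/O error
        if (n == 0) return -1;  // unexpected EOF: file is shorter than requested
        done += static_cast<size_t>(n);
    }
    return static_cast<ssize_t>(done);
}
```

With this guard, reading 4096 bytes from a 100-byte file returns an error to the caller instead of hanging, which is the behavior the reporter expected.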
@yuchen0cc I assume the 0.6.13 version contains the fixes?
Yes, it does.
What happened in your environment?
We encountered an error a few times where containerd fails to start a container with overlaybd. The error we see in containerd is like:
From overlaybd.log, we see that it received a 503 when downloading a blob and then ran into an unrecoverable state: it's stuck in an infinite loop in `pread` in `sure_file.cpp`. These are the logs:
Environment: Azure.
What did you expect to happen?
Restarting overlaybd-tcmu and overlaybd-snapshotter does fix the issue. However, we expect overlaybd to handle such errors properly and not run into an infinite loop.
How can we reproduce it?
We don't have a consistent repro. It seems the issue is triggered in a rare race condition. However, because of our large load, this happens quite frequently.
What is the version of your Overlaybd?
v0.6.12
What is your OS environment?
Ubuntu 20.04
Are you willing to submit PRs to fix it?