containerd / accelerated-container-image

A production-ready remote container image format (overlaybd) and snapshotter based on block-device.
Apache License 2.0

Lots of fiemap Operation not supported errors in overlaybd.log #217

Open simha-db opened 1 year ago

simha-db commented 1 year ago

What happened in your environment?

Seeing a lot of errors like this:

2023/07/27 01:26:09|ERROR|th=00007FDBBE6B1680|/src/src/overlaybd/cache/full_file_cache/cache_store.cpp:100|queryRefillRange:media fiemap failed : -1, offset : 0, size : 4096 errno=95(Operation not supported)

Any idea what this is about? We are side-loading the registry cache files for overlaybd. Does that have anything to do with these errors?
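On Linux, errno 95 is EOPNOTSUPP, which the kernel returns for the FS_IOC_FIEMAP ioctl when a filesystem has no fiemap implementation. To check a cache location independently of overlaybd, a minimal Python sketch can issue the ioctl directly. The constant and struct layouts follow `<linux/fs.h>` and `<linux/fiemap.h>`; the helper names (`pack_request`, `parse_response`, `fiemap_extents`) are hypothetical and are not part of overlaybd.

```python
import errno
import fcntl
import os
import struct

# _IOWR('f', 11, struct fiemap) from <linux/fs.h>; the fiemap header is 32 bytes.
FS_IOC_FIEMAP = 0xC020660B
FIEMAP_FLAG_SYNC = 0x00000001  # ask the kernel to flush dirty data before mapping
# fm_start, fm_length, fm_flags, fm_mapped_extents, fm_extent_count, fm_reserved
FIEMAP_HEADER = struct.Struct("=QQIIII")
# fe_logical, fe_physical, fe_length, fe_reserved64[2], fe_flags, fe_reserved[3]
FIEMAP_EXTENT = struct.Struct("=QQQQQIIII")


def pack_request(start, length, max_extents):
    """Build the in/out buffer: header plus room for max_extents extents."""
    header = FIEMAP_HEADER.pack(start, length, FIEMAP_FLAG_SYNC, 0, max_extents, 0)
    return bytearray(header + bytes(FIEMAP_EXTENT.size * max_extents))


def parse_response(buf):
    """Return [(logical_offset, length)] for each extent the kernel filled in."""
    mapped = FIEMAP_HEADER.unpack_from(buf, 0)[3]
    extents = []
    for i in range(mapped):
        fields = FIEMAP_EXTENT.unpack_from(buf, FIEMAP_HEADER.size + i * FIEMAP_EXTENT.size)
        extents.append((fields[0], fields[2]))
    return extents


def fiemap_extents(path, start, length, max_extents=32):
    """Mapped extents in [start, start + length), or None if FIEMAP is unsupported."""
    buf = pack_request(start, length, max_extents)
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.ioctl(fd, FS_IOC_FIEMAP, buf, True)  # kernel writes results into buf
    except OSError as e:
        # EOPNOTSUPP (95) is what tmpfs returns; some sandboxes answer ENOTTY instead.
        if e.errno in (errno.EOPNOTSUPP, errno.ENOTTY):
            return None
        raise
    finally:
        os.close(fd)
    return parse_response(buf)
```

Running `fiemap_extents("/dev/shm/some-file", 0, 4096)` on tmpfs would be expected to return `None`, matching the `errno=95` in the log.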

What did you expect to happen?

No response

How can we reproduce it?

NA

What is the version of your Accelerated Container Image?

0.6.12

What is your OS environment?

Ubuntu 20.04

Are you willing to submit PRs to fix it?

liulanzheng commented 1 year ago

@simha-db What filesystem are you using for cache?

simha-db commented 1 year ago

Do you mean /opt/overlaybd/registry_cache?

Disk I/O was too slow to download the container images quickly, so I tried the following:

  1. Download layers into /dev/shm, which is tmpfs
  2. Create a symlink from /opt/overlaybd/registry_cache to the corresponding file in /dev/shm
  3. Start the container
  4. Copy the tmpfs file to /opt/overlaybd/registry_cache in the background and, once that finishes, remove the symlinks
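The steps above can be sketched in Python with one change to step 4: instead of deleting the symlink after copying, the copy goes to a temporary file in the cache directory and is renamed over the symlink with `os.replace`. On Linux, rename(2) replaces the destination (here the symlink itself) atomically within one filesystem, so the cache path never disappears. This is a hypothetical sketch: the helper names are made up, downloading is out of scope (the blob is assumed to already be on tmpfs), and overlaybd's own cache file naming is not modeled.

```python
import os
import shutil
import tempfile
import threading


def stage_and_link(staged_blob, cache_path):
    """Steps 1-2: the blob already sits on tmpfs; point the cache path at it."""
    os.makedirs(os.path.dirname(cache_path), exist_ok=True)
    os.symlink(staged_blob, cache_path)


def promote(staged_blob, cache_path):
    """Step 4: copy onto the cache disk, then atomically swap out the symlink."""
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(cache_path), prefix=".promote-")
    try:
        with os.fdopen(fd, "wb") as dst, open(staged_blob, "rb") as src:
            shutil.copyfileobj(src, dst, 1 << 20)
            dst.flush()
            os.fsync(dst.fileno())  # make the copied data durable before the swap
        os.replace(tmp, cache_path)  # rename(2) replaces the symlink in one step
    except BaseException:
        os.unlink(tmp)  # do not leave a half-written temp file behind
        raise
    # Open file descriptors on the tmpfs copy stay valid after unlink.
    os.unlink(staged_blob)


def side_load(staged_blob, cache_path):
    stage_and_link(staged_blob, cache_path)
    # Step 3 (starting the container) would happen here, outside this sketch.
    worker = threading.Thread(target=promote, args=(staged_blob, cache_path))
    worker.start()
    return worker
```

In practice `staged_blob` would live under /dev/shm and `cache_path` under /opt/overlaybd/registry_cache; note that `mkstemp` creates the file with mode 0600.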

Is fiemap supposed to work with tmpfs?

liulanzheng commented 1 year ago

@simha-db fiemap is not supported for tmpfs
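Although tmpfs has no fiemap, Linux tmpfs does support `lseek()` with `SEEK_DATA` and `SEEK_HOLE`, which answers the same question: which parts of a sparse file are populated? This is a swapped-in technique, not something the thread says overlaybd does. A minimal sketch, with hypothetical helper names, that lists data ranges and the holes a cache would still have to fill:

```python
import errno
import os


def data_ranges(fd, start, length):
    """[(begin, end)] byte ranges holding data within [start, start + length)."""
    end = start + length
    ranges = []
    pos = start
    while pos < end:
        try:
            data = os.lseek(fd, pos, os.SEEK_DATA)  # next data at or after pos
        except OSError as e:
            if e.errno == errno.ENXIO:  # no data at or after pos
                break
            raise
        if data >= end:
            break
        hole = os.lseek(fd, data, os.SEEK_HOLE)  # end of file counts as a hole
        ranges.append((data, min(hole, end)))
        pos = hole
    return ranges


def hole_ranges(fd, start, length):
    """Complement of data_ranges: the parts a cache would still need to fetch."""
    holes = []
    cursor = start
    for begin, end in data_ranges(fd, start, length):
        if begin > cursor:
            holes.append((cursor, begin))
        cursor = end
    if cursor < start + length:
        holes.append((cursor, start + length))
    return holes
```

On filesystems that do not track holes, the kernel's generic `lseek` reports the whole file as data, so this probe is only meaningful where holes are actually tracked.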

simha-db commented 1 year ago

Ah, dang it. Is there any other way to buffer the writes? The page cache is unpredictable in how it penalizes heavy writes.

lihuiba commented 1 year ago

Downloading layer blobs to tmpfs with background copying should be similar to using /opt/overlaybd/registry_cache directly. Both download the blobs to memory and write them to disk in the background. I don't see much difference.

simha-db commented 1 year ago

Not necessarily. Once the page cache dirty thresholds are hit, the kernel starts flushing and writes slow down significantly. Using tmpfs let overlaybd use the files while we copied them (barring a glitch when we delete the symlink, which we figured would be retried), but it looks like the lack of sparse file support makes it a no-go.

simha-db commented 1 year ago

By the way, what happens when tmpfs errors out? We did not see any errors at container startup. Does it fall back to something else?

lihuiba commented 1 year ago

writes slow down significantly

It slows down because there's not much free memory, so it must wait for flushing. The same happens for tmpfs, which is also backed by page cache.
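The thresholds discussed here are the `vm.dirty_*` sysctls: background writeback starts once dirty memory passes `dirty_background_ratio` (or `dirty_background_bytes`), and writers are increasingly throttled as it climbs toward `dirty_ratio` (or `dirty_bytes`). Both sides of the argument can be observed in /proc/meminfo, where tmpfs pages are counted under `Shmem` and page-cache data awaiting writeback under `Dirty` and `Writeback`. A rough Python sketch with hypothetical helper names; the kernel applies the ratios to its own "dirtyable" memory figure, and using `MemAvailable` as a stand-in is an assumption.

```python
def meminfo_bytes(text):
    """Parse /proc/meminfo-style text into {field: value}, converting kB to bytes."""
    out = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts:
            value = int(parts[0])
            out[name] = value * 1024 if parts[-1] == "kB" else value
    return out


def dirty_limits(background_ratio, ratio, background_bytes, dirty_bytes, dirtyable):
    """Approximate (background, throttle) limits in bytes.

    A non-zero *_bytes sysctl takes precedence over its *_ratio counterpart.
    """
    background = background_bytes or dirtyable * background_ratio // 100
    throttle = dirty_bytes or dirtyable * ratio // 100
    return background, throttle


def read_sysctl(name):
    with open("/proc/sys/vm/" + name) as f:
        return int(f.read())  # int() tolerates the trailing newline


def memory_pressure_snapshot():
    with open("/proc/meminfo") as f:
        mem = meminfo_bytes(f.read())
    background, throttle = dirty_limits(
        read_sysctl("dirty_background_ratio"),
        read_sysctl("dirty_ratio"),
        read_sysctl("dirty_background_bytes"),
        read_sysctl("dirty_bytes"),
        mem["MemAvailable"],  # assumption: rough stand-in for dirtyable memory
    )
    return {
        "dirty": mem.get("Dirty", 0),          # waiting to be written back
        "writeback": mem.get("Writeback", 0),  # being written back right now
        "shmem": mem.get("Shmem", 0),          # tmpfs and shared memory
        "background_limit": background,
        "dirty_limit": throttle,
    }
```

Sampling `memory_pressure_snapshot()` periodically during a layer download would show whether `dirty` approaches `dirty_limit` and how much memory the tmpfs staging copy holds in `shmem`.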

We did not see errors in the container startup

I believe it's a bug. It doesn't handle the errors returned by fiemap() and treats them as a cache hit.
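The thread does not show the code behind `queryRefillRange`, so the following Python sketch only models the described behavior under an assumed contract: the probe returns the ranges that still need refilling, and an empty list means a cache hit. It contrasts swallowing the error (the behavior described above) with a hypothetical alternative that falls back on EOPNOTSUPP and propagates anything else. All names are illustrative.

```python
import errno


def refill_ranges_as_described(fiemap, offset, size):
    """A failed fiemap turns into "nothing to refill", i.e. a cache hit."""
    try:
        return fiemap(offset, size)
    except OSError:
        return []  # error swallowed


def refill_ranges_checked(fiemap, fallback, offset, size):
    """On EOPNOTSUPP use a fallback probe; any other error is propagated."""
    try:
        return fiemap(offset, size)
    except OSError as e:
        if e.errno != errno.EOPNOTSUPP:
            raise
        return fallback(offset, size)
```

The simplest fallback, `lambda off, sz: [(off, off + sz)]`, treats the range as a miss. On tmpfs, where fiemap never succeeds, that would refetch on every read, which is why a probe such as `SEEK_DATA`/`SEEK_HOLE` is the more useful fallback there.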

simha-db commented 1 year ago

Oh, what does it return to the container?

lihuiba commented 1 year ago

It reads data from the blobs in tmpfs and returns the data to the container. It happens to do what you wanted :-)

simha-db commented 1 year ago

Not the happy case. I am referring to

I believe its a bug. It doesn't deal with the errors returned by fiemap(), and take it as cache-hit.

Since I did not see any errors, what happens when fiemap fails?

simha-db commented 1 year ago

@lihuiba @liulanzheng Does overlaybd read registry cache files with O_DIRECT? I saw that it does when using libaio; we use psync.

liulanzheng commented 1 year ago

@lihuiba @liulanzheng Does overlaybd read registry cache files with O_DIRECT? I saw it does when using libaio - we use psync -

O_DIRECT is not used.

lihuiba commented 1 year ago

what happens when the fiemap failed?

The cache treats it as a cache hit and reads from the file. It happens to do what you wanted.

it does when using libaio

If the libaio ioengine is used, the O_DIRECT flag will be automatically included.
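To illustrate the difference between the two ioengines: psync reads go through the page cache, while O_DIRECT reads bypass it and require the buffer, offset and length to be aligned to the device's logical block size. Some filesystems reject O_DIRECT with EINVAL (see open(2)), and tmpfs has done so on many kernels. A minimal Python sketch, assuming a 4096-byte block size and a hypothetical `read_block` helper that falls back to a buffered read:

```python
import errno
import mmap
import os

BLOCK = 4096  # assumption: logical block size of the cache device


def read_block(path, offset, direct=True):
    """Read up to BLOCK bytes at an aligned offset; returns (data, used_o_direct)."""
    if offset % BLOCK:
        raise ValueError("offset must be block-aligned for O_DIRECT")
    if direct:
        try:
            fd = os.open(path, os.O_RDONLY | os.O_DIRECT)
        except OSError as e:
            if e.errno != errno.EINVAL:  # EINVAL: filesystem rejects O_DIRECT
                raise
        else:
            buf = mmap.mmap(-1, BLOCK)  # anonymous mapping gives a page-aligned buffer
            try:
                n = os.preadv(fd, [buf], offset)
                return bytes(buf[:n]), True
            except OSError as e:
                if e.errno != errno.EINVAL:  # EINVAL: alignment not accepted
                    raise
            finally:
                buf.close()
                os.close(fd)
    fd = os.open(path, os.O_RDONLY)  # buffered read through the page cache
    try:
        return os.pread(fd, BLOCK, offset), False
    finally:
        os.close(fd)
```

The second element of the result shows which path was taken, e.g. `False` when the file sits on a tmpfs that refuses O_DIRECT.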