containerd / accelerated-container-image

A production-ready remote container image format (overlaybd) and snapshotter based on block-device.
Apache License 2.0

Convert to accelerated image within a docker container? #136

Open maxwolffe opened 2 years ago

maxwolffe commented 2 years ago

Is it possible to build overlaybd accelerated images from within a container?

I see that buildkit is experimentally supported (https://github.com/data-accelerator/buildkit) and can be run in a container (https://github.com/data-accelerator/buildkit#containerizing-buildkit), but I also see that the accelerator layer is not supported there (https://github.com/data-accelerator/buildkit#containerizing-buildkit).

Is there another path for building or converting overlaybd images within a container?

liulanzheng commented 2 years ago

@maxwolffe Building overlaybd images can be done in containers under certain conditions. First, the container must be privileged and have /dev mounted into it, because data has to be written to tcmu devices. Second, the overlaybd service must run on the host or in a privileged container. Only one instance per host is allowed; multiple overlaybd backstores are not supported. Could you describe your environment and requirements?
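
As a rough illustration of the first condition above, the sketch below assembles a `docker run` command line for a privileged container that shares the host's /dev. The image name `my-overlaybd-builder:latest` is a placeholder for illustration and not an image published by this project. The script only prints the command; it does not start a container or the overlaybd service.

```python
import shlex


def build_docker_cmd(image, extra_args=()):
    """Build a `docker run` argv for a privileged container that shares the host's /dev."""
    return [
        "docker", "run", "--rm",
        "--privileged",      # the container must be privileged
        "-v", "/dev:/dev",   # mount host /dev so tcmu devices are reachable
        image,
        *extra_args,
    ]


# Hypothetical image name, used only for illustration.
cmd = build_docker_cmd("my-overlaybd-builder:latest")
print(shlex.join(cmd))
```

The second condition (a single overlaybd service per host, running either on the host or in a privileged container) still has to be arranged separately.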

shuaichang commented 2 years ago

Another option is to make overlaybd-ctr independent of the overlaybd daemons (tcmu and snapshotter), so that it also requires no containerd config change: https://github.com/containerd/accelerated-container-image/blob/main/docs/IMAGE_CONVERTOR.md

# bin/ctr supports image conversion without requiring overlaybd-tcmu and overlaybd-snapshotter, or starts overlaybd-tcmu on demand during conversion.
sudo bin/ctr obdconv registry.hub.docker.com/library/redis:6.2.1 registry.hub.docker.com/overlaybd/redis:6.2.1_obd_new

The use case is that we want to build overlaybd images while introducing minimal changes to our container image release pipeline.

lihuiba commented 2 years ago

@liulanzheng In principle, a simple, pure command-line tool with no dependency on a container engine could do the conversion in a container. And we'd better have such a tool, as @shuaichang has suggested.

liulanzheng commented 2 years ago

Putting everything into one command-line tool requires a fair amount of work. Actually, we can run overlaybd in containers, even if that is not as formal as other tcmu backstores, and it can work very well in a controlled environment. Based on a containerized overlaybd, we can build a simple Go conversion tool that covers pulling, conversion and pushing, and most of the code can be reused from containerd.

lihuiba commented 2 years ago

@shuaichang @maxwolffe We have found a solution to overcome the problems that prevent an all-in-one tool for image conversion. We'll try it later. Participation is welcome!

shuaichang commented 2 years ago

@lihuiba that's great to know, it will be very helpful. Any guess as to when we can try the conversion tool?

liulanzheng commented 2 years ago

@shuaichang we are focusing on this new approach, but we spent some time exploring. At present, I expect that it will be completed by the end of this month or early next month.

maxwolffe commented 2 years ago

This is awesome news @liulanzheng ! Any update we can follow or help we can offer? We'd love to help pilot this.

liulanzheng commented 1 year ago

@maxwolffe A preliminary version will be released in the next day or two, but it is not perfect, as some formats may not be supported yet. We will keep improving it, and we can improve it together.

maxwolffe commented 1 year ago

Amazing! Looking forward to it!

liulanzheng commented 1 year ago

@maxwolffe sorry, there's some bad news: we encountered problems in functional testing. We know how to solve them, but the new implementation involves thousands of lines of new code, so it will take extra time.

liulanzheng commented 1 year ago

@maxwolffe please refer to USERSPACE_CONVERTOR

It is not yet fully complete; feedback and questions are welcome.

@yuchen0cc and @WaberZhuang are the authors, so any problems can be reported here.

maxwolffe commented 1 year ago

Great! Thanks @liulanzheng and team! Excited to try it out.

maxwolffe commented 1 year ago

@liulanzheng - thanks again for your help getting this out.

I've started playing with this but encountered an issue that I'm hoping you can help me debug (happy to open a separate issue if helpful).

When I run the convertor on an image which I can successfully download (from a private repository), I get a "failed to extract" error.

> sudo bin/convertor -r harbor-xxx/main/universe/kata-installer -u username:password -i latest -o latest_obd

INFO[0002] downloaded layer 0
ERRO[0003] run with error: failed to overlaybd apply for layer 0: failed to apply tar to overlaybd: 2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:135|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:152|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:160|read_global_config_and_set:set log_level:1
2022/11/01 06:31:24|INFO |th=000055DA42F37C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:163|read_global_config_and_set:set log_path:/var/log/overlaybd.log
failed to extract
: exit status 255

liulanzheng commented 1 year ago

@yuchen0cc

yuchen0cc commented 1 year ago

@maxwolffe please set overlaybd-apply log level to 'debug' to get more info. https://github.com/data-accelerator/overlaybd-apply/blob/main/src/tools/overlaybd-apply.cpp#L64

set_log_output_level(0);

maxwolffe commented 1 year ago

@yuchen0cc - thanks friend. I made that change (and included my own little change to confirm that the logging change was picked up). Here is the new output (it looks very similar to the previous output):

    set_log_output_level(0);
    LOG_INFO("Logging changes included");
    photon::init(photon::INIT_EVENT_DEFAULT, photon::INIT_IO_DEFAULT);

INFO[0002] downloaded layer 0
ERRO[0004] run with error: failed to overlaybd apply for layer 0: failed to apply tar to overlaybd: 2022/11/02 03:46:41|INFO |th=0000000000000000|/home/max.wolffe/overlaybd-apply/src/tools/overlaybd-apply.cpp:65|main:Logging changes included
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2022/11/02 03:46:41|DEBUG|th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/photon-src/net/curl.cpp:227|libcurl_init:libcurl version libcurl/7.58.0 OpenSSL/1.1.1 zlib/1.2.11 libidn2/2.0.4 libpsl/0.19.1 (+libidn2/2.0.4) nghttp2/1.30.0 librtmp/2.3
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:169|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:186|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:194|read_global_config_and_set:set log_level:1
2022/11/02 03:46:41|INFO |th=00005596283EAC60|/home/max.wolffe/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:197|read_global_config_and_set:set log_path:/var/log/overlaybd.log
failed to extract
: exit status 255

yuchen0cc commented 1 year ago

@maxwolffe sorry, the setting is overwritten by the config that 'imgservice' reads afterwards. So please set the log level in the config file instead. The config is stored in '/etc/overlaybd/overlaybd.json' by default.

{
    "logLevel": 0,
    ......
}

Besides the logs printed to the terminal, more logs are written to '/var/log/overlaybd.log'. P.S. Pull the newest overlaybd commits to avoid unneeded debug output.
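
For anyone who prefers to script the config change, here is a minimal sketch that sets the top-level "logLevel" key in a JSON config and leaves every other key untouched. It runs against a throwaway file, not the real '/etc/overlaybd/overlaybd.json'. The `otherSetting` key is a made-up placeholder standing in for the rest of the config, and the helper name `set_log_level` is likewise only illustrative.

```python
import json
import os
import tempfile


def set_log_level(config_path, level=0):
    """Set the top-level "logLevel" key in a JSON config file, keeping other keys as they are."""
    with open(config_path) as f:
        config = json.load(f)
    config["logLevel"] = level  # 0 is the debug level used in this thread
    with open(config_path, "w") as f:
        json.dump(config, f, indent=4)
        f.write("\n")
    return config


# Demonstrate on a temporary file; "otherSetting" is a placeholder key.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "overlaybd.json")
    with open(demo_path, "w") as f:
        json.dump({"logLevel": 1, "otherSetting": "unchanged"}, f)
    updated = set_log_level(demo_path, 0)
    print(updated)
```

Pointing the function at the real config path would normally require root, and it is worth keeping a backup copy of the file first.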

maxwolffe commented 1 year ago

@yuchen0cc - thanks for the pointer there. I updated that and encountered the same error, with the same stderr/stdout output. When I look in overlaybd.log, I see the following:

max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ grep "06:41" /var/log/overlaybd.log | wc -l
31939
max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ tail -n 10 /var/log/overlaybd.log
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[125829120,8]--> Mapping[125829128,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:66571993088,length:4096}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[130023424,8]--> Mapping[130023432,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:66572054528,length:4096}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[130023544,8]--> Mapping[130023552,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:4096,length:32768}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[8,64]--> Mapping[16,0,1]
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:1024,length:1024}
2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:906|pwrite:insert segment: Segment[2,2]--> Mapping[10,0,1]
2022/11/09 06:41:09|ERROR|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/untar/libtar.cpp:275|extract_all:extract failed, filename usr/lib64/python2.7/unittest/, No space left on device
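
With debug logging on, a single minute produced 31939 lines, so it can help to filter by level instead of reading the tail. The sketch below assumes the field layout seen in the log lines above (timestamp|level|thread|source location|function:message). That layout is inferred from this output and is not a documented format. The two sample lines are copied from the log above.

```python
def parse_line(line):
    """Split one log line into fields, or return None if it does not match the layout."""
    parts = line.rstrip("\n").split("|", 4)
    if len(parts) != 5:
        return None
    timestamp, level, thread, location, rest = parts
    func, _, message = rest.partition(":")
    return {
        "timestamp": timestamp,
        "level": level.strip(),  # levels are padded, e.g. "INFO "
        "thread": thread,
        "location": location,
        "func": func,
        "message": message,
    }


def filter_levels(lines, levels=("ERROR", "WARN")):
    """Return parsed records whose level is one of `levels`."""
    records = (parse_line(line) for line in lines)
    return [r for r in records if r and r["level"] in levels]


sample = [
    "2022/11/09 06:41:09|DEBUG|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:881|pwrite:{offset:1024,length:1024}",
    "2022/11/09 06:41:09|ERROR|th=000055F5FE696C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/untar/libtar.cpp:275|extract_all:extract failed, filename usr/lib64/python2.7/unittest/, No space left on device",
]

hits = filter_levels(sample)
for record in hits:
    print(record["timestamp"], record["func"], record["message"])
```

To scan the real file, pass an open file handle for '/var/log/overlaybd.log' as `lines`.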

Here's my current storage device usage:

max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ df -H
Filesystem       Size  Used Avail Use% Mounted on
udev              33G     0   33G   0% /dev
tmpfs            6.6G  840k  6.6G   1% /run
/dev/nvme0n1p1   521G   65G  456G  13% /
tmpfs             33G     0   33G   0% /dev/shm
tmpfs            5.3M     0  5.3M   0% /run/lock
tmpfs             33G     0   33G   0% /sys/fs/cgroup
/dev/loop0        51M   51M     0 100% /snap/snapd/17336
/dev/loop1        59M   59M     0 100% /snap/core18/2538
/dev/loop2        50M   50M     0 100% /snap/snapd/16292
/dev/loop3        59M   59M     0 100% /snap/core18/2620
/dev/loop4        26M   26M     0 100% /snap/amazon-ssm-agent/6312
/dev/loop5        27M   27M     0 100% /snap/amazon-ssm-agent/5656
/dev/nvme0n1p15  110M  4.6M  105M   5% /boot/efi
tmpfs            6.6G     0  6.6G   0% /run/user/1000

When I turn off the debug logs I get the following output (complete output for this run):

max.wolffe@ip-10-110-26-78:~/accelerated-container-image$ grep "06:44" /var/log/overlaybd.log
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:171|read_global_config_and_set:global config: cache_dir: /opt/overlaybd/registry_cache, cache_size_GB: 4, cache_type: file
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:227|init:create registryfs with cafile:/etc/ssl/certs/ca-certificates.crt
2022/11/09 06:44:48|INFO |th=00007F1557DCFB00|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/zfile/zfile.cpp:668|load_jump_table:trailer_offset: 4737183, idx_offset: 4207947, idx_bytes: 529236, dict_size: 0, use_dict: 0
2022/11/09 06:44:48|INFO |th=00007F1557DCFB00|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/zfile/compressor.cpp:98|init:create batch buffer, size: 1
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.cpp:261|open_lowers:LSMT::open_files_ro(files, 1) success
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.cpp:286|open_upper:upper layer : tmp_conv/sha256:a02a4930cb5d36f3290eb84f4bfa30668ef2e9fe3a1fb73ec015fc58b9958b17/writable_index , tmp_conv/sha256:a02a4930cb5d36f3290eb84f4bfa30668ef2e9fe3a1fb73ec015fc58b9958b17/writable_data
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:955|create_mappings:segment size: 0
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:1113|open_file_rw:create LSMTSparseFile object
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:1126|open_file_rw:Layer Info: { UUID:463ED3A8-B15F-456C-A313-C74008B46039 , Parent_UUID: 00000000-0000-0000-0000-000000000000, SparseRW: 1, Virtual size: 68719476736, Version: 1.1 }
2022/11/09 06:44:48|WARN |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:1384|stack_files:STACK FILES WITHOUT CHECK ORDER!!!
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:188|~LSMTReadOnlyFile:pread times: 0, size: 0M
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/lsmt/file.cpp:188|~LSMTReadOnlyFile:pread times: 0, size: 0M
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.cpp:147|start_bk_dl_thread:no need to download
2022/11/09 06:44:48|INFO |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_file.h:49|ImageFile:new imageFile, bs: 512, size: 68719476736
2022/11/09 06:44:48|WARN |th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/image_service.cpp:190|set_result_file:no resultFile config set, ignore writing result
2022/11/09 06:44:50|ERROR|th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/src/userspace/ext2_utils.h:453|__translate_error:ext2fs unclassified error: at /home/runner/work/overlaybd-apply/overlaybd-apply/src/userspace/user.cpp:236, ecode 2133571366:to be decode
2022/11/09 06:44:50|ERROR|th=00005623F8BF5C60|/home/runner/work/overlaybd-apply/overlaybd-apply/build/_deps/overlaybd-src/src/overlaybd/untar/libtar.cpp:275|extract_all:extract failed, filename usr/lib64/python2.7/unittest/, No space left on device

Maybe the cache is running out of space?

The image I'm attempting to convert is 552MB.

yuchen0cc commented 1 year ago

@maxwolffe the "No space left on device" error refers to our lsmt block device, whose default size is 64 GB per layer, rather than to the host disk. However, it may also be caused by bugs in extfs. About the test image: 1. What's the uncompressed size of the image? 2. How many layers does it have? 3. Is it a public image (or can you make it public) so that we can debug it?
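
As a quick sanity check on the 64 GB figure, the "Virtual size" reported in the Layer Info log line earlier in this thread can be converted to GiB. This is plain arithmetic on a number copied from that log and says nothing about how much of the device is actually in use.

```python
GIB = 1024 ** 3

# "Virtual size" value copied from the Layer Info log line in this thread.
virtual_size_bytes = 68719476736

virtual_size_gib = virtual_size_bytes / GIB
print(virtual_size_gib)  # 64.0
```

The value matches the stated per-layer default.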

northtyphoon commented 1 year ago

I ran into the same issue. You can reproduce it using the Docker Hub image jupyter/all-spark-notebook:latest. It threw the "No space left on device" error when extracting layer 10.

yuchen0cc commented 1 year ago

@northtyphoon many thanks! We'll try to figure it out.

yuchen0cc commented 1 year ago

@maxwolffe @northtyphoon sorry to keep you waiting so long. We found a bug in mkdir in extfs and fixed it. There are also problems when using sparse files in lsmt, so we use append files instead. Using append files may be slower than sparse files and will take more space while converting. Please give it a try.

We will continue working on the problems with sparse files...

northtyphoon commented 1 year ago

Thank you @yuchen0cc

yuchen0cc commented 1 year ago

We have recently fixed the bugs in sparse files and made some efforts to speed up conversion. @maxwolffe @northtyphoon please give it a try~