containerd / accelerated-container-image

A production-ready remote container image format (overlaybd) and snapshotter based on block-device.
Apache License 2.0

Userspace conversion fails on large image (48GB) #224

Closed tianouya-db closed 10 months ago

tianouya-db commented 1 year ago

What happened in your environment?

Trying to convert a large image (48GB) in userspace:

/opt/overlaybd/snapshotter/convertor -r my.registry.com/project/repo -u user:password -i 0.1 --overlaybd 0.1_obd

The conversion always fails with an error ERRO[2451] failed to build overlaybd: failed to convert layer 17: failed to overlaybd-apply: 2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll.

Output:

INFO[0016] layer 15 uploaded
INFO[0122] downloaded layer 17
INFO[1225] downloaded layer 16
INFO[2139] layer 16 committed, uuid: c0d9066d-6668-3d07-c7b7-17aa5d37f7be, parent uuid: 9fcac1fb-2ab9-586c-7039-782bfc7790ee
INFO[2141] layer 16 converted
ERRO[2451] failed to build overlaybd: failed to convert layer 17: failed to overlaybd-apply: 2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/build/_deps/photon-src/io/signal.cpp:265|sync_signal_init:signalfd initialized
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/src/image_service.cpp:169|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/src/image_service.cpp:183|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/src/image_service.cpp:195|read_global_config_and_set:[global_conf.logConfig().logPath()=/var/log/overlaybd.log]
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/src/image_service.cpp:209|read_global_config_and_set:set log_level: 1
2023/09/11 15:24:27|INFO |th=000055B14C24B120|/src/src/image_service.cpp:212|read_global_config_and_set:set log_path: /var/log/overlaybd.log, log_size: 10485760, log_num: 3
failed to extract
: exit status 255

Layer 17, mentioned in the error, is the largest layer of the image at ~45GB.

{
  "schemaVersion": 2,
  "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
  "config": {
    "mediaType": "application/vnd.docker.container.image.v1+json",
    "size": 10904,
    "digest": "sha256:37aacfb983164414c15b13feff9dd015f502f1f498c26ba5f3f1fbfa9a27d6c3"
  },
  "layers": [
    .........
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 46814098000,
      "digest": "sha256:f4e8336fc6f74e49e99d3de5a6de2dd3e7093cb15fc705ba17f0189aaa5d204c"
    },
    {
      "mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
      "size": 5084181896,
      "digest": "sha256:76775a8b477ab2c7283cac2d570756525ec86a1ba3e0cb9a91fd27b30ab1ab86"
    },
  ]
}

What did you expect to happen?

Conversion should succeed.

How can we reproduce it?

I haven't found a way to repro it with a random image I built with a 45GB layer.

What is the version of your Accelerated Container Image?

0.6.7

What is your OS environment?

Ubuntu 20.04

Are you willing to submit PRs to fix it?

tianouya-db commented 1 year ago

Please let me know if there are steps I can follow to collect the relevant logs.

lihuiba commented 1 year ago

It looks like auto-expansion is needed.

BigVan commented 1 year ago

https://github.com/containerd/accelerated-container-image/blob/ddba7a33cbd9198cfe7004b9a8c13a7f85106479/cmd/convertor/builder/overlaybd_builder.go#L292C17-L292C17

func (e *overlaybdBuilderEngine) create(ctx context.Context, dir string) error {
    return utils.Create(ctx, dir, "-s", "64")
}

Currently the overlaybd device size is hard-coded to 64GB. You can change it to work around this problem temporarily.

yuchen0cc commented 1 year ago

for obdconv https://github.com/containerd/accelerated-container-image/blob/ddba7a33cbd9198cfe7004b9a8c13a7f85106479/pkg/snapshot/storage.go#L654C1-L661C2

func (o *snapshotter) prepareWritableOverlaybd(ctx context.Context, snID string) error {
    // TODO(fuweid): 256GB can be configurable?
    args := []string{"64"}
    if o.writableLayerType == "sparse" {
        args = append(args, "-s")
    }
    return utils.Create(ctx, o.blockPath(snID), args...)
}

"64" (64GB) is the default size; you can modify it for a POC.

yuchen0cc commented 1 year ago

An option to configure the overlaybd device size could be added after release v1.0.0.

tianouya-db commented 1 year ago

Thanks for the responses. To confirm, for userspace conversion, I can change it in overlaybd_builder.go to work around. And for obdconv, I can change pkg/snapshot/storage.go. Is that correct?

yuchen0cc commented 1 year ago

Yes, that is it.

tianouya-db commented 1 year ago

Made the changes and tried. It fixed obdconv but not userspace conversion.

Steps I did

  1. Make the changes (I updated to 256) and run make.
  2. Run mv bin/* /opt/overlaybd/snapshotter
  3. Restart the services: systemctl restart overlaybd-snapshotter, and also systemctl restart overlaybd-tcmu.

Verify

  1. Convert with obdconv and it worked
    /opt/overlaybd/snapshotter/ctr obdconv large-image:0.1 large-image:0.1_obd
  2. Convert in userspace and it failed with the same error
    
    /opt/overlaybd/snapshotter/convertor -r my.registry.com/project/repo -u user:password -i 0.1 --overlaybd 0.1_obd

INFO[0151] layer 13 committed, uuid: 245daafe-aa32-31b7-7c10-d18dcbf9895e, parent uuid: 076fb541-99fc-ce35-9782-e3b35fbd6bac
INFO[0152] layer 13 converted
INFO[0169] layer sha256:723afa92c821479af9243dbb6b21b69694a0e3ae66bdf874396b0be38c53028d exists
INFO[0169] layer 13 uploaded
INFO[2051] downloaded layer 14
ERRO[2613] failed to build overlaybd: failed to convert layer 14: failed to overlaybd-apply[native]: 2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/build/_deps/photon-src/io/signal.cpp:265|sync_signal_init:signalfd initialized
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/src/image_service.cpp:169|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/src/image_service.cpp:183|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/src/image_service.cpp:195|read_global_config_and_set:[global_conf.logConfig().logPath()=]
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/src/image_service.cpp:209|read_global_config_and_set:set log_level: 1
2023/09/12 21:29:48|INFO |th=0000555F24695800|/src/src/image_service.cpp:212|read_global_config_and_set:set log_path: /var/log/overlaybd.log, log_size: 10485760, log_num: 3
failed to extract
: exit status 255



Let me know if I missed anything.

liulanzheng commented 1 year ago

You may try pulling the latest commit, #222 (add --mkfs flag for convertor).

tianouya-db commented 1 year ago

You may try pulling the latest commit, https://github.com/containerd/accelerated-container-image/pull/222 . add --mkfs flag for convertor.

Tried it but got the same error:

/opt/overlaybd/snapshotter/convertor -r my.registry.com/project/repo -u user:password --mkfs -i 0.1 --overlaybd 0.1_obd

INFO[0140] downloaded layer 13
INFO[0140] layer 13 converted
INFO[0156] layer sha256:723afa92c821479af9243dbb6b21b69694a0e3ae66bdf874396b0be38c53028d exists
INFO[0156] layer 13 uploaded
INFO[1950] downloaded layer 14
ERRO[2522] failed to build overlaybd: failed to convert layer 14: failed to overlaybd-apply[native]: 2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/build/_deps/photon-src/io/signal.cpp:265|sync_signal_init:signalfd initialized
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/build/_deps/photon-src/io/epoll.cpp:289|new_epoll_engine:Init event engine: epoll
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/src/image_service.cpp:169|read_global_config_and_set:using config /etc/overlaybd/overlaybd.json
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/src/image_service.cpp:183|read_global_config_and_set:set audit_path:/var/log/overlaybd-audit.log
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/src/image_service.cpp:195|read_global_config_and_set:[global_conf.logConfig().logPath()=]
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/src/image_service.cpp:209|read_global_config_and_set:set log_level: 1
2023/09/13 05:35:15|INFO |th=0000561DA5F1A8C0|/src/src/image_service.cpp:212|read_global_config_and_set:set log_path: /var/log/overlaybd.log, log_size: 10485760, log_num: 3
failed to extract
: exit status 255

yuchen0cc commented 1 year ago

For convertor, more logs are written to /var/log/overlaybd.log. Please check this log for more information.

liulanzheng commented 1 year ago

64 still needs to be changed to a larger number. Did you change it?

tianouya-db commented 1 year ago

Yes I changed to 256.

cmd/convertor/builder/overlaybd_builder.go

func (e *overlaybdBuilderEngine) create(ctx context.Context, dir string, mkfs bool) error {
        opts := []string{"-s", "256"}
        if mkfs {
                opts = append(opts, "--mkfs")
        }
        return utils.Create(ctx, dir, opts...)
}

pkg/snapshot/storage.go

func (o *snapshotter) prepareWritableOverlaybd(ctx context.Context, snID string) error {
        // TODO(fuweid): 256GB can be configurable?
        args := []string{"256"}
        if o.writableLayerType == "sparse" {
                args = append(args, "-s")
        }
        return utils.Create(ctx, o.blockPath(snID), args...)
}

yuchen0cc commented 1 year ago

@tianouya-db please pull commit #225 and retry. If it still fails, please check /var/log/overlaybd.log for more details, and let me know.

tianouya-db commented 1 year ago

@yuchen0cc it seems to work now. Thanks for the fix! Meanwhile, will there be a change to increase 64 to a larger number, e.g. 256?

yuchen0cc commented 10 months ago

An option --vsize has been added to convertor to customize the overlaybd virtual block device size.