balena-io / balena-cli

The official balena CLI tool.
Apache License 2.0

Preload fails with "error initializing graphdriver: driver not supported" #1099

Open mschm opened 5 years ago

mschm commented 5 years ago

First off, the preloader is an awesome tool and I really appreciate it! I have used it successfully in a Linux environment, but under macOS it fails (using the Intel NUC base image https://files.resin.io/resinos/intel-nuc/2.29.2%2Brev2.prod/image/balena.img.zip)

$ balena preload balena.img
Building Docker preloader image. [========================] 100%

/ Creating preloader container
\ Starting preloader container
| Reading image information
1: Step 1/7 : FROM docker:17.10.0-ce-dind
 ---> 9769e0f3f9cb
Step 2/7 : RUN apk update && apk add --no-cache python3 parted btrfs-progs docker util-linux sfdisk file coreutils sgdisk
 ---> Using cache
 ---> 8e051386d0ca
Step 3/7 : COPY ./requirements.txt /tmp/
 ---> Using cache
 ---> acbb5071e45c
Step 4/7 : RUN pip3 install -r /tmp/requirements.txt
 ---> Using cache
 ---> bf95343e0281
Step 5/7 : COPY ./src /usr/src/app
 ---> Using cache
 ---> 1a0ed03b49a9
Step 6/7 : WORKDIR /usr/src/app
 ---> Using cache
 ---> 334af6145047
Step 7/7 : CMD ["python3", "/usr/src/app/preload.py"]
 ---> Using cache
 ---> 510a76c6dcde
Successfully built 510a76c6dcde
Successfully tagged balena/balena-preload:latest
Waiting for Docker to start...
Exception in thread background thread for pid 209:
Traceback (most recent call last):
  File "/usr/lib/python3.6/threading.py", line 916, in _bootstrap_inner
    self.run()
  File "/usr/lib/python3.6/threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/lib/python3.6/site-packages/sh.py", line 1540, in wrap
    fn(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/sh.py", line 2459, in background_thread
    handle_exit_code(exit_code)
  File "/usr/lib/python3.6/site-packages/sh.py", line 2157, in fn
    return self.command.handle_command_exit_code(exit_code)
  File "/usr/lib/python3.6/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /usr/local/bin/dockerd --storage-driver=aufs --data-root=/tmp/tmp5yk6hne2/docker --host=tcp://0.0.0.0:64730

  STDOUT:

  STDERR:
time="2019-02-13T16:29:32.361054479Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2019-02-13T16:29:32.367081076Z" level=info msg="libcontainerd: new containerd process, pid: 227"
Error starting daemon: error initializing graphdriver: driver not supported

Traceback (most recent call last):
  File "/usr/src/app/preload.py", line 825, in <module>
    result = method(**data.get("parameters", {}))
  File "/usr/src/app/preload.py", line 785, in get_image_info
    images, supervisor_version = get_images_and_supervisor_version()
  File "/usr/src/app/preload.py", line 668, in get_images_and_supervisor_version
    return _get_images_and_supervisor_version(inner_image_path)
  File "/usr/src/app/preload.py", line 644, in _get_images_and_supervisor_version
    with docker_context_manager(driver, mountpoint):
  File "/usr/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/src/app/preload.py", line 511, in docker_context_manager
    running_dockerd = start_docker_daemon(storage_driver, docker_dir)
  File "/usr/src/app/preload.py", line 480, in start_docker_daemon
    running_dockerd.wait()
  File "/usr/lib/python3.6/site-packages/sh.py", line 792, in wait
    self.handle_command_exit_code(exit_code)
  File "/usr/lib/python3.6/site-packages/sh.py", line 815, in handle_command_exit_code
    raise exc
sh.ErrorReturnCode_1: 

  RAN: /usr/local/bin/dockerd --storage-driver=aufs --data-root=/tmp/tmp5yk6hne2/docker --host=tcp://0.0.0.0:64730

  STDOUT:

  STDERR:
time="2019-02-13T16:29:32.361054479Z" level=warning msg="[!] DON'T BIND ON ANY IP ADDRESS WITHOUT setting --tlsverify IF YOU DON'T KNOW WHAT YOU'RE DOING [!]"
time="2019-02-13T16:29:32.367081076Z" level=info msg="libcontainerd: new containerd process, pid: 227"
Error starting daemon: error initializing graphdriver: driver not supported

If you need help, don't hesitate in contacting us at:

  GitHub: https://github.com/balena-io/balena-cli/issues/new
  Forums: https://forums.balena.io

Thanks for your help!

pdcastro commented 5 years ago

This issue is caused by the removal of the AUFS storage driver (--storage-driver=aufs) in Docker Community Edition 2.0.0.0-mac78 2018-11-19 and later versions, including Docker Desktop for macOS and Windows.

Known Workarounds

Last edit: Nov 2021

Use Linux (e.g. Ubuntu) in a virtual machine like Ubuntu Multipass or VirtualBox, and install the balena CLI for Linux in the virtual machine. When doing so, do not place the OS image file (to be preloaded) in "shared folders" (network mounted filesystem) because the preload implementation uses losetup behind the scenes to mount images within images, which does not work well with a network filesystem. Instead, copy the image file to the virtual machine's own filesystem. You can of course use shared folders in order to copy files to/from the VM's own filesystem before/after preloading.
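The shared-folder caveat above can be checked before preloading by looking at the filesystem type that holds the image file. The following is only a sketch: `is_network_fs`, `fs_type_of` and `check_image_location` are hypothetical helpers (not part of the balena CLI), the list of filesystem type names is an assumption that may need extending, and `df --output=fstype` requires GNU coreutils, so it is meant to be run inside the Linux VM.

```shell
# Hypothetical helper: succeed if the filesystem type name looks like a
# network or VM shared-folder filesystem. The list below is an assumption.
is_network_fs() {
  case "$1" in
    nfs|nfs4|cifs|smb3|smbfs|vboxsf|9p|fuse.sshfs|prl_fs|vmhgfs|fuse.vmhgfs-fuse) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the type of the filesystem that holds a path (GNU coreutils df).
fs_type_of() {
  df --output=fstype "$1" 2>/dev/null | tail -n 1
}

# Warn if the image sits on a shared folder; return 2 if the type is unknown.
check_image_location() {
  fstype=$(fs_type_of "$1")
  [ -n "$fstype" ] || return 2
  if is_network_fs "$fstype"; then
    echo "warning: $1 is on a '$fstype' filesystem; copy it to the VM's own disk first" >&2
    return 1
  fi
  echo "ok: $1 is on a '$fstype' filesystem"
}
```

With these functions defined, something like `check_image_location balena.img && balena preload balena.img` would refuse to start a preload on an image that is still in a shared folder.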

Back in 2019, another workaround was to downgrade to Docker Community Edition 18.06.1-ce-mac73 2018-08-29 (which ships with Docker 18.06.1-ce). However, those older versions of Docker are now "really old" and offer poor support for modern features like multiarch images, so many users will now prefer to use a Linux VM.

Proper Solution

Balena is working towards replacing AUFS with overlay2 in affected balenaOS images. Newer device types like the Raspberry Pi 4 already use overlay2. For other device types, the difficulty currently being addressed is the host OS update of devices in the field running old balenaOS releases that still use AUFS, and need to be migrated to overlay2 as part of the host OS update.

mschm commented 5 years ago

Thanks @pdcastro! The workaround is sufficient for me.

pdcastro commented 4 years ago

This issue has been reported for Docker for Windows as well. In the case of Docker Community Edition for Windows, the latest version to include AUFS support is 18.06.1-ce-win73 2018-08-29.

kubotaku1119 commented 4 years ago

Hi, I also faced the same issue, and the workaround (downgrading the Docker version) worked. But when I tried to preload another .img, this error occurred. That error does not occur when using the newest Docker version for macOS. Thank you for your awesome tools! I hope you find this information useful.

winglian commented 4 years ago

Are there any other workarounds that don't require a downgrade of Docker?

pdcastro commented 4 years ago

Are there any other workarounds that don't require a downgrade of Docker?

Unfortunately the only other option I am aware of is running Linux -- for example using a VirtualBox virtual machine. Docker for Linux still supports the AUFS driver.

roman-mazur commented 4 years ago

To clarify, this is because we still use AUFS on a bunch of balenaOS devices. Preloading OS images that have the engine with overlay2 used for containers should not result in this problem. And as soon as we migrate from aufs to overlay2 this will be resolved. In fact, we could also move preload operations to the cloud so that end-users don't need to care.

balena-ci commented 4 years ago

[roman-mazur] This issue has attached support thread https://jel.ly.fish/#/9df1e32a-b9e1-4fee-a48a-fb185ed96279

balena-ci commented 4 years ago

[jimsynz] This issue has attached support thread https://jel.ly.fish/#/5be791bd-dce4-44d0-a69f-5c51b414295f

tibbis commented 4 years ago

Do we have an update on this issue? Is the only way to downgrade docker?

pdcastro commented 4 years ago

Is the only way to downgrade docker?

The only other option I am aware of is running Linux -- for example using a VirtualBox virtual machine, or AWS instance, or local Linux server. Docker for Linux still supports the AUFS driver.
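On a Linux host or VM, one way to see whether the running kernel currently has AUFS available is to look for it in /proc/filesystems. This is a minimal sketch rather than an official check: `kernel_has_aufs` is a hypothetical helper, and its optional file argument exists only so the function can be tried against a sample file. Note that aufs is listed only while the kernel module is loaded.

```shell
# Hypothetical helper: succeed if the filesystem list (default:
# /proc/filesystems) contains an "aufs" entry.
kernel_has_aufs() {
  grep -qw aufs "${1:-/proc/filesystems}" 2>/dev/null
}

if kernel_has_aufs; then
  echo "aufs is registered with the running kernel"
else
  echo "aufs is not registered; dockerd --storage-driver=aufs is likely to fail"
fi
```

If the second message is printed on a distribution that ships the module, loading it (for example with `sudo modprobe aufs`) and re-running the check may be enough.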

jellyfish-bot commented 4 years ago

[pdcastro] This issue has attached support thread https://jel.ly.fish/ce5dab94-a799-4087-b29f-b9cd3171cbae

chrisys commented 4 years ago

@pdcastro I've just encountered this issue here too. On latest macOS and latest Docker Desktop, the CLI preload command just hangs at Reading image information - is there a way to at least catch the error and inform the CLI process, so users aren't left waiting and wondering what's going on?
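For CLI versions that do not report the failure themselves, a wrapper script could scan the captured preload output for the error string quoted at the top of this issue. The sketch below assumes the output has been saved or piped somewhere; `is_graphdriver_error` is a hypothetical helper, not a balena CLI feature.

```shell
# Hypothetical helper: read captured `balena preload` output on stdin and
# succeed if it contains the graphdriver error quoted in this issue.
is_graphdriver_error() {
  grep -qF 'error initializing graphdriver: driver not supported'
}

# Example using the last line of the log from the original report.
if printf '%s\n' 'Error starting daemon: error initializing graphdriver: driver not supported' | is_graphdriver_error; then
  echo "preload failed: the Docker daemon has no AUFS support; see the workarounds in this issue"
fi
```

To avoid waiting indefinitely on a hang, the preload could also be run under a time limit, for example with GNU coreutils `timeout` (not installed by default on macOS), and the saved log passed to `is_graphdriver_error` afterwards.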

shawaj commented 3 years ago

Just an FYI for anyone looking: the 18.06.1-ce releases have moved location.

Now they are located in the previous versions section:

pdcastro commented 3 years ago

is there a way to at least catch the error and inform the CLI process, so users aren't left waiting and wondering what's going on?

@chrisys, @kenna-smith, @sazerzac, @erlend, @briggySmalls, (cc: @klutchell) thanks for the feedback. FYI, the following CLI releases improved on error reporting such that the error is caught and reported straight away:

The latest CLI release has the best error handling capabilities.

@shawaj, thanks for the heads up on the broken links for the old Docker releases. 👍   I'll fix earlier comments of mine.

The ultimate solution to this issue is to replace AUFS with overlay2 in affected balenaOS images, and balena is making good progress in this direction. I understand that the difficulty currently being addressed is the host OS update of devices in the field where installed balenaOS images use AUFS and the new balenaOS images use overlay2. The image will be converted as part of the host OS update.

pdcastro commented 3 years ago

I've edited this issue's title because it is not specific to macOS: It also affects Windows and, in some cases, Linux.

When the host OS (the machine where the CLI is running) is Linux, balena preload will work as long as the AUFS kernel driver is loaded. As far as I am aware (but these things can change over time), most / all Linux distros still provide Linux kernels that include the AUFS driver. If you're using Linux and coming across this issue, the following StackOverflow posts may be helpful:

jellyfish-bot commented 2 years ago

[klutchell] This issue has attached support thread https://jel.ly.fish/7254451a-33de-4e6f-94fd-abbe944cb6e8

jellyfish-bot commented 2 years ago

[klutchell] This issue has attached support thread https://jel.ly.fish/33963003-cc34-45f6-b8a5-9511b15cd082

panbanda commented 2 years ago

Just checking in on this issue. Is using docker 18.06.1-ce still the latest solution to this? Has there been any progress on the Jellyfish board?

pdcastro commented 2 years ago

@panbanda, thanks for asking. I have now updated an earlier comment to advise using a Linux VM and the balena CLI for Linux instead of downgrading Docker, because those older versions of Docker are getting just too old. Have a look at the updated comment for some additional details.

Balena is still working towards replacing AUFS with overlay2 in affected balenaOS images, and the tricky process of reliably migrating devices in the field from AUFS to overlay2 as part of host OS updates. Progress is a bit slow but it is critical that the implementation is done right, so it is not something being rushed.

maggie44 commented 2 years ago

I wonder if something recently changed in Docker. I came across this issue today on GitHub workflows, running on an Ubuntu Linux machine. Downgrading to Docker 18 isn't really an option. I also build images in one workflow, some of which work with overlay and some of which don't. If I try a solution like those in the StackOverflow posts you shared, does that impact the others?

Sounds like the proper fix is in the pipeline, but not that close. Would be good to get a clearer idea of a workaround to follow in the meantime.

maggie44 commented 2 years ago

Looks like the GitHub runner dropped support for aufs. Not sure we can treat it as a GitHub issue though, since aufs seems to be very outdated.

https://forums.balena.io/t/error-preloading-balenaos/349969/4

jellyfish-bot commented 2 years ago

[alexgg] This issue has attached support thread https://jel.ly.fish/c3525c4d-652f-46fb-8360-66b394880496

maggie44 commented 2 years ago

Some info here from the forums that is super helpful:

the aufs to overlay migration takes place with balenaOS 2.84.0, so it will happen as device types get updated. The engine will then migrate applications from aufs to overlay avoiding the need to re-download containers.

The affected device types are:

beaglebone-black intel-edison intel-nuc odroid-c1 raspberry-pi raspberry-pi2 raspberrypi3 raspberrypi3-64 fincm3 revpi-core-3 npe-x500-m3 up-board

pdcastro commented 2 years ago

That's great info indeed @maggie0002 @alexgg. 🎉 So I think it can be said that the ultimate "fix" to this balena preload issue is to use balenaOS 2.84.0 or later. Other than that, regarding the GitHub runner having dropped support for AUFS, I don't see what else could realistically be done, because AUFS is a Linux kernel driver that the Docker Engine relies on (my understanding).

maggie44 commented 2 years ago

It seems 2.84.0 applies to the boards in that list. Other boards not on the list are below 2.84.0 but seem to work OK. So anyone using one of the boards on the list will need to wait for 2.84.0.

I had a brief play around with the suggestions from the StackOverflow posts you shared. Most of the packages they suggested installing are deprecated and no longer available in the Ubuntu repositories, so I didn’t get far.

panbanda commented 2 years ago

This is great news! Is there a public kanban or something to view the progress / timelines of the different boards? I'm really only needing the rpi0 but I'm sure others would be interested as well.

maggie44 commented 2 years ago

Or at least a tracking issue? Is this issue or the forum one (https://forums.balena.io/t/error-preloading-balenaos/349969) going to be updated when each of the new OS versions are released? I'm assuming it will be relatively soon as there isn't really support for those devices and preload in the meantime?

pdcastro commented 2 years ago

What I got from the OS team is that: "The latest beaglebone-black release already performs the migration [AUFS → overlay2]. We are waiting on a decent uptake of BBB fleets to perform the migration [i.e. a large-ish number of devices in the field to have migrated] so that we catch any pending problem before updating bigger fleets." In other words, while our extensive internal testing has given us the green light for all device types, we are treading carefully by making the migration available incrementally. The upside of the delay is a gain in reliability.

To easily find out what the latest balenaOS version for a device type is, the balena os versions <dt> command can be used. For several device types, a shell loop as follows could be used:

$ DEVICE_TYPES=(beaglebone-black intel-edison intel-nuc odroid-c1 raspberry-pi raspberry-pi2 raspberrypi3 raspberrypi3-64 fincm3 revpi-core-3 npe-x500-m3 up-board)

$ for dt in "${DEVICE_TYPES[@]}"; do echo "$(balena os versions "${dt}" | head -n 1) ${dt}"; done
v2.85.16+rev1.prod (recommended) beaglebone-black
v2.31.5+rev1.prod (recommended) intel-edison
v2.83.18+rev1.prod (recommended) intel-nuc
v2.38.0+rev1.prod (recommended) odroid-c1
v2.83.21+rev1.prod (recommended) raspberry-pi
v2.83.21+rev1.prod (recommended) raspberry-pi2
v2.83.21+rev1.prod (recommended) raspberrypi3
v2.80.3+rev1.prod (recommended) raspberrypi3-64
v2.83.21+rev1.prod (recommended) fincm3
v2.80.3+rev1.prod (recommended) revpi-core-3
v2.58.3+rev1.prod (recommended) npe-x500-m3
v2.68.1+rev1.prod (recommended) up-board
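The output of that loop can be compared against the 2.84.0 migration release mentioned earlier in this thread. The following is a sketch under assumptions: `version_lt` and `report_migration_status` are hypothetical helpers, the input is assumed to have the version in the first column and the device type in the last column (as in the output above), and the 2.84.0 threshold is taken from the forum quote rather than from an official compatibility table.

```shell
# Hypothetical helper: succeed if version $1 is lower than version $2,
# comparing the first three dot-separated fields numerically.
version_lt() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    for (i = 1; i <= 3; i++) {
      if (x[i] + 0 < y[i] + 0) exit 0
      if (x[i] + 0 > y[i] + 0) exit 1
    }
    exit 1
  }' </dev/null
}

# Read "<version> (recommended) <device-type>" lines on stdin and print, per
# device type, whether the version is before or at/after 2.84.0.
report_migration_status() {
  awk '{ print $1, $NF }' | while read -r version device_type; do
    plain=${version#v}     # drop the leading "v"
    plain=${plain%%+*}     # drop the "+rev1.prod" suffix
    if version_lt "$plain" 2.84.0; then
      echo "$device_type $plain before-2.84.0"
    else
      echo "$device_type $plain 2.84.0-or-later"
    fi
  done
}
```

Piping the loop above into `report_migration_status` would then give one status line per device type, which could be run from a scheduled job instead of being checked by hand.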

@maggie0002, regarding "there isn't really support for those devices and preload in the meantime", do you mean when using GitHub workflows specifically? The old workaround of using a Linux Virtual Machine (e.g. with Ubuntu) to run balena preload still works. As far as I am aware, most Linux distributions still ship with Linux kernels that support AUFS. If the latest Linux distro versions / kernels are dropping support for AUFS, we need to identify the latest versions that still support AUFS.

maggie44 commented 2 years ago

I was seeing GitHub Workflows as largely the same thing. It’s an Ubuntu virtual machine for running processes. It seemed like the most recent breaking change was to do with Docker dropping support for AUFS. So a virtual machine would have to roll back to an old Docker for it to still work. Somewhat viable, but the workarounds are narrowing quite quickly. There is some more info here: https://forums.balena.io/t/error-preloading-balenaos/349969/4?u=maggie

Thanks for the idea on the script to get the latest images. I’m not entirely sure it’s practical for me to run it every day; a tracking issue would be better if possible. At the moment I have just decided to drop support for those devices, and I fully appreciate the stability steps being taken that can cause a delay. It would be helpful though to know when I can restore support for those devices without having to check every day.

I understand there is the potential to set up a custom VM with a rolled-back Docker and plug that in as a runner for GitHub or use it manually, or maybe explore downgrading the GitHub runner's Docker, or maybe the Ubuntu repository default has an old Docker that could work and I could set up some sort of manual workflow. But that’s an awful lot of work I’m going to try to avoid. I would rather just drop the support for it and then know as soon as I can offer it again, assuming it isn’t a long way in the future.

pdcastro commented 2 years ago

I understand that the latest stable version of Ubuntu (20.04.3) still supports AUFS. The following test has confirmed it, and it is also stated on Docker's Installation Instructions for Ubuntu:

Supported storage drivers Docker Engine on Ubuntu supports overlay2, aufs and btrfs storage drivers.

I like the convenience of using Multipass to create text mode (non-graphical) Ubuntu virtual machines. The following commands achieve the preloading of a Raspberry Pi 3 image (using AUFS), all without leaving a command line prompt and therefore potentially scriptable:

# https://multipass.run/
# The VirtualBox driver may not be required, but it happens to be what I tested with.
$ sudo multipass set local.driver=virtualbox
$ multipass launch -n ubuntu20 -m 5G -d 20G -c 4 20.04

# copy image to be preloaded to the VM
$ multipass transfer balena-cloud-test-rpi-raspberrypi3-2.83.21+rev1-v12.10.3.img ubuntu20:

$ multipass exec ubuntu20 bash
$ cat /etc/issue
Ubuntu 20.04.3 LTS \n \l

# Install Docker in the VM (https://docs.docker.com/engine/install/ubuntu/)
$ curl -fsSL https://get.docker.com -o get-docker.sh
$ sudo sh get-docker.sh
$ sudo usermod -aG docker $USER
$ newgrp docker
$ docker version
... Version: 20.10.10

# Install the balena CLI in the VM
$ curl -LO https://github.com/balena-io/balena-cli/releases/download/v12.51.3/balena-cli-v12.51.3-linux-x64-standalone.zip
$ sudo apt install unzip
$ unzip balena-cli-v12.51.3-linux-x64-standalone.zip
$ alias balena="${HOME}/balena-cli/balena"
$ balena login
Logging in to balena-cloud.com
? How would you like to login? Authentication token ...

# preload the image
$ balena preload -f test-rpi -c current balena-cloud-test-rpi-raspberrypi3-2.83.21+rev1-v12.10.3.img

# Copy the preloaded image back to the host OS
$ exit
$ multipass transfer ubuntu20:balena-cloud-test-rpi-raspberrypi3-2.83.21+rev1-v12.10.3.img .

I tested the commands above on a Mac laptop.

maggie44 commented 2 years ago

Another approach here from a user, adding for reference. I haven't tried it though, personally still assuming the OS updates won't be too far away and better to wait than redo the workflows: https://forums.balena.io/t/balena-aufs-preload-on-github-actions/350526

jellyfish-bot commented 2 years ago

[fisehara] This issue has attached support thread https://jel.ly.fish/c4d4a45d-974c-4226-b607-91bd711b7f0b