canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Exit status 11 when copying container from remote #3649

Closed adam-koblentz closed 6 years ago

adam-koblentz commented 6 years ago

Required information

Issue description

I have 2 zesty VMs running in VMware Fusion on my MacBook. I am trying to copy containers from one to the other. I followed the instructions here: LXD 2.0: Remote hosts and container migration

Two of my containers copied fine. Subsequent containers are not copying.

I am getting this error message:

akoblentz@minimal:~$ lxc copy serrano:actor-desktop actor-desktop
error: Migration failed on source host: exit status 11

In a previous issue, I saw someone mention that they got around another remote issue by fully qualifying the IP in the remote config, which I have also done.

On both machines I have run these two commands:

lxc config set core.https_address fully_qualified_eth0_ipv4:8443
lxc config set core.trust_password password

and added the remote on the container-less VM:

lxc remote add serrano 172.16.79.3

Here is the output of lxc remote list:

+-----------------+------------------------------------------+---------------+--------+--------+
|      NAME       |                   URL                    |   PROTOCOL    | PUBLIC | STATIC |
+-----------------+------------------------------------------+---------------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | YES    | NO     |
+-----------------+------------------------------------------+---------------+--------+--------+
| local (default) | unix://                                  | lxd           | NO     | YES    |
+-----------------+------------------------------------------+---------------+--------+--------+
| serrano         | https://172.16.79.3:8443                 | lxd           | NO     | NO     |
+-----------------+------------------------------------------+---------------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | YES    | YES    |
+-----------------+------------------------------------------+---------------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | YES    | YES    |
+-----------------+------------------------------------------+---------------+--------+--------+

and here is the truncated output of lxc list serrano:

akoblentz@minimal:~$ lxc list serrano:
+--------------------+---------+------+------+------------+-----------+
|        NAME        |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+--------------------+---------+------+------+------------+-----------+
| actor-container-08 | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+
| actor-container-09 | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+
| actor-critical     | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+
| actor-desktop      | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+

My copy command is lxc copy serrano:actor-desktop actor-desktop, and the output is: error: Migration failed on source host: exit status 11

stgraber commented 6 years ago

Can you upgrade both your LXD systems to the latest LXD? That should at the very least give you a better error message.

apt install -t zesty-backports lxd lxd-client

That assumes you have the backports pocket enabled on those systems. That will get you LXD 2.16.
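For reference, enabling the backports pocket on 17.04 is usually just a matter of adding a line like the following to /etc/apt/sources.list (the mirror URL here is only an example) and refreshing the package index before installing:

# add to /etc/apt/sources.list (example mirror)
deb http://archive.ubuntu.com/ubuntu zesty-backports main restricted universe multiverse

sudo apt update
sudo apt install -t zesty-backports lxd lxd-client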

adam-koblentz commented 6 years ago

@stgraber

Here's the output after upgrading both sides:

$ lxc copy serrano:actor-desktop actor-desktop
error: Failed container creation:                           
 - https://172.16.79.3:8443: Error transferring container data: exit status 11
stgraber commented 6 years ago

Hmm, ok, so that's not that much more useful is it :)

Can you look at /var/log/lxd/lxd.log on the source and target, see if there are any errors in there that would be a bit more useful than "exit status 11"?
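One quick way to pull out just the error-level entries is a grep over that file (a simple sketch; the path is the default LXD 2.x log location, and LXD spells the level as "eror" in its logs):

grep 'lvl=eror' /var/log/lxd/lxd.log | tail -n 20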

adam-koblentz commented 6 years ago

@stgraber

I rebooted both machines after updating to the newest version before running this test, and these are the log messages from just this command:

$ lxc copy serrano:actor-desktop actor-desktop

I verified that the local (minimal) has > 35GB of disk free, and I was able to copy two other containers before I started having this error. Also, just as a test, I tried running this both as root and my normal user (in the lxd group). Same results.

Here's the contents from the remote (serrano):

lvl=eror msg="Rsync send failed: /var/lib/lxd/containers/actor-desktop/: exit status 11: rsync: write failed on \"/var/lib/lxd/containers/actor-desktop/rootfs/usr/include/readline/tilde.h\": No space left on device (28)\nrsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]\n" t=2017-08-10T14:31:57-0400

Here's the contents from the local (minimal):

ephemeral=false lvl=info msg="Creating container" name=actor-desktop t=2017-08-10T14:31:34-0400
ephemeral=false lvl=info msg="Created container" name=actor-desktop t=2017-08-10T14:31:34-0400
lvl=warn msg="Unable to update backup.yaml at this time." name=actor-desktop t=2017-08-10T14:31:34-0400
lvl=eror msg="Rsync receive failed: /var/lib/lxd/containers/actor-desktop/: exit status 11: " t=2017-08-10T14:31:57-0400
err="exit status 11" lvl=eror msg="Error during migration sink" t=2017-08-10T14:31:57-0400
created=2017-08-10T18:31:34+0000 ephemeral=false lvl=info msg="Deleting container" name=actor-desktop t=2017-08-10T14:31:57-0400 used=1970-01-01T00:00:00+0000
created=2017-08-10T18:31:34+0000 ephemeral=false lvl=info msg="Deleted container" name=actor-desktop t=2017-08-10T14:31:58-0400 used=1970-01-01T00:00:00+0000
stgraber commented 6 years ago

Can you paste "df -h" and "df -i" from serrano?

stgraber commented 6 years ago

Confusingly, the "out of disk space" type errors also happen if you run out of inodes, not only if you run out of space.

adam-koblentz commented 6 years ago

My vmware config for minimal has a 60GB disk, I just verified in gparted that it's all allocated correctly. My vmware config for serrano has a 70GB disk, also verified in gparted.

Here's my local (the VM being copied to), named minimal:

$ df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                          3.9G     0  3.9G   0% /dev
tmpfs                         797M   12M  785M   2% /run
/dev/mapper/minimal--vg-root   12G   11G  724M  94% /
tmpfs                         3.9G     0  3.9G   0% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                         3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                         100K     0  100K   0% /var/lib/lxd/shmounts
tmpfs                         100K     0  100K   0% /var/lib/lxd/devlxd
tmpfs                         797M     0  797M   0% /run/user/1000

$ df -i
Filesystem                    Inodes  IUsed   IFree IUse% Mounted on
udev                         1013604    458 1013146    1% /dev
tmpfs                        1019275   1403 1017872    1% /run
/dev/mapper/minimal--vg-root  786432 264584  521848   34% /
tmpfs                        1019275      1 1019274    1% /dev/shm
tmpfs                        1019275      3 1019272    1% /run/lock
tmpfs                        1019275     16 1019259    1% /sys/fs/cgroup
tmpfs                        1019275      1 1019274    1% /var/lib/lxd/shmounts
tmpfs                        1019275      2 1019273    1% /var/lib/lxd/devlxd
tmpfs                        1019275      5 1019270    1% /run/user/1000

Here's my remote (the VM being copied from), named serrano:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           797M   13M  784M   2% /run
/dev/sda1        69G   44G   22G  68% /
tmpfs           3.9G   12K  3.9G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs           100K     0  100K   0% /var/lib/lxd/shmounts
tmpfs           100K     0  100K   0% /var/lib/lxd/devlxd
tmpfs           797M  132K  797M   1% /run/user/1000

$ df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
udev           1013570     447 1013123    1% /dev
tmpfs          1019276    1493 1017783    1% /run
/dev/sda1      4587520 1107356 3480164   25% /
tmpfs          1019276       4 1019272    1% /dev/shm
tmpfs          1019276       6 1019270    1% /run/lock
tmpfs          1019276      16 1019260    1% /sys/fs/cgroup
tmpfs          1019276       1 1019275    1% /var/lib/lxd/shmounts
tmpfs          1019276       2 1019274    1% /var/lib/lxd/devlxd
tmpfs          1019276      73 1019203    1% /run/user/1000

Also here's output from fdisk on minimal:

$ sudo fdisk -l
Disk /dev/sda: 60 GiB, 64424509440 bytes, 125829120 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0e90b202

Device     Boot Start       End   Sectors Size Id Type
/dev/sda1  *     2048 125829119 125827072  60G 8e Linux LVM

Disk /dev/mapper/minimal--vg-root: 12 GiB, 12880707584 bytes, 25157632 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/minimal--vg-swap_1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
stgraber commented 6 years ago

Ok, so inode count and disk space look good; not sure why rsync is running out of space then...

stgraber commented 6 years ago

How long does it take before you hit the error during transfer? Do you see some progress information during the transfer? LXD 2.16 should show you how much data has been transferred.

adam-koblentz commented 6 years ago

I see about 1.29GB transferred before the error happens.

stgraber commented 6 years ago

Can you run "du -sch /var/lib/lxd/containers/actor-desktop/" and then "du -sch --apparent-size /var/lib/lxd/containers/actor-desktop/"?

stgraber commented 6 years ago

I'm checking for potential sparse files, which can cause problems during rsync by taking up their expanded size on the target, even if only temporarily.
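If sparse files were the suspect, one rough way to spot the worst offenders is GNU find's sparseness field (a sketch only; %S is the ratio of allocated blocks to apparent size, so values well below 1 indicate heavily sparse files):

find /var/lib/lxd/containers/actor-desktop/rootfs -type f -printf '%S\t%s\t%p\n' | sort -n | head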

adam-koblentz commented 6 years ago

Sure, here's the output:

root@serrano:~# du -sch --apparent-size /var/lib/lxd/containers/actor-desktop/
3.2G    /var/lib/lxd/containers/actor-desktop/
3.2G    total
root@serrano:~# du -sch /var/lib/lxd/containers/actor-desktop/
3.4G    /var/lib/lxd/containers/actor-desktop/
3.4G    total

So it looks like it's transferring around half before dying.

adam-koblentz commented 6 years ago

The container is based on a CentOS 7 image with a few of my company's apps installed on it, so the size isn't surprising to me.

stgraber commented 6 years ago

Yeah, that looks fine, so it's unlikely to be a sparse file issue.

stgraber commented 6 years ago

Any chance you can watch the "df -h" and "df -i" output from the target host as you try to transfer the container? See if either gets dangerously close to running out just before the transfer fails?
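An easy way to keep an eye on that during the copy is a watch loop on the target, for example:

# on minimal, refresh free space and free inodes every 2 seconds
watch -n 2 'df -h /; df -i /'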

adam-koblentz commented 6 years ago

Okay, it looks like /dev/mapper/minimal--vg-root gets full and that's what kills it.

That hits 100% and then the copy fails.

I have my VM set to dynamically expand the drive on my real physical machine's disk as needed, which is Fusion's default behavior.

stgraber commented 6 years ago

But you're copying away from this system, aren't you?

adam-koblentz commented 6 years ago

Copying to this system. Both are VMs in my Fusion environment on my Mac.

I can run watch on the remote VM as well.

stgraber commented 6 years ago

Ah, ok, so you're trying to copy a 3.5GB container to a system with just 724MB of free space then? Yeah, that's not gonna work :)

adam-koblentz commented 6 years ago

Yeah, I'm just noticing that Ubuntu's LVM setup didn't pick up my previous changes. I'll fix that and try again. Sorry!
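For anyone hitting the same thing: once the partition itself is the right size, growing the root logical volume into the volume group's free space and then resizing the filesystem is usually all that's needed. A sketch, assuming the VG/LV names implied by /dev/mapper/minimal--vg-root and an ext4 root filesystem:

# grow the root LV into all remaining free space in the volume group
sudo lvextend -l +100%FREE /dev/minimal-vg/root
# grow the ext4 filesystem to match the new LV size (can be done online)
sudo resize2fs /dev/mapper/minimal--vg-root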

stgraber commented 6 years ago

Cool, that all makes sense :) Closing this issue for now, feel free to comment if you run into other problems.