canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Exit status 11 when copying container from remote #3649

Closed adam-koblentz closed 6 years ago

adam-koblentz commented 6 years ago

Required information

Issue description

I have 2 zesty VMs running in VMware Fusion on my MacBook. I am trying to copy containers from one to the other. I followed the instructions here: LXD 2.0: Remote hosts and container migration

Two of my containers copied fine. Subsequent containers are not copying.

I am getting this error message:

akoblentz@minimal:~$ lxc copy serrano:actor-desktop actor-desktop
error: Migration failed on source host: exit status 11

In a previous issue, I saw someone mention that they got around another remote issue by fully qualifying the IP in the remote config, which I have also done.

On both machines I have run these two commands:

lxc config set core.https_address fully_qualified_eth0_ipv4:8443
lxc config set core.trust_password password

and added the remote on the container-less VM:

lxc remote add serrano 172.16.79.3

Here is the output of lxc remote list:

+-----------------+------------------------------------------+---------------+--------+--------+
|      NAME       |                   URL                    |   PROTOCOL    | PUBLIC | STATIC |
+-----------------+------------------------------------------+---------------+--------+--------+
| images          | https://images.linuxcontainers.org       | simplestreams | YES    | NO     |
+-----------------+------------------------------------------+---------------+--------+--------+
| local (default) | unix://                                  | lxd           | NO     | YES    |
+-----------------+------------------------------------------+---------------+--------+--------+
| serrano         | https://172.16.79.3:8443                 | lxd           | NO     | NO     |
+-----------------+------------------------------------------+---------------+--------+--------+
| ubuntu          | https://cloud-images.ubuntu.com/releases | simplestreams | YES    | YES    |
+-----------------+------------------------------------------+---------------+--------+--------+
| ubuntu-daily    | https://cloud-images.ubuntu.com/daily    | simplestreams | YES    | YES    |
+-----------------+------------------------------------------+---------------+--------+--------+

and here is the truncated output of lxc list serrano:

akoblentz@minimal:~$ lxc list serrano:
+--------------------+---------+------+------+------------+-----------+
|        NAME        |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+--------------------+---------+------+------+------------+-----------+
| actor-container-08 | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+
| actor-container-09 | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+
| actor-critical     | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+
| actor-desktop      | STOPPED |      |      | PERSISTENT | 0         |
+--------------------+---------+------+------+------------+-----------+

My copy command is lxc copy serrano:actor-desktop actor-desktop, and the output is: error: Migration failed on source host: exit status 11

stgraber commented 6 years ago

Can you upgrade both your LXD systems to the latest LXD? That should at the very least give you a better error message.

apt install -t zesty-backports lxd lxd-client

That assumes you have the backports pocket enabled on those systems. That will get you LXD 2.16.
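For reference, enabling the backports pocket on 17.04 is usually just a matter of adding a line like the following to /etc/apt/sources.list (the mirror URL here is only an example) and refreshing the package index before installing:

# add to /etc/apt/sources.list (example mirror)
deb http://archive.ubuntu.com/ubuntu zesty-backports main restricted universe multiverse

sudo apt update
sudo apt install -t zesty-backports lxd lxd-client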

adam-koblentz commented 6 years ago

@stgraber

Here's the output after upgrading both sides:

$ lxc copy serrano:actor-desktop actor-desktop
error: Failed container creation:                           
 - https://172.16.79.3:8443: Error transferring container data: exit status 11
stgraber commented 6 years ago

Hmm, ok, so that's not that much more useful is it :)

Can you look at /var/log/lxd/lxd.log on the source and target, see if there are any errors in there that would be a bit more useful than "exit status 11"?
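One quick way to pull out just the error-level entries is a grep over that file (a simple sketch; the path is the default LXD 2.x log location, and LXD spells the level as "eror" in its logs):

grep 'lvl=eror' /var/log/lxd/lxd.log | tail -n 20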

adam-koblentz commented 6 years ago

@stgraber

I rebooted both machines after updating to the newest version before running this test, and these are the log messages from just this command:

$ lxc copy serrano:actor-desktop actor-desktop

I verified that the local (minimal) has > 35GB of disk free, and I was able to copy two other containers before I started having this error. Also, just as a test, I tried running this both as root and my normal user (in the lxd group). Same results.

Here's the contents from the remote (serrano):

lvl=eror msg="Rsync send failed: /var/lib/lxd/containers/actor-desktop/: exit status 11: rsync: write failed on \"/var/lib/lxd/containers/actor-desktop/rootfs/usr/include/readline/tilde.h\": No space left on device (28)\nrsync error: error in file IO (code 11) at receiver.c(393) [receiver=3.1.2]\n" t=2017-08-10T14:31:57-0400

Here's the contents from the local (minimal):

ephemeral=false lvl=info msg="Creating container" name=actor-desktop t=2017-08-10T14:31:34-0400
ephemeral=false lvl=info msg="Created container" name=actor-desktop t=2017-08-10T14:31:34-0400
lvl=warn msg="Unable to update backup.yaml at this time." name=actor-desktop t=2017-08-10T14:31:34-0400
lvl=eror msg="Rsync receive failed: /var/lib/lxd/containers/actor-desktop/: exit status 11: " t=2017-08-10T14:31:57-0400
err="exit status 11" lvl=eror msg="Error during migration sink" t=2017-08-10T14:31:57-0400
created=2017-08-10T18:31:34+0000 ephemeral=false lvl=info msg="Deleting container" name=actor-desktop t=2017-08-10T14:31:57-0400 used=1970-01-01T00:00:00+0000
created=2017-08-10T18:31:34+0000 ephemeral=false lvl=info msg="Deleted container" name=actor-desktop t=2017-08-10T14:31:58-0400 used=1970-01-01T00:00:00+0000
stgraber commented 6 years ago

Can you paste "df -h" and "df -i" from serrano?

stgraber commented 6 years ago

Confusingly, the "out of disk space" type errors also happen if you run out of inodes, not only if you run out of space.

adam-koblentz commented 6 years ago

My vmware config for minimal has a 60GB disk, I just verified in gparted that it's all allocated correctly. My vmware config for serrano has a 70GB disk, also verified in gparted.

Here's my local (the VM being copied to), named minimal:

$ df -h
Filesystem                    Size  Used Avail Use% Mounted on
udev                          3.9G     0  3.9G   0% /dev
tmpfs                         797M   12M  785M   2% /run
/dev/mapper/minimal--vg-root   12G   11G  724M  94% /
tmpfs                         3.9G     0  3.9G   0% /dev/shm
tmpfs                         5.0M     0  5.0M   0% /run/lock
tmpfs                         3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                         100K     0  100K   0% /var/lib/lxd/shmounts
tmpfs                         100K     0  100K   0% /var/lib/lxd/devlxd
tmpfs                         797M     0  797M   0% /run/user/1000

$ df -i
Filesystem                    Inodes  IUsed   IFree IUse% Mounted on
udev                         1013604    458 1013146    1% /dev
tmpfs                        1019275   1403 1017872    1% /run
/dev/mapper/minimal--vg-root  786432 264584  521848   34% /
tmpfs                        1019275      1 1019274    1% /dev/shm
tmpfs                        1019275      3 1019272    1% /run/lock
tmpfs                        1019275     16 1019259    1% /sys/fs/cgroup
tmpfs                        1019275      1 1019274    1% /var/lib/lxd/shmounts
tmpfs                        1019275      2 1019273    1% /var/lib/lxd/devlxd
tmpfs                        1019275      5 1019270    1% /run/user/1000

Here's my remote (the VM being copied from), named serrano:

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
udev            3.9G     0  3.9G   0% /dev
tmpfs           797M   13M  784M   2% /run
/dev/sda1        69G   44G   22G  68% /
tmpfs           3.9G   12K  3.9G   1% /dev/shm
tmpfs           5.0M  4.0K  5.0M   1% /run/lock
tmpfs           3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs           100K     0  100K   0% /var/lib/lxd/shmounts
tmpfs           100K     0  100K   0% /var/lib/lxd/devlxd
tmpfs           797M  132K  797M   1% /run/user/1000

$ df -i
Filesystem      Inodes   IUsed   IFree IUse% Mounted on
udev           1013570     447 1013123    1% /dev
tmpfs          1019276    1493 1017783    1% /run
/dev/sda1      4587520 1107356 3480164   25% /
tmpfs          1019276       4 1019272    1% /dev/shm
tmpfs          1019276       6 1019270    1% /run/lock
tmpfs          1019276      16 1019260    1% /sys/fs/cgroup
tmpfs          1019276       1 1019275    1% /var/lib/lxd/shmounts
tmpfs          1019276       2 1019274    1% /var/lib/lxd/devlxd
tmpfs          1019276      73 1019203    1% /run/user/1000

Also here's output from fdisk on minimal:

$ sudo fdisk -l
Disk /dev/sda: 60 GiB, 64424509440 bytes, 125829120 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: dos
Disk identifier: 0x0e90b202

Device     Boot Start       End   Sectors Size Id Type
/dev/sda1  *     2048 125829119 125827072  60G 8e Linux LVM

Disk /dev/mapper/minimal--vg-root: 12 GiB, 12880707584 bytes, 25157632 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

Disk /dev/mapper/minimal--vg-swap_1: 8 GiB, 8589934592 bytes, 16777216 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
stgraber commented 6 years ago

Ok, so inode count and disk space look good; not sure why rsync is running out of space then...

stgraber commented 6 years ago

How long does it take before you hit the error during transfer? Do you see some progress information during the transfer? LXD 2.16 should show you how much data has been transferred.

adam-koblentz commented 6 years ago

I see about 1.29GB transferred before the error happens.

stgraber commented 6 years ago

Can you run "du -sch /var/lib/lxd/containers/actor-desktop/" and then "du -sch --apparent-size /var/lib/lxd/containers/actor-desktop/"?

stgraber commented 6 years ago

I'm checking for potential sparse files, which can cause problems during rsync by taking up their expanded size on the target, even if only temporarily.
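If sparse files were the suspect, one rough way to spot the worst offenders is GNU find's sparseness field (a sketch only; %S is the ratio of allocated blocks to apparent size, so values well below 1 indicate heavily sparse files):

find /var/lib/lxd/containers/actor-desktop/rootfs -type f -printf '%S\t%s\t%p\n' | sort -n | head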

adam-koblentz commented 6 years ago

Sure, here's the output:

root@serrano:~# du -sch --apparent-size /var/lib/lxd/containers/actor-desktop/
3.2G    /var/lib/lxd/containers/actor-desktop/
3.2G    total
root@serrano:~# du -sch /var/lib/lxd/containers/actor-desktop/
3.4G    /var/lib/lxd/containers/actor-desktop/
3.4G    total

So it looks like it's transferring around half before dying.

adam-koblentz commented 6 years ago

The container is based on a CentOS 7 image with a few of my company's apps installed on it, so the size isn't surprising to me.

stgraber commented 6 years ago

Yeah, that looks fine, so it's unlikely to be a sparse file issue.

stgraber commented 6 years ago

Any chance you can watch the "df -h" and "df -i" output from the target host as you try to transfer the container? See if either gets dangerously close to running out just before the transfer fails?
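An easy way to keep an eye on that during the copy is a watch loop on the target, for example:

# on minimal, refresh free space and free inodes every 2 seconds
watch -n 2 'df -h /; df -i /'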

adam-koblentz commented 6 years ago

Okay, it looks like /dev/mapper/minimal--vg-root gets full and that's what kills it.

That hits 100% and then the copy fails.

I have my VM set to dynamically expand the drive on my real physical machine's disk as needed, which is Fusion's default behavior.

stgraber commented 6 years ago

But you're copying away from this system, aren't you?

adam-koblentz commented 6 years ago

Copying to this system. Both are VMs in my Fusion environment on my Mac.

I can run watch on the remote VM as well.

stgraber commented 6 years ago

Ah, ok, so you're trying to copy a 3.5GB container to a system with just 724MB of free space then? Yeah, that's not gonna work :)

adam-koblentz commented 6 years ago

Yeah, I'm just noticing that Ubuntu's LVM setup didn't pick up my previous changes. I'll fix that and try again. Sorry!
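For anyone hitting the same thing: once the partition itself is the right size, growing the root logical volume into the volume group's free space and then resizing the filesystem is usually all that's needed. A sketch, assuming the VG/LV names implied by /dev/mapper/minimal--vg-root and an ext4 root filesystem:

# grow the root LV into all remaining free space in the volume group
sudo lvextend -l +100%FREE /dev/minimal-vg/root
# grow the ext4 filesystem to match the new LV size (can be done online)
sudo resize2fs /dev/mapper/minimal--vg-root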

stgraber commented 6 years ago

Cool, that all makes sense :) Closing this issue for now, feel free to comment if you run into other problems.