canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Copying containers with snapshots fails on BTRFS #6428

Closed NeonHorizon closed 4 years ago

NeonHorizon commented 5 years ago


Issue description

Copying a container between hosts fails if the container has a snapshot.

Steps to reproduce

lxc launch ubuntu:18.04 live:test-container
lxc stop live:test-container
lxc copy live:test-container backup:test-container

...works...

lxc delete backup:test-container
lxc snapshot live:test-container test-snapshot
lxc copy live:test-container backup:test-container

Error: Failed instance creation:

Information to attach

ssh backup cat /var/snap/lxd/common/lxd/logs/lxd.log | grep 'eror'

t=2019-11-09T19:50:09+0000 lvl=eror msg="Problem with btrfs receive: It seems that you have changed your default subvolume or you specify other subvolume to\nmount btrfs, try to remount this btrfs filesystem with fs tree, and run btrfs receive again!\n"
t=2019-11-09T19:50:09+0000 lvl=eror msg="Error during migration sink" err="exit status 1"
t=2019-11-09T19:50:10+0000 lvl=eror msg="Error during migration sink" err="websocket: bad handshake"
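
For reading logs like the one above, a minimal sketch of a filter that prints only the message text of error-level entries. The function name lxd_error_msgs is made up for illustration, and the sketch assumes one log entry per line in the format shown here, with errors tagged lvl=eror.

```shell
# Hypothetical helper: read LXD log lines on stdin and print the msg="..."
# text of entries logged at error level (lvl=eror in this log format).
# Assumes one entry per line and no escaped double quotes inside msg.
lxd_error_msgs() {
    grep 'lvl=eror' | sed -n 's/.*msg="\([^"]*\)".*/\1/p'
}

# Possible usage, with the log path taken from the report above:
#   ssh backup cat /var/snap/lxd/common/lxd/logs/lxd.log | lxd_error_msgs
```

The other fields, such as err="...", are dropped by this sketch, so the full log line is still worth checking when the message alone is not enough.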

stgraber commented 5 years ago

Both source and target servers are running 3.18?

NeonHorizon commented 5 years ago

Yes, they are identical. We actually have 4 servers and the fault is reproducible between any pair.

We actually noticed when trying to copy a container back from a backup server (maintained with --refresh copies) but it seems even just a basic copy shows the issue.

NeonHorizon commented 5 years ago

Interestingly enough the updates which we are doing with copy --refresh are working fine.

stgraber commented 5 years ago

That part is normal: after the initial copy, any additional transfers done through refresh use rsync rather than btrfs send/receive.
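
To make the distinction concrete, a small sketch of the refresh path, assuming the remote and container names used earlier in this thread. The function name refresh_backup and the LXC variable are hypothetical. LXC only exists so the command can be swapped out for a dry run.

```shell
# Hypothetical wrapper around an incremental copy.
# LXC defaults to the real client; set LXC=echo to print instead of run.
LXC="${LXC:-lxc}"

# Update an existing copy of $1 at $2. Per the explanation above, transfers
# done through --refresh go over rsync, not btrfs send/receive.
refresh_backup() {
    "$LXC" copy "$1" "$2" --refresh
}

# Possible usage:
#   refresh_backup live:test-container backup:test-container
```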

NeonHorizon commented 5 years ago

Not sure if it's helpful, but deleting the snapshot makes it copyable again.

lxc delete live:test-container/test-snapshot
lxc copy live:test-container backup:test-container

...works...
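
The same workaround as a rough sketch in function form, in case anyone wants to script it. The function name copy_after_dropping_snapshot and the LXC variable are made up for illustration. Note that this permanently deletes the snapshot on the source, so it only suits cases where the snapshot is disposable.

```shell
# Hypothetical wrapper for the workaround described above.
# LXC defaults to the real client; set LXC=echo to print instead of run.
LXC="${LXC:-lxc}"

# Delete snapshot $2 of container $1, then copy the container to $3.
# The copy is only attempted if the delete succeeded.
copy_after_dropping_snapshot() {
    "$LXC" delete "$1/$2" && "$LXC" copy "$1" "$3"
}

# Possible usage:
#   copy_after_dropping_snapshot live:test-container test-snapshot backup:test-container
```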

stgraber commented 4 years ago

@monstermunchkin is currently rewriting the entire btrfs storage driver so this will be fixed as part of that. We just need to make sure we have a test in place which would catch such issues in the future.

NeonHorizon commented 4 years ago

Thanks for the info, Stéphane. With ZFS support recently making its way into Ubuntu, I was wondering whether changing filesystems was the way to go for future deployments, but it sounds like BTRFS is still getting active LXD support, so I'll stick with it.

stgraber commented 4 years ago

Yeah, we're still actively looking after btrfs. For Ubuntu, we have certainly had better experiences with zfs than btrfs, but we have hundreds of thousands of users on btrfs too, especially on Chromebooks, so we still very much care about it.

stgraber commented 4 years ago

I've confirmed that I can transfer a container with a bunch of snapshots over the network using current master (new btrfs implementation).