Closed by bengivre 7 years ago
Hi, can you please show the output of:
sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"
Did you use a custom storage pool or did you use /var/lib/lxd/zfs.img?
I am having the exact same issue
sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"
zpool
I am using a custom pool
I did check the DB and it looked OK; here is the result:
sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"
lxd-zpool
Using a custom ZFS zpool, not an img.
Hm, I just performed an upgrade with a custom storage pool without problems. Can you please show me the contents of your zpools:
zfs list -r <pool_name> -t all
and the layout of /var/lib/lxd, specifically:
I want to see whether the upgrade actually was performed or whether it already stopped at the database entry creation stage.
Here are the details:
For better readability I put it here: http://pastebin.com/3TFTPagE
Something that might be useful to mention: it was a server originally running pure LXC, converted to an LXD server a long time ago, and LXD was working well. Thanks for your help.
So what do you see when you start the daemon in the foreground:
/usr/bin/lxd --debug --group lxd
This looks like the upgrade worked but the api patch is not marked as applied.
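For reference, one way to see which upgrade patches the daemon has recorded as applied is to query the database directly; a minimal sketch, assuming the lxd.db schema of this era keeps them in a "patches" table with name and applied_at columns:
# Sketch: list the patches LXD has marked as applied.
sqlite3 /var/lib/lxd/lxd.db "SELECT name, applied_at FROM patches"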
@brauner /var/lib/lxd/images doesn't look fully migrated to me. There's a leftover .zfs in there.
@givre I don't think it's really an issue, but you can get rid of a couple of remnants from a failed image import with: "rm -Rf /var/lib/lxd/images/lxdbuild*"
/usr/bin/lxd --debug --group lxd
I did "rm -Rf /var/lib/lxd/images/lxdbuild*" but LXD still does not start
Maybe I can back up lxd.db, remove lxd/lxc and install them again? Or do you want to try to find out what happened here?
No, we should be able to fix this for you. I think it matches what I ran into here last night.
https://github.com/lxc/lxd/issues/2907 is the report I filed for what happened to one of my machines.
In your case, it looks like this entry is the problem:
lxd-zpool/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb 576M 274G 576M /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
To fix things for you, I think you should run:
systemctl stop lxd
umount -l /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
touch /var/lib/lxd/containers/blockcdn-01.zfs
touch /var/lib/lxd/containers/consul-lan.zfs
touch /var/lib/lxd/containers/dhcp-lan.zfs
touch /var/lib/lxd/containers/haproxy01-lan.zfs
touch /var/lib/lxd/containers/juju-16cbed-0.zfs
touch /var/lib/lxd/containers/juju-c0b770-17.zfs
touch /var/lib/lxd/containers/juju-c0b770-18.zfs
touch /var/lib/lxd/containers/juju-c0b770-19.zfs
touch /var/lib/lxd/containers/juju-c0b770-20.zfs
touch /var/lib/lxd/containers/odl-lan.zfs
touch /var/lib/lxd/images/0d0524134a82d26f0586f8e36b7538a302f928543cebee8982533ee4f79e304b.zfs
touch /var/lib/lxd/images/13124999f6342f47c157a0c4cc0d961ed63a7be56b270edbb5082f305069992d.zfs
touch /var/lib/lxd/images/1e66f9fa622c0cf03ac395d6fddddcdb0a1ea08bd8753900b0f2a18acf42f2b3.zfs
touch /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
touch /var/lib/lxd/images/534391c2476a8b610b252592d1f975495fdb83ef24f842b633c3df88f6716606.zfs
touch /var/lib/lxd/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681.zfs
touch /var/lib/lxd/images/ca9a673ed503cff575af61e0f1f8c0ffa127539c01e454b01ac64ef1e8a6ed7f.zfs
cp /var/lib/lxd/lxd.db.bak /var/lib/lxd/lxd.db
lxd --debug --group lxd
The goal of those commands is to re-create all the files that the migration tool expects: two files for each container under /var/lib/lxd/containers and up to three files per image under /var/lib/lxd/images/. We then restore the old database and re-run LXD, this time with debug logging, which should run the migration again and hopefully not fail this time.
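For reference, rather than typing every touch by hand, the same placeholders can be generated with a small loop; this is only a sketch and assumes every container directory and every image file under those paths belongs to the ZFS pool:
# Sketch: create an empty <name>.zfs marker for each container and image
# so the storage migration finds the files it expects.
for c in /var/lib/lxd/containers/*; do
    [ -e "$c" ] || continue
    case "$c" in
        *.zfs) ;;                    # marker already there
        *) touch "${c}.zfs" ;;
    esac
done
for i in /var/lib/lxd/images/*; do
    [ -e "$i" ] || continue
    case "$i" in
        *.zfs|*.rootfs) ;;           # skip existing markers and split rootfs files
        *) touch "${i}.zfs" ;;
    esac
done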
Hi @stgraber,
Thanks for the commands matching my system.
Here is the result I got from the last command:
We already see that error:
DBUG[02-20|02:34:48] No existing storage pools detected.
Here is what I get when I try to start it again:
Gah, crap, that image isn't referenced in LXD and should have been removed from disk instead...
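For reference, a quick way to compare what the database references against what sits on disk; a sketch, assuming the usual lxd.db "images" table with a fingerprint column:
# Sketch: fingerprints known to the database vs. fingerprints present on disk.
sqlite3 /var/lib/lxd/lxd.db "SELECT fingerprint FROM images" | sort > /tmp/db-fingerprints
ls /var/lib/lxd/images | sed 's/\..*$//' | sort -u > /tmp/disk-fingerprints
diff /tmp/db-fingerprints /tmp/disk-fingerprints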
Ok, let's try this again:
systemctl stop lxd
touch /var/lib/lxd/containers/blockcdn-01.zfs
touch /var/lib/lxd/containers/consul-lan.zfs
touch /var/lib/lxd/containers/dhcp-lan.zfs
touch /var/lib/lxd/containers/haproxy01-lan.zfs
touch /var/lib/lxd/containers/juju-16cbed-0.zfs
touch /var/lib/lxd/containers/juju-c0b770-17.zfs
touch /var/lib/lxd/containers/juju-c0b770-18.zfs
touch /var/lib/lxd/containers/juju-c0b770-19.zfs
touch /var/lib/lxd/containers/juju-c0b770-20.zfs
touch /var/lib/lxd/containers/odl-lan.zfs
touch /var/lib/lxd/images/0d0524134a82d26f0586f8e36b7538a302f928543cebee8982533ee4f79e304b.zfs
rm -f /var/lib/lxd/images/13124999f6342f47c157a0c4cc0d961ed63a7be56b270edbb5082f305069992d*
touch /var/lib/lxd/images/1e66f9fa622c0cf03ac395d6fddddcdb0a1ea08bd8753900b0f2a18acf42f2b3.zfs
touch /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
rm -f /var/lib/lxd/images/534391c2476a8b610b252592d1f975495fdb83ef24f842b633c3df88f6716606*
rm -f /var/lib/lxd/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681*
touch /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
cp /var/lib/lxd/lxd.db.bak /var/lib/lxd/lxd.db
lxd --debug --group lxd
still the same error :/
error: Failed to set new ZFS mountpoint: cannot open 'lxd-zpool/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681': dataset does not exist
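For reference, the pool side can be checked directly to see which image datasets actually exist; a sketch using the pool name from this thread:
# List the image datasets present on the pool.
zfs list -r -t filesystem lxd-zpool/images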
I can see it in the DB; maybe I can just remove it?
Ok, so I tried to remove the references to that image from the image tables in the DB:
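A hypothetical sketch only of what that kind of manual cleanup looks like (and, as the next comments show, it was not the right fix here); it assumes the standard lxd.db image tables (images, images_aliases, images_properties):
# Hypothetical sketch: remove every DB row referring to the missing image.
FP=9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681
sqlite3 /var/lib/lxd/lxd.db "DELETE FROM images_aliases WHERE image_id IN (SELECT id FROM images WHERE fingerprint='$FP')"
sqlite3 /var/lib/lxd/lxd.db "DELETE FROM images_properties WHERE image_id IN (SELECT id FROM images WHERE fingerprint='$FP')"
sqlite3 /var/lib/lxd/lxd.db "DELETE FROM images WHERE fingerprint='$FP'"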
It started. Maybe you want to try something else?
Ok.
What do you mean with "It started"? LXD started properly this time? If so, then just running "systemctl start lxd" should get the daemon back online and working.
Oops, it started correctly, but that was not a good idea, because the containers linked to that image can't start.
service lxd status
● lxd.service - LXD - main daemon
   Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
   Active: active (running) since lun. 2017-02-20 03:06:02 CET; 49s ago
     Docs: man:lxd(1)
  Process: 5780 ExecStartPost=/usr/bin/lxd waitready --timeout=600 (code=exited, status=0/SUCCESS)
  Process: 5767 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
 Main PID: 5779 (lxd)
    Tasks: 22
   Memory: 17.1M
      CPU: 498ms
   CGroup: /system.slice/lxd.service
           ├─5109 /usr/lib/x86_64-linux-gnu/lxc/lxc-monitord /var/lib/lxd/containers 6
           ├─5111 [lxc monitor] /var/lib/lxd/containers juju-16cbed-0
           ├─5211 [lxc monitor] /var/lib/lxd/containers juju-16cbed-0
           ├─5779 /usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log
           └─5865 dnsmasq --strict-order --bind-interfaces --pid-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --listen-address=10.250.179.1 --dhcp-no-override --dhcp-authoritative --dh
févr. 20 03:06:02 lxd-virt-01a dnsmasq[5865]: Lecture de /etc/resolv.conf
févr. 20 03:06:02 lxd-virt-01a dnsmasq[5865]: utilise les adresses locales seulement pour domaine lxd
févr. 20 03:06:02 lxd-virt-01a dnsmasq[5865]: utilise le serveur de nom 10.101.90.82#53
févr. 20 03:06:02 lxd-virt-01a dnsmasq[5865]: lecture /etc/hosts - 5 adresses
févr. 20 03:06:02 lxd-virt-01a dnsmasq-dhcp[5865]: Lecture de /var/lib/lxd/networks/lxdbr0/dnsmasq.hosts
févr. 20 03:06:02 lxd-virt-01a dnsmasq[5865]: lecture /etc/hosts - 5 adresses
févr. 20 03:06:02 lxd-virt-01a dnsmasq-dhcp[5865]: Lecture de /var/lib/lxd/networks/lxdbr0/dnsmasq.hosts
févr. 20 03:06:02 lxd-virt-01a systemd[1]: Started LXD - main daemon.
févr. 20 03:06:22 lxd-virt-01a lxd[5779]: lvl=warn msg="Unable to refresh cache, using stale entry" server=https://cloud-images.ubuntu.com/releases/ t=2017-02-20T03:06:22+0100
févr. 20 03:06:42 lxd-virt-01a lxd[5779]: lvl=warn msg="Unable to refresh cache, using stale entry" server=https://cloud-images.ubuntu.com/releases t=2017-02-20T03:06:42+0100
But:
lxc start dhcp-lan
error: No storage pool specified.
What surprised me is that I never deleted any images from the file system itself.
Oh, I understand why now: I migrated /var/lib/lxd from btrfs to ZFS a while ago. The missing image is here:
Ok, so LXD appears to be running and is just a bit confused because of missing disk entries in the profiles.
Can you post the output of "lxc list"?
lxc list
+----------------+---------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| blockcdn-01 | ERROR | | | PERSISTENT | |
+----------------+---------+------+------+------------+-----------+
| consul-lan | ERROR | | | PERSISTENT | |
+----------------+---------+------+------+------------+-----------+
| dhcp-lan | ERROR | | | PERSISTENT | |
+----------------+---------+------+------+------------+-----------+
| haproxy01-lan | ERROR | | | PERSISTENT | |
+----------------+---------+------+------+------------+-----------+
| juju-16cbed-0 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-17 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-18 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-19 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-20 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| odl-lan | ERROR | | | PERSISTENT | |
+----------------+---------+------+------+------------+-----------+
Can you post "lxc profile list"?
lxc profile list
+-------------------+---------+
| NAME | USED BY |
+-------------------+---------+
| default | 5 |
+-------------------+---------+
| docker | 0 |
+-------------------+---------+
| juju-controller | 1 |
+-------------------+---------+
| juju-default | 0 |
+-------------------+---------+
| juju-web4allcloud | 4 |
+-------------------+---------+
| lan-only | 3 |
+-------------------+---------+
| lan-wan | 2 |
+-------------------+---------+
| migratable | 0 |
+-------------------+---------+
Ok, so I take it the broken containers are using lan-only or lan-wan but not the default profile.
This may fix things:
lxc profile device add lan-only root disk path=/ pool=default
lxc profile device add lan-wan root disk path=/ pool=default
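To double-check that the root disk device actually landed in each profile, lxc profile show can be used, e.g.:
# Verify the new root disk entry appears under "devices:".
lxc profile show lan-only
lxc profile show lan-wan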
lxc list
+----------------+---------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| blockcdn-01 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| consul-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| dhcp-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| haproxy01-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-16cbed-0 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-17 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-18 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-19 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-20 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| odl-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
Looks much better :) So here is what I did:
systemctl stop lxd
touch /var/lib/lxd/containers/blockcdn-01.zfs
touch /var/lib/lxd/containers/consul-lan.zfs
touch /var/lib/lxd/containers/dhcp-lan.zfs
touch /var/lib/lxd/containers/haproxy01-lan.zfs
touch /var/lib/lxd/containers/juju-16cbed-0.zfs
touch /var/lib/lxd/containers/juju-c0b770-17.zfs
touch /var/lib/lxd/containers/juju-c0b770-18.zfs
touch /var/lib/lxd/containers/juju-c0b770-19.zfs
touch /var/lib/lxd/containers/juju-c0b770-20.zfs
touch /var/lib/lxd/containers/odl-lan.zfs
touch /var/lib/lxd/images/0d0524134a82d26f0586f8e36b7538a302f928543cebee8982533ee4f79e304b.zfs
rm -f /var/lib/lxd/images/13124999f6342f47c157a0c4cc0d961ed63a7be56b270edbb5082f305069992d*
touch /var/lib/lxd/images/1e66f9fa622c0cf03ac395d6fddddcdb0a1ea08bd8753900b0f2a18acf42f2b3.zfs
touch /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
rm -f /var/lib/lxd/images/534391c2476a8b610b252592d1f975495fdb83ef24f842b633c3df88f6716606*
rm -f /var/lib/lxd/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681*
touch /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
cp /var/lib/lxd/lxd.db.bak /var/lib/lxd/lxd.db
Then I created the missing ZFS dataset manually:
zfs create lxd-zpool/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681
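As an optional sanity check, the new dataset and its future mountpoint can be verified with:
# Confirm the dataset now exists and where it will be mounted.
zfs get mountpoint lxd-zpool/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681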
Then I started LXD in debug mode:
lxd --debug --group lxd
That way I did not touch the DB.
And finally, I had to replace "default" with the name of my zpool:
lxc start dhcp-lan
error: Container is supposed to exist on storage pool "default", but it actually exists on "lxd-zpool".
Try `lxc info --show-log dhcp-lan` for more info
So I did
lxc profile device add lan-only root disk path=/ pool=lxd-zpool
lxc profile device add lan-wan root disk path=/ pool=lxd-zpool
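For reference, the pool name to use in those commands can be confirmed beforehand; a sketch assuming the LXD 2.9 storage commands, plus the database query used earlier in this thread:
# Ask LXD which storage pools it knows about...
lxc storage list
# ...or query the database directly, as earlier in this thread.
sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"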
And now, everything just works fine!
lxc list
+----------------+---------+------+------+------------+-----------+
| NAME | STATE | IPV4 | IPV6 | TYPE | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| blockcdn-01 | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| consul-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| dhcp-lan | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| haproxy01-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-16cbed-0 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-17 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-18 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-19 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-20 | RUNNING | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
| odl-lan | STOPPED | | | PERSISTENT | 0 |
+----------------+---------+------+------+------------+-----------+
Thanks a lot for your support!
Cool, glad to hear you're back online.
I'm going to close this issue as we have duplicates of it for the problems reported here.
Hi,
I just did the upgrade to LXD 2.9.1 on one of my systems, and LXD does not start anymore:
error: UNIQUE constraint failed: storage_pools.name
From the service status:
Static hostname: lxd-virt-01a
Operating System: Ubuntu 16.04.2 LTS
Kernel: Linux 4.4.0-62-generic
Architecture: x86-64
Any ideas about this error just after the upgrade/reboot ?