canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

error: UNIQUE constraint failed: storage_pools.name #2906

Closed bengivre closed 7 years ago

bengivre commented 7 years ago

Hi,

I just did the upgrade to LXD 2.9.1 on one of my systems and LXD does not start anymore:

error: UNIQUE constraint failed: storage_pools.name

From the service status:

lxd.service - LXD - main daemon
   Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
   Active: activating (start-post) (Result: exit-code) since Sun 2017-02-19 01:50:50 CET; 23s ago
     Docs: man:lxd(1)
  Process: 3196 ExecStart=/usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log (code=exited, status=1/FAILURE)
  Process: 3185 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
 Main PID: 3196 (code=exited, status=1/FAILURE); Control PID: 3197 (lxd)
    Tasks: 7
   Memory: 8.0M
      CPU: 44ms
   CGroup: /system.slice/lxd.service
           └─control
             └─3197 /usr/bin/lxd waitready --timeout=600

Feb 19 01:50:50 lxd-virt-01a systemd[1]: Starting LXD - main daemon...
Feb 19 01:50:50 lxd-virt-01a lxd[3196]: error: UNIQUE constraint failed: storage_pools.name
Feb 19 01:50:50 lxd-virt-01a systemd[1]: lxd.service: Main process exited, code=exited, status=1/FAILURE

Any ideas about this error just after the upgrade/reboot?

brauner commented 7 years ago

Hi, can you please show the output of:

sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"

Did you use a custom storage pool or did you use /var/lib/lxd/zfs.img?

bvassie commented 7 years ago

I am having the exact same issue

sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"
zpool

I am using a custom pool

bengivre commented 7 years ago

I did check the DB and it was looking OK; here is the result:

sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM storage_pools"
lxd-zpool

Using a custom ZFS zpool, not an img.

brauner commented 7 years ago

Hm, I just performed an upgrade from a custom storage pool without problems. Can you please show me the contents of your zpool:

 zfs list -r <pool_name> -t all

and the layout of /var/lib/lxd, specifically:
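
For example (hypothetical command; any listing of the containers and images directories would do):

ls -la /var/lib/lxd/containers /var/lib/lxd/images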

I want to see whether the upgrade was actually performed or whether it already stopped at the database entry creation stage.

bengivre commented 7 years ago

Here are the details:

For better readability I put them here: http://pastebin.com/3TFTPagE

bengivre commented 7 years ago

Something that might be useful to mention: it was originally a pure LXC server, converted to an LXD server a long time ago, and LXD was working well. Thanks for your help.

brauner commented 7 years ago

So what do you see when you start the daemon in the foreground:

/usr/bin/lxd --debug --group lxd

This looks like the upgrade worked but the API patch is not marked as applied.
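
If you want to check which one-off upgrade patches LXD believes it has applied, a hedged option is to query the local database directly, assuming the 2.9-era schema tracks them in a patches table:

sqlite3 /var/lib/lxd/lxd.db "SELECT name FROM patches"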

stgraber commented 7 years ago

@brauner /var/lib/lxd/images doesn't look fully migrated to me. There's a leftover .zfs in there.

@givre I don't think it's really an issue, but you can get rid of a couple of remnants from a failed image import with: "rm -Rf /var/lib/lxd/images/lxdbuild*"
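
To preview what that wildcard matches before deleting anything, a plain listing works:

ls -ld /var/lib/lxd/images/lxdbuild*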

bengivre commented 7 years ago

/usr/bin/lxd --debug --group lxd

I did "rm -Rf /var/lib/lxd/images/lxdbuild*" but LXD still does not start

bengivre commented 7 years ago

Maybe I can back up lxd.db, remove lxd/lxc and install them again? Or do you want to try to find out what happened here?

stgraber commented 7 years ago

No, we should be able to fix this for you. I think it matches what I ran into here last night.

stgraber commented 7 years ago

https://github.com/lxc/lxd/issues/2907 is the report I filed for what happened to one of my machines.

In your case, it looks like this is the problem:

lxd-zpool/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb            576M   274G   576M  /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs

To fix things for you, I think you should run:

systemctl stop lxd
umount -l /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
touch /var/lib/lxd/containers/blockcdn-01.zfs
touch /var/lib/lxd/containers/consul-lan.zfs
touch /var/lib/lxd/containers/dhcp-lan.zfs
touch /var/lib/lxd/containers/haproxy01-lan.zfs
touch /var/lib/lxd/containers/juju-16cbed-0.zfs
touch /var/lib/lxd/containers/juju-c0b770-17.zfs
touch /var/lib/lxd/containers/juju-c0b770-18.zfs
touch /var/lib/lxd/containers/juju-c0b770-19.zfs
touch /var/lib/lxd/containers/juju-c0b770-20.zfs
touch /var/lib/lxd/containers/odl-lan.zfs
touch /var/lib/lxd/images/0d0524134a82d26f0586f8e36b7538a302f928543cebee8982533ee4f79e304b.zfs
touch /var/lib/lxd/images/13124999f6342f47c157a0c4cc0d961ed63a7be56b270edbb5082f305069992d.zfs
touch /var/lib/lxd/images/1e66f9fa622c0cf03ac395d6fddddcdb0a1ea08bd8753900b0f2a18acf42f2b3.zfs
touch /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
touch /var/lib/lxd/images/534391c2476a8b610b252592d1f975495fdb83ef24f842b633c3df88f6716606.zfs
touch /var/lib/lxd/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681.zfs
touch /var/lib/lxd/images/ca9a673ed503cff575af61e0f1f8c0ffa127539c01e454b01ac64ef1e8a6ed7f.zfs
cp /var/lib/lxd/lxd.db.bak /var/lib/lxd/lxd.db
lxd --debug --group lxd

The goal of those commands is to re-create all the files that the migration tool expects: that's two files for each container under /var/lib/lxd/containers and up to three files per image in /var/lib/lxd/images/. We then restore the old database and re-run LXD, this time with debug logging, which should run the migration again and hopefully not fail this time.
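
For anyone with many containers or images, the per-name touch commands above could presumably be generated from the backed-up database instead of typed by hand; a rough sketch, assuming the pre-2.9 lxd.db.bak still carries its containers and images tables:

# Hypothetical helper: recreate the placeholder files the migration expects,
# driven by the backed-up database rather than a hand-written list.
sqlite3 /var/lib/lxd/lxd.db.bak "SELECT name FROM containers WHERE name NOT LIKE '%/%'" | while read -r c; do
    touch "/var/lib/lxd/containers/${c}.zfs"
done
sqlite3 /var/lib/lxd/lxd.db.bak "SELECT fingerprint FROM images" | while read -r f; do
    touch "/var/lib/lxd/images/${f}.zfs"
done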

bengivre commented 7 years ago

Hi @stgraber,
Thanks for the commands matching my system. Here is the result I got from the last command; we already see that error:

DBUG[02-20|02:34:48] No existing storage pools detected.

Here is what I get if I try to start it again:

stgraber commented 7 years ago

Gah, crap, that image isn't referenced in LXD and should have been removed from disk instead...

Ok, let's try this again:

systemctl stop lxd
touch /var/lib/lxd/containers/blockcdn-01.zfs
touch /var/lib/lxd/containers/consul-lan.zfs
touch /var/lib/lxd/containers/dhcp-lan.zfs
touch /var/lib/lxd/containers/haproxy01-lan.zfs
touch /var/lib/lxd/containers/juju-16cbed-0.zfs
touch /var/lib/lxd/containers/juju-c0b770-17.zfs
touch /var/lib/lxd/containers/juju-c0b770-18.zfs
touch /var/lib/lxd/containers/juju-c0b770-19.zfs
touch /var/lib/lxd/containers/juju-c0b770-20.zfs
touch /var/lib/lxd/containers/odl-lan.zfs
touch /var/lib/lxd/images/0d0524134a82d26f0586f8e36b7538a302f928543cebee8982533ee4f79e304b.zfs
rm -f /var/lib/lxd/images/13124999f6342f47c157a0c4cc0d961ed63a7be56b270edbb5082f305069992d*
touch /var/lib/lxd/images/1e66f9fa622c0cf03ac395d6fddddcdb0a1ea08bd8753900b0f2a18acf42f2b3.zfs
touch /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
rm -f /var/lib/lxd/images/534391c2476a8b610b252592d1f975495fdb83ef24f842b633c3df88f6716606*
rm -f /var/lib/lxd/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681*
touch /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
cp /var/lib/lxd/lxd.db.bak /var/lib/lxd/lxd.db
lxd --debug --group lxd

bengivre commented 7 years ago

still the same error :/

error: Failed to set new ZFS mountpoint: cannot open 'lxd-zpool/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681': dataset does not exist
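
To see which image datasets actually exist on the pool before touching or removing their placeholder files, a recursive listing helps:

zfs list -r lxd-zpool/images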

bengivre commented 7 years ago

I can see it in the DB; maybe I can just remove it?
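
Locating the row first would confirm that; a hedged sketch, assuming images are tracked by fingerprint in the images table:

sqlite3 /var/lib/lxd/lxd.db "SELECT id, fingerprint FROM images WHERE fingerprint LIKE '9f5483%'"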

bengivre commented 7 years ago

Ok, so I tried to remove the references to that image in the DB tables.

It started. Maybe you want to try something else?
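
The exact tables aren't named above; on the 2.9-era schema the deletion would presumably touch images and its satellite tables, along these lines (hypothetical, and only after backing up lxd.db):

# Hypothetical cleanup; table and column names assume the 2.9-era lxd.db schema.
sqlite3 /var/lib/lxd/lxd.db "DELETE FROM images_aliases WHERE image_id IN (SELECT id FROM images WHERE fingerprint LIKE '9f5483%')"
sqlite3 /var/lib/lxd/lxd.db "DELETE FROM images_properties WHERE image_id IN (SELECT id FROM images WHERE fingerprint LIKE '9f5483%')"
sqlite3 /var/lib/lxd/lxd.db "DELETE FROM images WHERE fingerprint LIKE '9f5483%'"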

stgraber commented 7 years ago

Ok.

What do you mean with "It started"? LXD started properly this time? If so, then just running "systemctl start lxd" should get the daemon back online and working.

bengivre commented 7 years ago

Oops, it started correctly, but that was not a good idea, because the containers linked to that image can't start.

service lxd status
● lxd.service - LXD - main daemon
   Loaded: loaded (/lib/systemd/system/lxd.service; indirect; vendor preset: enabled)
   Active: active (running) since Mon 2017-02-20 03:06:02 CET; 49s ago
     Docs: man:lxd(1)
  Process: 5780 ExecStartPost=/usr/bin/lxd waitready --timeout=600 (code=exited, status=0/SUCCESS)
  Process: 5767 ExecStartPre=/usr/lib/x86_64-linux-gnu/lxc/lxc-apparmor-load (code=exited, status=0/SUCCESS)
 Main PID: 5779 (lxd)
    Tasks: 22
   Memory: 17.1M
      CPU: 498ms
   CGroup: /system.slice/lxd.service
           ├─5109 /usr/lib/x86_64-linux-gnu/lxc/lxc-monitord /var/lib/lxd/containers 6
           ├─5111 [lxc monitor] /var/lib/lxd/containers juju-16cbed-0
           ├─5211 [lxc monitor] /var/lib/lxd/containers juju-16cbed-0
           ├─5779 /usr/bin/lxd --group lxd --logfile=/var/log/lxd/lxd.log
           └─5865 dnsmasq --strict-order --bind-interfaces --pid-file=/var/lib/lxd/networks/lxdbr0/dnsmasq.pid --except-interface=lo --interface=lxdbr0 --listen-address=10.250.179.1 --dhcp-no-override --dhcp-authoritative --dh

Feb 20 03:06:02 lxd-virt-01a dnsmasq[5865]: reading /etc/resolv.conf
Feb 20 03:06:02 lxd-virt-01a dnsmasq[5865]: using local addresses only for domain lxd
Feb 20 03:06:02 lxd-virt-01a dnsmasq[5865]: using nameserver 10.101.90.82#53
Feb 20 03:06:02 lxd-virt-01a dnsmasq[5865]: read /etc/hosts - 5 addresses
Feb 20 03:06:02 lxd-virt-01a dnsmasq-dhcp[5865]: read /var/lib/lxd/networks/lxdbr0/dnsmasq.hosts
Feb 20 03:06:02 lxd-virt-01a dnsmasq[5865]: read /etc/hosts - 5 addresses
Feb 20 03:06:02 lxd-virt-01a dnsmasq-dhcp[5865]: read /var/lib/lxd/networks/lxdbr0/dnsmasq.hosts
Feb 20 03:06:02 lxd-virt-01a systemd[1]: Started LXD - main daemon.
Feb 20 03:06:22 lxd-virt-01a lxd[5779]: lvl=warn msg="Unable to refresh cache, using stale entry" server=https://cloud-images.ubuntu.com/releases/ t=2017-02-20T03:06:22+0100
Feb 20 03:06:42 lxd-virt-01a lxd[5779]: lvl=warn msg="Unable to refresh cache, using stale entry" server=https://cloud-images.ubuntu.com/releases t=2017-02-20T03:06:42+0100

But:

lxc start dhcp-lan
error: No storage pool specified.

What surprised me is that I never deleted any images from the filesystem itself.
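
One way to check whether a profile carries a root disk device (and therefore a storage pool) is to inspect it with the standard client:

lxc profile show lan-only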

bengivre commented 7 years ago

Ah, I understand why: I migrated /var/lib/lxd a while ago from BTRFS to ZFS. The missing image is here:

stgraber commented 7 years ago

Ok, so LXD appears to be running and is just a bit confused because of missing disk entries in the profiles.

Can you post the output of "lxc list"?

bengivre commented 7 years ago

lxc list
+----------------+---------+------+------+------------+-----------+
|      NAME      |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| blockcdn-01    | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+
| consul-lan     | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+
| dhcp-lan       | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+
| haproxy01-lan  | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+
| juju-16cbed-0  | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-17 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-18 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-19 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-20 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| odl-lan        | ERROR   |      |      | PERSISTENT |           |
+----------------+---------+------+------+------------+-----------+

stgraber commented 7 years ago

Can you post "lxc profile list"?

bengivre commented 7 years ago

lxc profile list
+-------------------+---------+
|       NAME        | USED BY |
+-------------------+---------+
| default           | 5       |
+-------------------+---------+
| docker            | 0       |
+-------------------+---------+
| juju-controller   | 1       |
+-------------------+---------+
| juju-default      | 0       |
+-------------------+---------+
| juju-web4allcloud | 4       |
+-------------------+---------+
| lan-only          | 3       |
+-------------------+---------+
| lan-wan           | 2       |
+-------------------+---------+
| migratable        | 0       |
+-------------------+---------+

stgraber commented 7 years ago

Ok, so I take it the broken containers are using lan-only or lan-wan but not the default profile.

This may fix things:

lxc profile device add lan-only root disk path=/ pool=default
lxc profile device add lan-wan root disk path=/ pool=default
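
Afterwards the new root device should show up on each profile; a quick check (standard client command, profile names as above):

lxc profile device show lan-only
lxc profile device show lan-wan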

bengivre commented 7 years ago
lxc list
+----------------+---------+------+------+------------+-----------+
|      NAME      |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| blockcdn-01    | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| consul-lan     | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| dhcp-lan       | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| haproxy01-lan  | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-16cbed-0  | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-17 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-18 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-19 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-20 | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| odl-lan        | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+

Looks much better :) So here is what I did:

systemctl stop lxd
touch /var/lib/lxd/containers/blockcdn-01.zfs
touch /var/lib/lxd/containers/consul-lan.zfs
touch /var/lib/lxd/containers/dhcp-lan.zfs
touch /var/lib/lxd/containers/haproxy01-lan.zfs
touch /var/lib/lxd/containers/juju-16cbed-0.zfs
touch /var/lib/lxd/containers/juju-c0b770-17.zfs
touch /var/lib/lxd/containers/juju-c0b770-18.zfs
touch /var/lib/lxd/containers/juju-c0b770-19.zfs
touch /var/lib/lxd/containers/juju-c0b770-20.zfs
touch /var/lib/lxd/containers/odl-lan.zfs
touch /var/lib/lxd/images/0d0524134a82d26f0586f8e36b7538a302f928543cebee8982533ee4f79e304b.zfs
rm -f /var/lib/lxd/images/13124999f6342f47c157a0c4cc0d961ed63a7be56b270edbb5082f305069992d*
touch /var/lib/lxd/images/1e66f9fa622c0cf03ac395d6fddddcdb0a1ea08bd8753900b0f2a18acf42f2b3.zfs
touch /var/lib/lxd/images/315bedd32580c3fb79fd2003746245b9fe6a8863fc9dd990c3a2dc90f4930039.zfs
rm -f /var/lib/lxd/images/534391c2476a8b610b252592d1f975495fdb83ef24f842b633c3df88f6716606*
rm -f /var/lib/lxd/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681*
touch /var/lib/lxd/images/b5b03165de7c450f5f9793c8b2eb4a364fbd81124a01511f854dd379eef52abb.zfs
cp /var/lib/lxd/lxd.db.bak /var/lib/lxd/lxd.db

Then I created the missing ZFS dataset manually:

zfs create lxd-zpool/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681
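
Worth noting: zfs create makes an empty dataset, so this satisfies the migration's existence check rather than restoring the image contents; the result can be verified with:

zfs list lxd-zpool/images/9f5483d8c23933d0f731060b2f163586f053babfcbbdef3737157af4a6dc0681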

Then I started LXD in debug mode:

lxd --debug --group lxd

That way I did not touch the DB.

And finally, I had to replace "default" with the name of my zpool:

lxc start dhcp-lan
error: Container is supposed to exist on storage pool "default", but it actually exists on "lxd-zpool".
Try `lxc info --show-log dhcp-lan` for more info

So I did:

lxc profile device add lan-only root disk path=/ pool=lxd-zpool
lxc profile device add lan-wan root disk path=/ pool=lxd-zpool

And now everything just works fine!

lxc list
+----------------+---------+------+------+------------+-----------+
|      NAME      |  STATE  | IPV4 | IPV6 |    TYPE    | SNAPSHOTS |
+----------------+---------+------+------+------------+-----------+
| blockcdn-01    | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| consul-lan     | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| dhcp-lan       | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| haproxy01-lan  | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-16cbed-0  | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-17 | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-18 | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-19 | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| juju-c0b770-20 | RUNNING |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+
| odl-lan        | STOPPED |      |      | PERSISTENT | 0         |
+----------------+---------+------+------+------------+-----------+

Thanks a lot for your support!

stgraber commented 7 years ago

Cool, glad to hear you're back online.

stgraber commented 7 years ago

I'm going to close this issue, as we have duplicate reports filed for the problems described here.