canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Modify storage pools: one pool missing but empty, can't start the LXD daemon because the empty pool is gone but still required by config #6009

Closed by jwuethrich 5 years ago

jwuethrich commented 5 years ago

I'm on CentOS 7 with snap, and LXD installed through snap, which is part of why I don't know my head from my tail here.

I'm looking for what I hope is a config file I can delete a line from. Snap auto-refreshed, and all of a sudden I'm looking at my container host rather than the console I was in.

LXD won't start because a storage pool is missing. This instance of LXD hosts 3 machines, and none of them are on that pool. The pool in question was empty; I created it almost a year ago when I was a complete LXD/LXC and container noob. I was also dealing with the fact that my parents let their house get so moldy it took out my apartment when I visited to say goodbye to my feline friend for the last time. I'm now typing this from a different laptop in an apartment with 3 folding tables and a desk chair. I digress.

t=2019-07-24T11:08:57-0700 lvl=info msg="LXD 3.15 is starting in normal mode" path=/var/snap/lxd/common/lxd

t=2019-07-24T11:08:57-0700 lvl=eror msg="Error initializing storage pool \"gripPuddle\": the requested volume group \"gripPuddle\" does not exist, correct functionality of the storage pool cannot be guaranteed"

t=2019-07-24T11:08:57-0700 lvl=info msg="Applying patch: storage_api_rename_container_snapshots_dir_again"

t=2019-07-24T11:08:57-0700 lvl=eror msg="Failed to start the daemon: the requested volume group \"gripPuddle\" does not exist"

gripPuddle still shows up in /dev/disk/by-label.

Three containers are all this host runs, and 2 of the 3 are still running. With a little research before yelling for help... I can tell you /var/snap/lxd/common/lxd/containers has them listed.
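For the record, roughly what I checked (paths as above):

```sh
# The containers LXD manages still have their directories under the snap data path.
ls -la /var/snap/lxd/common/lxd/containers
# The label for the supposedly missing pool is still visible to the system here.
ls -l /dev/disk/by-label
```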

So this should be as simple as either fixing the pool reference if it got swapped, or removing the line that asks for the extra pool from a YAML file somewhere, no?

What's really annoying here is that if LXD would start, I'd run lxc config ... or lxc storage, but instead I get: Error: Get http://unix.socket/1.0: dial unix /var/snap/lxd/common/lxd/unix.socket: connect: no such file or directory

I like the product, and I have seen enough responses by Stéphane to other people's issues to know you (and/or the team) are passionate about helping people and about your product.

That said (and forgive me if I am misreading this situation):

this seems like a bit of an architectural/ideological flaw: a daemon error makes lxc [anything config related] fail.

lxd = an interface to manage and extend the functionality of lxc

But if that manager bails/exits/won't start on (in this case) a totally trivial error, triggered by an auto-update from snap, and the user is left with a system that obscures where the file with the error actually lives... you see where I'm going here?

I get that there are reasons (integrity/corruption, concurrency, known/defined states, etc.), but have you considered starting anyway and limiting what a user can do until the error is both SHOUTED at them and resolved? Presently, to adapt the "give a man a fish or teach a man to fish" metaphor: I'm hungry, I thought I knew how to cast a line, but the lake just disappeared before my eyes and the pole is claiming a unix socket issue :D

thanks for any and all help!

stgraber commented 5 years ago

Normally, we would start with degraded features and that may in fact be why you never saw this issue before. The problem here is that LXD 3.15 as part of its upgrade process requires access to ALL storage pools so that some data can be shuffled around.

It's not something we can put off as the code in LXD 3.15 expects that data to have been moved.

The error you're getting is because of an LVM-based pool called gripPuddle which apparently cannot be found.

I'd recommend you first run pvscan and check whether the gripPuddle volume group still shows up.

If you see it listed there, then try restarting LXD with systemctl reload snap.lxd.daemon.

If the pool can't be found and isn't actually used for your containers (lvs may help to check that), then you have two options to get rid of it:

1) Write a DB patch at /var/snap/lxd/common/lxd/database/patch.global.sql containing something like:

DELETE FROM storage_pools WHERE name='gripPuddle';

2) Temporarily create an empty LVM VG called gripPuddle. This should allow LXD to start and complete its data migration correctly, at which point you can use the normal lxc storage list and lxc storage delete tools to get rid of it.
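Concretely, a rough sketch of those two options on a snap install (the device in option 2 is a placeholder; adjust paths and the restart command to your setup):

```sh
# Option 1: drop the stale pool record with a one-off DB patch, then restart LXD.
echo "DELETE FROM storage_pools WHERE name='gripPuddle';" | \
  sudo tee /var/snap/lxd/common/lxd/database/patch.global.sql
sudo systemctl reload snap.lxd.daemon

# Option 2: recreate an empty VG with the expected name so the upgrade can finish,
# then remove the pool through the normal tooling. /dev/sdX is a placeholder device.
sudo vgcreate gripPuddle /dev/sdX
sudo systemctl reload snap.lxd.daemon
lxc storage list
lxc storage delete gripPuddle
```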

jwuethrich commented 5 years ago

Awesome. No go on pvscan / the first part.

ls -la on the containers dir, crossed fingers -> echoed the DB patch into the path you provided -> snap restart lxd

Looks like I'm good! Many thanks!

Off topic, and I can dig on my own more or start a new thread (if desired),

but when I started this (at least a year ago) I wanted 3 containers with 3 external IPs in total:

- "word depress" (the client's current and very non-compliant email order form)
- a cart container that may take over everything WP does later, plus talk to WorldPay (and introduce the owner to the concept of tokens, lol)
- a VPN/NAT container for all updates and for dishing non-handled pages back to requester traffic; the underlying VPS itself ends up with no external IP

All unprivileged; the VPS host has a one-MAC limit.

I ended up with:

- macvlan passthrough to the VPN container
- a bash script on the host that loops and sets up iproute2 access to the namespaces
- still on the host, "ip netns exec vpncon ..." to create ipvlan L2 devices and then move them into the respective web server containers (rough sketch below)
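Roughly what that script boils down to, as a sketch (container and interface names here are made up; "web0" stands in for one of the web servers):

```sh
# Make the containers' network namespaces addressable by name for iproute2.
mkdir -p /var/run/netns
for c in vpncon web0; do
    pid=$(lxc info "$c" | awk '/Pid:/ {print $2}')
    ln -sf "/proc/$pid/ns/net" "/var/run/netns/$c"
done

# Inside the VPN container's namespace, create an ipvlan L2 device on its uplink
# ("eth0" assumed), then hand it over to the web container's namespace.
ip netns exec vpncon ip link add link eth0 name ipvl-web0 type ipvlan mode l2
ip netns exec vpncon ip link set ipvl-web0 netns web0
```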

I'm not sure if I overcomplicated this from the get-go. I remember having a hell of a time getting the VPS to give up its IP (and have the show go on) without this setup.

I am by no means sure of what I'm talking about (I ran Citrix Xen personally in '08, moved a university lab (teaching environment) of desktops acting as servers (for MS Dynamics GP) to Hyper-V and 3 donated servers in '09, then was out of the virtualization and container loop for a while).

That said, I also came to the conclusion that this was a good setup for keeping netfilter with the respective containers and keeping the underlying host out of the loop unless you enter through the provided VPN (I'm just about to implement that with a WireGuard interface passed to that container).
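For the WireGuard part, something like this is what I have in mind (wg0 and the key path are placeholders; the interface is created on the host and then handed to the VPN container as a physical passthrough NIC, so it disappears from the host):

```sh
# Create the WireGuard interface on the host and configure its key (peers omitted).
ip link add wg0 type wireguard
wg set wg0 private-key /path/to/privkey
# Pass the whole device into the VPN container.
lxc config device add vpncon wg0 nic nictype=physical parent=wg0 name=wg0
```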

I read that, as of 3.13 (I think), ipvlan is now native.

That leaves me wondering: besides simplicity at boot, do I gain anything? Do I lose anything isolation-wise? Is this current setup nuts (security-wise) or pointless (as of that version, anyway) in some way I overlooked? I realize this would not scale well, but this site won't have that issue.

I also read your profile, hoping this might actually be a fun one for you :D

stgraber commented 5 years ago

So if I understand right and the VPS provider does provide 3 public IPs going to one MAC, then yeah, you could probably use the IPVLAN support directly and avoid macvlan.
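Something like this, roughly (web0, eth0 and the address are placeholders for your container, the host uplink and one of the public IPs; ipvlan NICs want their addresses set statically on the device):

```sh
# Replace the manually plumbed ipvlan device with an LXD-managed ipvlan NIC.
lxc config device add web0 eth0 nic nictype=ipvlan parent=eth0 ipv4.address=203.0.113.10
```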

For those kinds of architectural questions, especially those involving networking, https://discuss.linuxcontainers.org tends to be a better place, as there are many people over there who've run into a lot of these setups, and you may find some of the common tricks for common hosting providers too.

Closing as the issue appears to have been resolved with the DB query.