canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

Container creation with MAAS #6014

Closed laralar closed 5 years ago

laralar commented 5 years ago

Required information

Issue description

If the following device entry (with its maas.subnet.ipv4 line) is added to the default profile, the containers can't be updated in MAAS devices:

  eth0:
    maas.subnet.ipv4: AIBLSRV
    name: eth0
    nictype: bridged
    parent: br1
    type: nic

On the same note, after a successful lxd init, using a local bridge with the respective MAAS link to the network, an error is received after creating a container:

root@node13:~# lxc profile edit default
Config parsing error: The following containers failed to update (profile change still saved):
 - node13test3: device 0: device 2.0 schema check failed: pool: expected map, got nothing

Steps to reproduce

Previously, I used to add/delete that line in the profile to attach/detach all containers from MAAS.

root@node13:~# lxc launch local:b node13test4
Creating node13test4
Error: success
root@node13:~#
t=2019-07-26T11:40:56+0530 lvl=info msg="Creating container" ephemeral=false name=node13test4 project=default
t=2019-07-26T11:40:58+0530 lvl=info msg="Deleting container" created=2019-07-26T11:40:56+0530 ephemeral=false name=node13test4 project=default used=1970-01-01T05:30:00+0530
t=2019-07-26T11:40:58+0530 lvl=eror msg="Failed deleting container MAAS record" err="device 0: device 2.0 schema check failed: pool: expected map, got nothing" name=node13test4
t=2019-07-26T11:40:58+0530 lvl=eror msg="Failed creating container" ephemeral=false name=node13test4 project=default

The container is added to MAAS, in the Containers tab, but it is not shown in the lxc list output of created containers.

Information to attach

laralar commented 5 years ago

PS: If I delete the line maas.subnet.ipv4: AIBLSRV from the default profile, there are no issues: the container is created without any problem, although it is not registered in MAAS. If I later try to add the line to the default profile, it fails.

Moreover, if I override the device in the container itself, it does not understand the maas.subnet.ipv4 entry:

devices:
  eth0:
    name: eth0
    maas.subnet.ipv4: AIBLSRV
    nictype: bridged
    parent: br1
    type: nic

It seems that the entry was removed from the schema?

stgraber commented 5 years ago

Ok, sounds like those are MAAS errors rather than LXD's, so I wonder what may have changed on that side.

What MAAS version are you running?

laralar commented 5 years ago

Nothing has changed in the MAAS configuration. Originally I had MAAS 2.4.2, but I upgraded to the latest 2.6.0 and still have the same issue.

The only thing that has changed is the LXD snap version.

Is maas.subnet.ipv4: AIBLSRV still a valid argument in the device configuration?

  1. If I add that line to the profile and create a container, I get "Error: success".
  2. The record is added in MAAS, though, and shows up in the MAAS Containers tab with the respective MAC address.
  3. But the container is not created in LXD.

Somehow MAAS is returning "success"; could it be that LXD is interpreting that as an error?

Is there a way to install a previous version of snap LXD to see if there is a regression?

Thanks, Luis

PS: I have a newly installed host with LXD where this error is seen. I used to have a lxd-maas.yaml file for the lxd init command, but now I can't use it, since it gives me this error.

stgraber commented 5 years ago

Yes, that config key is correct. All the errors you've seen so far were directly reported by MAAS when LXD fed it the configuration. The LXD side of this logic hasn't changed since we first introduced it.

Failure to configure MAAS would cause the container creation to fail on LXD's side, so that explains the missing container. On container creation failure, LXD will attempt to delete the MAAS allocation but apparently that must have failed too.
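The sequence described above (creation fails because the MAAS step fails, then the cleanup of the MAAS record fails too) can be sketched roughly as follows. This is only an illustration under assumptions, not LXD's actual code: `FakeMAAS`, `MAASError`, and `create_container` are hypothetical names, and the fake client simply assumes that the record is stored server-side before the reply fails validation on the client.

```python
class MAASError(Exception):
    """Raised by the fake MAAS client when a reply cannot be validated."""


class FakeMAAS:
    """Hypothetical stand-in for a MAAS client whose replies fail validation."""

    def __init__(self):
        self.records = set()

    def create_record(self, name):
        # Assumption: the server stores the record, then parsing the reply fails.
        self.records.add(name)
        raise MAASError("schema check failed: pool: expected map, got nothing")

    def delete_record(self, name):
        # Looking the record up hits the same failure, so nothing is removed.
        raise MAASError("schema check failed: pool: expected map, got nothing")


def create_container(name, maas, containers, log):
    """Create a container and roll back if the MAAS registration fails."""
    containers.add(name)
    try:
        maas.create_record(name)
    except MAASError as err:
        containers.discard(name)  # undo the local creation
        try:
            maas.delete_record(name)  # best-effort cleanup on the MAAS side
        except MAASError as cleanup_err:
            log.append(f"Failed deleting container MAAS record: {cleanup_err}")
        raise RuntimeError(f"Failed container creation: {err}") from err


maas, containers, log = FakeMAAS(), set(), []
try:
    create_container("node13test4", maas, containers, log)
except RuntimeError as err:
    print(err)
print(sorted(containers), sorted(maas.records))
```

After the run, `containers` is empty while `maas.records` still holds the name, which matches the symptom reported here: an entry in the MAAS Containers tab and no container in LXD.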

stgraber commented 5 years ago

Not running into this issue with LXD 3.15 and MAAS 2.6.0 here:

root@edfu:~# lxc config set maas.api.url http://maas01.maas.mtl.stgraber.net:5248/MAAS
root@edfu:~# lxc config set maas.api.key "abc:def"
root@edfu:~# lxc profile create maas
root@edfu:~# lxc profile device add maas eth0 nic nictype=bridged parent=br0 name=eth0 maas.subnet.ipv4=MAAS-IPv4
Device eth0 added to maas
root@edfu:~# lxc launch ubuntu:18.04 c1 -p default -p maas
Creating c1
Starting c1
root@edfu:~# host c1.maas.mtl.stgraber.net
c1.maas.mtl.stgraber.net has address 172.17.16.144
root@edfu:~# lxc delete -f c1
root@edfu:~# host c1.maas.mtl.stgraber.net
Host c1.maas.mtl.stgraber.net not found: 3(NXDOMAIN)

stgraber commented 5 years ago

The error sounds like MAAS resource pools may be at play. Are you using those?

stgraber commented 5 years ago

root@edfu:~# lxc launch ubuntu:18.04 c1 -p default -p maas
Creating c1
Error: Failed container creation: success

So yeah, seems like a resource pool problem then.

stgraber commented 5 years ago

Hmm, no, it looks like that error was caused by LXD not having connected to MAAS yet. lxd.log is quite a bit more specific about this; I'm not sure why the returned error was wrong, though.

laralar commented 5 years ago

Yes, I am using resource pools. I'll try disabling them and report back.

laralar commented 5 years ago

OK, I deleted the resource pools, and I now end up in the same place as you.

root@node13:~# lxc launch ubuntu:18.04 c1 -p default -p maas
Creating c1
Error: Failed container creation: success
root@node13:~#
root@node13:~# lxc profile show maas
config: {}
description: ""
devices:
  eth0:
    maas.subnet.ipv4: AIBLSRV
    name: eth0
    nictype: bridged
    parent: br1286
    type: nic
name: maas
used_by: []

The container entry is created in MAAS, but the container itself is not created.

But now the message is different. Previously I only had

Error: success

stgraber commented 5 years ago

Marking as incomplete. All I managed to get here are some error handling issues which I'll send a branch for, but no actual issue on the LXD side that I could find.

We need your full /var/snap/lxd/common/lxd/logs/lxd.log to maybe make more sense of this, though it looks like it's MAAS that's returning a weird error in your case, so looking at the MAAS log may help too.
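Before attaching the full log, it can help to pull out just the error-level entries. The sketch below assumes the log uses the key=value layout seen in the excerpt earlier in this thread (including the `lvl=eror` spelling); `parse_log_line` and `error_entries` are hypothetical helper names, and lines with unbalanced quotes are not handled.

```python
import shlex


def parse_log_line(line):
    """Split one key=value log line into a dict; quoted values keep their spaces."""
    fields = {}
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if sep:
            fields[key] = value
    return fields


def error_entries(lines):
    """Return the parsed entries whose level is 'eror' (the spelling used in lxd.log)."""
    entries = (parse_log_line(line) for line in lines if line.strip())
    return [entry for entry in entries if entry.get("lvl") == "eror"]


# Two lines copied from the log excerpt earlier in this thread.
sample = [
    't=2019-07-26T11:40:56+0530 lvl=info msg="Creating container" '
    'ephemeral=false name=node13test4 project=default',
    't=2019-07-26T11:40:58+0530 lvl=eror msg="Failed deleting container MAAS record" '
    'err="device 0: device 2.0 schema check failed: pool: expected map, got nothing" '
    'name=node13test4',
]
for entry in error_entries(sample):
    print(entry["msg"], "|", entry.get("err", ""))
```

To run it against the real file, pass an open handle on /var/snap/lxd/common/lxd/logs/lxd.log to `error_entries` instead of `sample`.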

laralar commented 5 years ago

Did you manage to create a container again?

I mean, what did you do in MAAS to get the error? Did you create a resource pool and add the machine to it?

Because now I don't have any resource pools, and I am getting the error that you reported with "seems like a resource pool problem".

I'll try to look into the MAAS logs. As I remember, I already did and there was no specific information there, but I'll have another look. Thanks

stgraber commented 5 years ago

Ah, my local copy of the gomaasapi Go package from the Juju folks was a bit old. They recently added some support for pools; I wonder if that's what's causing the issue.

stgraber commented 5 years ago

yep, that looks like the culprit...

stgraber commented 5 years ago

Sent a pull request to the Juju team, hopefully they'll merge it and then the next snap refresh will correct this.

I tried a few workarounds on our side, but there's nothing I can do to bypass that validation.

https://github.com/juju/gomaasapi/pull/80
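To illustrate the kind of validation involved, here is a minimal sketch of a strict check that produces a message shaped like "pool: expected map, got nothing", next to a tolerant variant. It is not the gomaasapi code, and whether the linked pull request takes this approach is not stated in this thread; `check_strict`, `check_tolerant`, and the sample reply are hypothetical.

```python
def check_strict(device):
    """Require a 'pool' map; a reply without one is rejected outright."""
    if "pool" not in device:
        raise ValueError("pool: expected map, got nothing")
    if not isinstance(device["pool"], dict):
        raise ValueError(f"pool: expected map, got {type(device['pool']).__name__}")
    return device


def check_tolerant(device):
    """Treat a missing or null 'pool' as absent instead of failing."""
    pool = device.get("pool")
    if pool is not None and not isinstance(pool, dict):
        raise ValueError(f"pool: expected map, got {type(pool).__name__}")
    return {**device, "pool": pool}


device = {"hostname": "node13test3"}  # hypothetical reply without a 'pool' field
try:
    check_strict(device)
except ValueError as err:
    print("strict:", err)
print("tolerant:", check_tolerant(device))
```

With the strict check, every caller fails on such a reply, which would be consistent with LXD having no way to bypass the validation from its side.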

laralar commented 5 years ago

Hi, I saw that this change was merged into the Juju API package 8 days ago. Will it be in the next LXD/LXC snap?

Thanks

stgraber commented 5 years ago

It should be in the current snap already, I made sure to rebuild the LXD 3.16 snap after the fix got merged.