canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0

MAAS unable to refresh KVM host with LXD 5.5 #10868

Closed (jsimpso closed this issue 2 years ago)

jsimpso commented 2 years ago

I'm seeing an issue with MAAS-deployed KVM hosts using LXD 5.5.

I have two LXD servers presenting the same issue at the moment. Both were originally deployed with LXD 5.4 and were working okay, but they are now on LXD 5.5 and seeing problems. That's not to assert that LXD 5.5 is the cause, but I don't believe anything else has changed in the meantime.

When I run the "Refresh KVM Host" action in MAAS, eventually I see an error raised: Failed talking to pod: 'NoneType' object has no attribute 'get'.

Running lxc monitor on the server while MAAS tries a refresh shows several Network not found database errors: https://pastebin.ubuntu.com/p/Vv8RMYm3Z8/

However, those errors don't appear when curling one of those endpoints via the unix socket:

jsimpso@bridgman:~$ sudo curl --unix-socket /var/snap/lxd/common/lxd/unix.socket lxd/1.0/networks/br-dmz?project=maas
{"type":"sync","status":"Success","status_code":200,"operation":"","error_code":0,"error":"","metadata":{"config":{},"description":"","name":"br-dmz","type":"bridge","used_by":["/1.0/profiles/default?project=maas","/1.0/instances/juju-controller-3fp-maas?project=maas","/1.0/instances/prometheus-alertmanager-1?project=maas"],"managed":false,"status":"","locations":null}}
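For anyone wanting to check such a response programmatically rather than by eye, here is a minimal Python sketch. It only parses a response body; the RAW string is an abridged copy of the curl output above, and the summarize_network helper is a hypothetical name introduced for illustration, not part of LXD or MAAS.

```python
import json

# Abridged copy of the response body from the curl command above.
RAW = (
    '{"type":"sync","status":"Success","status_code":200,'
    '"metadata":{"name":"br-dmz","type":"bridge","managed":false}}'
)


def summarize_network(raw: str) -> dict:
    """Pull out the fields of an LXD network response that matter here.

    Hypothetical helper: tolerates a missing or null "metadata" key
    instead of assuming it is always a dict.
    """
    body = json.loads(raw)
    metadata = body.get("metadata") or {}
    return {
        "ok": body.get("status_code") == 200,
        "name": metadata.get("name"),
        "type": metadata.get("type"),
        "managed": metadata.get("managed"),
    }


if __name__ == "__main__":
    print(summarize_network(RAW))
```

Note that "managed" is false for br-dmz, i.e. the bridge exists on the host but is not managed by LXD.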

Any idea what's going wrong here? Please let me know if there's any further info I can provide!

tomponline commented 2 years ago

The Network not found messages appear to be happening due to requests for physical network interfaces, e.g.

location: none
metadata:
  context:
    ip: <IPv4>:54780
    method: GET
    protocol: tls
    url: /1.0/networks/eno1?project=maas
    username: 244046dfb76bc19705207786af92e947ad5b21f0da3d1b42b26c14c24b71fae7
  level: debug
  message: Handling API request
timestamp: "2022-08-31T06:57:03.980762379Z"
type: logging

location: none
metadata:
  context:
    err: Network not found
  level: debug
  message: Database error
timestamp: "2022-08-31T06:57:03.982069979Z"
type: logging

This makes sense, because LXD won't have network DB records for those interfaces. The message is only a debug-level log entry, not an actual warning or error.
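To separate debug noise like this from real problems when reading monitor output, something along these lines could help. It is a rough Python sketch that assumes the events have already been loaded as dicts with the same shape as the two entries above; the noteworthy helper and the "warning"/"error" level names are assumptions for illustration.

```python
# The two events quoted above, rewritten as Python dicts (fields abridged).
EVENTS = [
    {
        "type": "logging",
        "timestamp": "2022-08-31T06:57:03.980762379Z",
        "metadata": {
            "level": "debug",
            "message": "Handling API request",
            "context": {"method": "GET", "url": "/1.0/networks/eno1?project=maas"},
        },
    },
    {
        "type": "logging",
        "timestamp": "2022-08-31T06:57:03.982069979Z",
        "metadata": {
            "level": "debug",
            "message": "Database error",
            "context": {"err": "Network not found"},
        },
    },
]


def noteworthy(events, levels=("warning", "error")):
    """Return only logging events whose level is in `levels`.

    Assumption: severity is stored under metadata.level, as in the
    entries above; the default level names are a guess, not confirmed here.
    """
    return [
        event
        for event in events
        if event.get("type") == "logging"
        and (event.get("metadata") or {}).get("level") in levels
    ]


if __name__ == "__main__":
    # Both sample events are debug level, so nothing is reported.
    print(noteworthy(EVENTS))
```

With the two entries above as input, the filter returns nothing, which matches the point that neither is an actual warning or error.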

tomponline commented 2 years ago

@jsimpso I am not familiar with how MAAS works so it would be useful to get some input from someone familiar with MAAS as to what would trigger a Failed talking to pod: 'NoneType' object has no attribute 'get' error and how that relates to API calls to LXD.
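For context on the error text itself: it is the standard Python AttributeError raised when .get is called on None, typically because a lookup returned nothing and the result was used as if it were a dict. The sketch below is purely illustrative and is not MAAS code; lookup_network, describe, and describe_safely are hypothetical names, and whether MAAS hits this pattern for the physical interfaces is not established in this thread.

```python
def lookup_network(networks: dict, name: str):
    """Return the record for `name`, or None if there is no such record."""
    return networks.get(name)


def describe(networks: dict, name: str):
    """Unsafe: assumes the lookup always returns a dict."""
    record = lookup_network(networks, name)
    return record.get("type")  # AttributeError when record is None


def describe_safely(networks: dict, name: str):
    """Safe variant: handles the missing-record case explicitly."""
    record = lookup_network(networks, name)
    if record is None:
        return None
    return record.get("type")


if __name__ == "__main__":
    known = {"br-dmz": {"type": "bridge"}}
    print(describe_safely(known, "eno1"))
    try:
        describe(known, "eno1")
    except AttributeError as exc:
        print(exc)
```

A traceback from the MAAS logs would show which lookup actually returned None.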

tomponline commented 2 years ago

Hi @sparkiegeek, do you have any idea what could cause MAAS to generate this Failed talking to pod: 'NoneType' object has no attribute 'get' error?

stgraber commented 2 years ago

@albertodonato @bjornt

sparkiegeek commented 2 years ago

File a bug in MAAS and add logs - we shouldn't try to triage a MAAS bug on the LXD project