andrewdmcleod / magpie-layer


Wrong Reference Interface Used When Validating MTU #17

Closed relaxdiego closed 3 years ago

relaxdiego commented 3 years ago

Initial Observations

When deploying multiple Magpie units on the same baremetal machine, each communicating on different interfaces with different MTUs, Magpie appears to use the wrong reference interface to determine what MTU to expect. For instance, given machines having three interfaces with their respective configured MTUs as follows:

Magpie reports the following:

Steps to Reproduce

Given:

  1. A MAAS environment with machines configured to communicate over 2 or more networks with different MTUs
  2. A Juju controller using the MAAS provider

Save the following bundle as ~/magpie-bundle.yaml:

series: "bionic"
machines:
  0:
    constraints: tags=magpie-batch-2
  1:
    constraints: tags=magpie-batch-2

applications:
  magpie-oam: &magpie-common
    charm: "cs:~openstack-charmers/magpie"
    series: "bionic"
    bindings:
      magpie: oam-space  # interface: bondm
    to:
    - 0
    - 1
    num_units: 2

  magpie-external:
    <<: *magpie-common
    options:
      check_bonds: bond0
    bindings:
      "": oam-space
      magpie: external-space  # interface: bond0.10

  magpie-ceph-access:
    <<: *magpie-common
    options:
      check_bonds: bond1
    bindings:
      "": oam-space
      magpie: ceph-access-space  # interface: bond1.7

Then deploy as follows:

$ juju add-model batch-2
$ juju deploy ~/magpie-bundle.yaml

Code Analysis

Tracing the code that attempts to get the reference interface, we come across this snippet:

https://github.com/andrewdmcleod/magpie-layer/blob/4f6da1a197df866176fece7649aa6aa6a4ce4919/lib/charms/layer/magpie_tools.py#L107-L109

Specifically, local_ip is obtained by calling the underlying hook tool unit-get private-address. However, this hook tool is problematic, and the Juju team recommends using network-get instead (Reference bug). Running unit-get private-address manually yields:

$ juju run --unit magpie-ceph-access/1 unit-get private-address
10.3.126.93

...which is the address of the bondm interface. Running network-get manually yields:

$ juju run --unit magpie-ceph-access/1 network-get magpie
bind-addresses:
- macaddress: 3c:fd:fe:ec:ec:c1
  interfacename: bond1.7
  addresses:
  - hostname: ""
    address: 192.168.19.64
    cidr: 192.168.19.0/24
egress-subnets:
- 192.168.19.64/32
ingress-addresses:
- 192.168.19.64

...which is the information for the correct interface. Thus, line 109 in the above snippet could be updated to:

local_ip = hookenv.network_get("magpie")['ingress-addresses'][0]
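As a rough sketch of how that lookup could be made a little more defensive, the helpers below take the parsed network-get data as a plain dict and return the ingress address and the interface that carries it. The function names ingress_address and interface_for_address are hypothetical and not part of magpie or charmhelpers; the key names (interfacename, ingress-addresses) are copied from the output shown above and may differ in other Juju versions. Inside a charm, the dict would presumably come from hookenv.network_get("magpie").

```python
def ingress_address(network_info):
    """Return the first ingress address, or None if none is reported."""
    addresses = network_info.get("ingress-addresses") or []
    return addresses[0] if addresses else None


def interface_for_address(network_info, address):
    """Return the interfacename whose bind addresses include `address`."""
    for bind in network_info.get("bind-addresses") or []:
        for entry in bind.get("addresses") or []:
            if entry.get("address") == address:
                return bind.get("interfacename")
    return None


# Parsed form of the network-get output quoted in this issue.
sample = {
    "bind-addresses": [
        {
            "macaddress": "3c:fd:fe:ec:ec:c1",
            "interfacename": "bond1.7",
            "addresses": [
                {
                    "hostname": "",
                    "address": "192.168.19.64",
                    "cidr": "192.168.19.0/24",
                }
            ],
        }
    ],
    "egress-subnets": ["192.168.19.64/32"],
    "ingress-addresses": ["192.168.19.64"],
}

local_ip = ingress_address(sample)
print(local_ip, interface_for_address(sample, local_ip))
```

With the sample data this prints 192.168.19.64 bond1.7, i.e. the interface whose MTU should be used as the reference for magpie-ceph-access/1.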

Workaround

One workaround is to set the default binding to the one that magpie is intended to test. For example:

  magpie-ceph-access:
    <<: *magpie-common
    options:
      check_bonds: bond1
    bindings:
      "": ceph-access-space

This causes the "magpie" binding to default to "ceph-access-space" as well, so explicitly setting "magpie" is not necessary. With this workaround, unit_private_ip() retrieves the correct IP, albeit by chance.
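One detail worth spelling out: the workaround stanza still pulls in &magpie-common, which binds magpie to oam-space. Assuming the bundle parser follows the usual YAML merge-key semantics (a shallow merge where explicitly written keys win), the stanza's own bindings map replaces the anchored one entirely rather than being combined with it. The sketch below imitates that behaviour with plain Python dicts; it is an illustration only and does not parse YAML or call Juju.

```python
# Stand-in for the mapping anchored as &magpie-common in the bundle.
magpie_common = {
    "charm": "cs:~openstack-charmers/magpie",
    "series": "bionic",
    "bindings": {"magpie": "oam-space"},
    "to": [0, 1],
    "num_units": 2,
}

# Keys written explicitly in the workaround stanza.
workaround = {
    "options": {"check_bonds": "bond1"},
    "bindings": {"": "ceph-access-space"},
}

# Shallow merge: explicit keys win, and nested maps are replaced whole.
magpie_ceph_access = {**magpie_common, **workaround}
print(magpie_ceph_access["bindings"])
```

Under that assumption, the only binding left is the default one, so the magpie endpoint falls back to ceph-access-space.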

relaxdiego commented 3 years ago

Note that the above workaround will only work if the Juju controller is also accessible via ceph-access-space.

relaxdiego commented 3 years ago

Moved to https://github.com/openstack-charmers/magpie-layer/issues/17