canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

Cloud-init not applying networking config #3341

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1817644

Launchpad details
affected_projects = ['maas']
assignee = None
assignee_name = None
date_closed = 2019-02-26T19:19:18.831542+00:00
date_created = 2019-02-26T00:50:07.805859+00:00
date_fix_committed = None
date_fix_released = None
id = 1817644
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1817644
milestone = None
owner = andreserl
owner_name = Andres Rodriguez
private = False
status = invalid
submitter = mgaribaldi
submitter_name = Michael Garibaldi
tags = []
duplicates = []

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T00:50:07.805859+00:00

Machines deployed via MaaS are not applying the network configuration defined in the user_data cloud-config file.

To deploy, we are using the following command (where $user_data is the base64-encoded cloud-config file):

maas machine deploy user_data=$user_data
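As a rough illustration of how the $user_data value above could be produced, the sketch below base64-encodes a cloud-config document in Python. The sample document is a shortened, hypothetical stand-in, not the reporter's actual file, and the helper name `encode_user_data` is made up for this example.

```python
import base64


def encode_user_data(cloud_config_text: str) -> str:
    """Return the base64 form of a cloud-config document as ASCII text."""
    return base64.b64encode(cloud_config_text.encode("utf-8")).decode("ascii")


# Hypothetical, shortened cloud-config used only for illustration.
sample = "#cloud-config\nssh_pwauth: true\npackage_upgrade: true\n"
user_data = encode_user_data(sample)
print(user_data)
```

Decoding the result with `base64.b64decode` should give back the original text, which is a quick way to confirm the encoded value was not mangled before it is handed to the deploy command.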

The #cloud-config file does the following: adds a user, installs packages, allows password authentication, and configures eno1 with a static IP and a VLAN tag (network config version 2 specified).

The result is a successful deploy with all the above configurations applied except the network portion.

/var/lib/instances/user-data.txt contains the expected configuration as it was provided in the maas deploy command, including the network config.

/etc/cloud/cloud.cfg.d/50-curtin-networking.cfg contains the configuration that MaaS set up during commissioning, which is DHCP, rather than the static IP address assignment and VLAN tag.
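When comparing the two files above, it can help to see which files under /etc/cloud/cloud.cfg.d actually carry a network section. The sketch below is a plain text scan for a top-level `network:` key, not a YAML parser, and the function name `files_with_network_key` is hypothetical.

```python
from pathlib import Path


def files_with_network_key(cfg_dir):
    """Return names of *.cfg files in cfg_dir that have a top-level 'network:' key.

    This is a simple line-prefix check, so it only finds keys written
    at column zero in block-style YAML.
    """
    hits = []
    for path in sorted(Path(cfg_dir).glob("*.cfg")):
        text = path.read_text(encoding="utf-8", errors="replace")
        if any(line.startswith("network:") for line in text.splitlines()):
            hits.append(path.name)
    return hits


# Example call on a deployed machine (path taken from the report):
# print(files_with_network_key("/etc/cloud/cloud.cfg.d"))
```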

In addition, while trying to debug cloud-init we are not able to successfully analyze /var/log/cloud-init.log and get the following error:

ubuntu@test:/var/log$ sudo cloud-init analyze show -i ./cloud-init.log
Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 11, in <module>
    load_entry_point('cloud-init==18.4', 'console_scripts', 'cloud-init')()
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 904, in main
    get_uptime=True, func=functor, args=(name, args))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2514, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/analyze/main.py", line 104, in analyze_show
    args.print_format)):
  File "/usr/lib/python3/dist-packages/cloudinit/analyze/show.py", line 192, in show_events
    return generate_records(events, print_format=print_format)
  File "/usr/lib/python3/dist-packages/cloudinit/analyze/show.py", line 175, in generate_records
    prev_evt = unprocessed.pop()
IndexError: pop from empty list
ubuntu@test:/var/log$
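The last frame of the traceback fails on a `pop()` call against an empty list. The snippet below is not cloud-init's code; it is only a generic illustration of that failure and of one defensive pattern, with the helper name `pop_or_none` invented for the example.

```python
def pop_or_none(pending):
    """Pop the most recent pending item, or return None when nothing is pending."""
    if not pending:
        return None
    return pending.pop()


# An unguarded pop on an empty list raises IndexError, as in the traceback.
try:
    [].pop()
except IndexError as exc:
    print(f"unguarded pop failed: {exc}")

# The guarded version returns None instead of raising.
print(pop_or_none([]))
print(pop_or_none(["first-event"]))
```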

Searching this log file, however, we can see the user accounts being created and the packages being installed, but there is no reference to the network configuration that was passed via cloud-init. We have confirmed that the created user exists and is functional, and that the defined packages are installed.

We are able to manually apply network configuration via cloud-init by performing the following steps:

rename current curtin networking config file

mv /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg.old

create new curtin networking config file with same permissions as original

touch /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg

copy network: stanza from cloud-config file to 50-curtin-networking.cfg

clean and re-init

cloud-init clean
cloud-init init

test network config

netplan try

Result here is success where the static IP is accessible on the NIC.
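The manual steps above can be sketched as an ordered command plan. The snippet only builds and prints the commands as a dry run and does not execute anything; the function name `workaround_commands` is hypothetical, and copying the network: stanza remains a manual step that the sketch marks with a comment.

```python
import shlex

# Path taken from the report.
CURTIN_CFG = "/etc/cloud/cloud.cfg.d/50-curtin-networking.cfg"


def workaround_commands(cfg_path=CURTIN_CFG):
    """Return the workaround as a list of argv lists, in execution order."""
    return [
        ["mv", cfg_path, cfg_path + ".old"],  # rename current curtin networking config
        ["touch", cfg_path],                  # create a new, empty config file
        # Manual step here: copy the network: stanza from the cloud-config file.
        ["cloud-init", "clean"],              # clean ...
        ["cloud-init", "init"],               # ... and re-init
        ["netplan", "try"],                   # test the network config
    ]


for argv in workaround_commands():
    print(shlex.join(argv))
```

Keeping the plan as data makes it easy to review before running each command by hand with the appropriate privileges.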

For reference, the switch-port config for eno1 on this node is as follows:

default/native VLAN: XXX
VLAN tagged on that interface: YYY

The goal here is to PXE boot on VLAN XXX and then, during deployment, jump to the live network, which should put the server on VLAN YYY, our production network.

We cannot achieve this success with the MaaS deployments and would like some assistance to debug this issue.

Thank you all in advance.

mgaribaldi@maas-rack16:/var/log/maas$ dpkg -l 'maas'|cat
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name                            Version                                Architecture Description
+++-===============================-======================================-============-=================================================
ii  maas                            2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          "Metal as a Service" is a physical cloud and IPAM
ii  maas-cli                        2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS client and command-line interface
un  maas-cluster-controller         <none>                                 <none>       (no description available)
ii  maas-common                     2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS server common files
ii  maas-dhcp                       2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS DHCP server
un  maas-dns                        <none>                                 <none>       (no description available)
ii  maas-proxy                      2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS Caching Proxy
ii  maas-rack-controller            2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          Rack Controller for MAAS
ii  maas-region-api                 2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          Region controller API service for MAAS
ii  maas-region-controller          2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          Region Controller for MAAS
un  maas-region-controller-min      <none>                                 <none>       (no description available)
un  python-django-maas              <none>                                 <none>       (no description available)
un  python-maas-client              <none>                                 <none>       (no description available)
un  python-maas-provisioningserver  <none>                                 <none>       (no description available)
ii  python3-django-maas             2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS server Django web framework (Python 3)
ii  python3-maas-client             2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS python API client (Python 3)
ii  python3-maas-provisioningserver 2.5.0-7442-gdf68e30a5-0ubuntu1~18.04.1 all          MAAS server provisioning libraries (Python 3)
mgaribaldi@maas-rack16:/var/log/maas$

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T00:50:07.805859+00:00

Launchpad attachments: maas_logs.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Mike Pontillo(mpontillo) wrote on 2019-02-26T01:18:20.701965+00:00

Can you provide an example of the cloud-init user-data you are specifying?

MAAS provides its own network configuration to cloud-init, based on what has been configured for the machine in MAAS. You should be able to set the static IP address and VLAN using the MAAS API or UI to get the behavior you want.

I don't know how cloud-init will resolve the conflict if the network configuration is supplied more than once - but from this bug, I'm assuming that the configuration is overridden by what MAAS supplies.

If you want MAAS to suppress supplying its own network configuration to cloud-init, MAAS does not support that at this time.

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T07:01:04.652603+00:00

Thank you for the quick response. Below is the cloud-init user-data file.

mgaribaldi@maas:~$ cat test-cloudinit.yml

#cloud-config
ssh_pwauth: true
package_upgrade: true
packages:

ubuntu-server-builder commented 1 year ago

Launchpad user Andres Rodriguez(andreserl) wrote on 2019-02-26T08:07:03.535853+00:00

Hi Michael,

The question is, why are you sending cloud-init configuration with network config when MAAS already sends one?

Have you considered the possibility that since you are sending duplicate information, cloud-init may be failing altogether? That said, 50-curtin-networking.cfg is the configuration that is passed to curtin and curtin passes to cloud-init and it is not the configuration that you are sending via user-data. That configuration is part of the user-data, not part of files in /var/lib/cloud-init.

My guess here is that cloud-init is either ignoring this or failing to run.

Could you please attach the /var/log/cloud-*.log from the system being deployed?

ubuntu-server-builder commented 1 year ago

Launchpad user Andres Rodriguez(andreserl) wrote on 2019-02-26T08:28:50.667820+00:00

To clarify:

  1. curtin writes /etc/cloud/cloud.cfg.d/50-curtin-networking.cfg based on what MAAS sends to curtin, and curtin writes it for cloud-init.
  2. What you pass via user-data is computed by cloud-init when it accesses the metadata server, and that should live in /var/lib/cloud/

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Watkins(oddbloke) wrote on 2019-02-26T12:40:54.168664+00:00

Michael, instead of just the /var/log/cloud-* files Andres asked for, could you instead run cloud-init collect-logs on the system and attach the produced tarball here? (Once done, please move the cloud-init task back to New.)

Thanks!

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T18:52:18.468393+00:00

The idea is that we are trying to set up MaaS as follows:

1) With a single NIC configuration (not including the BMC NIC), we would like to PXE boot while on a "provisioning" network where MaaS provides DHCP.
2) After deployment, we want to move our server to a live network by passing the second network config.

I don't believe cloud-init is failing altogether, because all other actions specified in my cloud-config file are successful (create user, install packages, etc.). This suggests that cloud-init completed and only the network actions either failed or were omitted, which is interesting.

Launchpad attachments: cloud-init.tar.gz

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Watkins(oddbloke) wrote on 2019-02-26T19:19:13.515357+00:00

This doesn't sound like a cloud-init bug to me. If it's provided with two sets of network configuration (in this case, on disk via curtin from MAAS and from you via user-data), it's going to have to choose one. If it had chosen the user-data one here, then it wouldn't have been able to provision at all, so I think it's also making the right choice about which config to use.

I don't have a full mental model of MAAS, but I think it expects to own the networking of the instances it manages. (This is, perhaps, a MAAS feature request, to support this workflow?)

As things stand (unless there is something MAAS can do to already support this), I think you'll need to handle this "deprovisioning" step yourself in the way you described in the bug report.

(I'm going to mark this as Invalid for cloud-init, but please do move it back to New if you think that's incorrect!)
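The "two sources, one winner" behaviour described above can be pictured as a first-match lookup over an ordered list of sources. The sketch below is a simplified illustration, not cloud-init's implementation: the source labels and their order (`cmdline`, `system_cfg`, `ds`) are assumptions made for the example, so check the cloud-init network configuration documentation for the authoritative list.

```python
def pick_network_config(sources, order=("cmdline", "system_cfg", "ds")):
    """Return (source_name, config) for the first source in `order` that has a config.

    `sources` maps a source label to its network config (or None).
    Labels that are not listed in `order` are never consulted.
    """
    for name in order:
        cfg = sources.get(name)
        if cfg:
            return name, cfg
    return "fallback", None


# Hypothetical inputs: the on-disk config written by curtin versus a
# network section that arrived inside user-data.
available = {
    "system_cfg": {"version": 2, "ethernets": {"eno1": {"dhcp4": True}}},
    "user_data": {"version": 2, "ethernets": {"eno1": {"dhcp4": False}}},
}
print(pick_network_config(available)[0])
```

Under these assumptions the on-disk config is selected, and the `user_data` entry is never looked at because it is not part of the lookup order.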

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T20:55:20.911108+00:00

Is there anything wrong with my syntax upon trying to analyze the cloud-init log? I am receiving the errors below.

ubuntu@test:/var/log$ sudo cloud-init analyze show -i ./cloud-init.log
Traceback (most recent call last):
  File "/usr/bin/cloud-init", line 11, in <module>
    load_entry_point('cloud-init==18.4', 'console_scripts', 'cloud-init')()
  File "/usr/lib/python3/dist-packages/cloudinit/cmd/main.py", line 904, in main
    get_uptime=True, func=functor, args=(name, args))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2514, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/analyze/main.py", line 104, in analyze_show
    args.print_format)):
  File "/usr/lib/python3/dist-packages/cloudinit/analyze/show.py", line 192, in show_events
    return generate_records(events, print_format=print_format)
  File "/usr/lib/python3/dist-packages/cloudinit/analyze/show.py", line 175, in generate_records
    prev_evt = unprocessed.pop()
IndexError: pop from empty list
ubuntu@test:/var/log$

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Watkins(oddbloke) wrote on 2019-02-26T21:16:12.563141+00:00

Nope, that looks like a bug; could you file a new one, please, with cloud-init collect-logs attached?

(FWIW, if you're just trying to analyze the regular log, you shouldn't need to specify -i as it defaults to /var/log/cloud-init.log anyway; I don't think that's the source of the problem.)

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T22:25:08.955852+00:00

Is it possible to have eno1 on a commissioned node plugged into the same network that MaaS uses to provision nodes, and have MaaS configure eno2 for another network?

MaaS in this case is physically connected to both networks.

It appears as though this network configuration isn't allowed. Upon reviewing the interface configs, the NIC config reverts to the PXE subnet prior to deploying.

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-26T22:38:30.716855+00:00

Correction, the NIC config reverts the static config to "unconfigured" but for the correct network.

ubuntu-server-builder commented 1 year ago

Launchpad user Andres Rodriguez(andreserl) wrote on 2019-02-27T21:21:34.201641+00:00

@Michael,

Can you provide a step-by-step description of how you are configuring interfaces? Also, can you provide some logs? If you are doing this via the UI, have you seen: https://bugs.launchpad.net/maas/+bug/1659151

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-28T09:39:09.749459+00:00

I believe this issue is out of scope of this thread. Perhaps I should create another one?

The issue I was seeing matches the one described in the pre-existing bug report mentioned above (linked again below), but after applying the workaround I noticed one additional issue when using the GUI to assign a network configuration that requires VLAN tagging to a node. https://bugs.launchpad.net/maas/+bug/1659151

Here are the steps taken:

1) Commission the node; eno1 is configured as "auto assigned" on the default VLAN.
2) After commissioning, I update the eno2 interface to be configured on a different fabric, on VLAN XXX, with a static IP.
3) The node is then deployed.
4) After the deployment completes, I can SSH to the box on the default network and see that, in the netplan config, the IP is set correctly but eno2 is not configured for any VLAN; thus, the interface isn't reachable on that subnet/fabric/VLAN.

In summary, upon defining an interface in the MaaS GUI, the VLAN config isn't being applied during deployment.
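One way to confirm the symptom described above is to check whether the rendered netplan config defines any VLAN on top of eno2. The sketch below assumes the netplan version 2 layout, where VLAN interfaces sit under a `vlans` key with `id` and `link` fields. The sample config, VLAN id 100, and the address are hypothetical placeholders, and the function name `vlans_on_link` is invented for the example.

```python
def vlans_on_link(netplan_cfg, link):
    """Return {vlan_interface_name: vlan_id} for VLANs stacked on `link`."""
    vlans = netplan_cfg.get("network", {}).get("vlans", {})
    return {name: v["id"] for name, v in vlans.items() if v.get("link") == link}


# Hypothetical config in which eno2 carries a tagged VLAN.
tagged = {
    "network": {
        "version": 2,
        "ethernets": {"eno1": {"dhcp4": True}, "eno2": {}},
        "vlans": {
            "eno2.100": {"id": 100, "link": "eno2", "addresses": ["192.0.2.10/24"]},
        },
    }
}

# Hypothetical config resembling the reported result: the address is placed
# directly on eno2 and no VLAN is defined.
untagged = {
    "network": {
        "version": 2,
        "ethernets": {"eno1": {"dhcp4": True}, "eno2": {"addresses": ["192.0.2.10/24"]}},
    }
}

print(vlans_on_link(tagged, "eno2"))
print(vlans_on_link(untagged, "eno2"))
```

An empty result for eno2 would match the report: the address is present, but no tagged interface exists for the intended VLAN.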

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Garibaldi(mgaribaldi) wrote on 2019-02-28T09:40:04.536525+00:00

Please let me know which logs you would like to see and I can provide. Thank you for your assistance.

ubuntu-server-builder commented 1 year ago

Launchpad user Launchpad Janitor(janitor) wrote on 2019-04-30T04:17:23.451802+00:00

[Expired for MAAS because there has been no activity for 60 days.]