canonical / cloud-init

Official upstream for the cloud-init: cloud instance initialization
https://cloud-init.io/

No network after subiquity LPAR installation on s390x with VLAN #3629

Closed ubuntu-server-builder closed 1 year ago

ubuntu-server-builder commented 1 year ago

This bug was originally filed in Launchpad as LP: #1868246

Launchpad details
affected_projects = ['subiquity', 'ubuntu-z-systems', 'initramfs-tools (Ubuntu)']
assignee = None
assignee_name = None
date_closed = 2020-03-26T20:43:01.063189+00:00
date_created = 2020-03-20T10:39:28.255787+00:00
date_fix_committed = None
date_fix_released = None
id = 1868246
importance = undecided
is_complete = True
lp_url = https://bugs.launchpad.net/cloud-init/+bug/1868246
milestone = None
owner = fheimes
owner_name = Frank Heimes
private = False
status = invalid
submitter = fheimes
submitter_name = Frank Heimes
tags = ['id-5e8b82987d7c24699ed2aa56', 'installer', 'req4focal', 's390x']
duplicates = []

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-20T10:39:28.255787+00:00

Today I tried a subiquity LPAR installation using the latest ISO (March 19), which includes the latest 20.03 subiquity. The installation itself completed fine, but after the post-install reboot the system didn't have an active network - please note that the LPAR is connected to a VLAN.

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: encc000: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether a2:8d:91:85:12:e3 brd ff:ff:ff:ff:ff:ff
3: enP1p0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0c:2d:0c:b8:70 brd ff:ff:ff:ff:ff:ff
4: enP1p0s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0c:2d:0c:b8:71 brd ff:ff:ff:ff:ff:ff
5: enP2p0s0: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0c:2d:0c:b7:00 brd ff:ff:ff:ff:ff:ff
6: enP2p0s0d1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 82:0c:2d:0c:b7:01 brd ff:ff:ff:ff:ff:ff

Wanting to have a look at the netplan config, it turned out that there is no yaml file:

$ ls -l /etc/netplan/
total 0

Adding one manually and applying it worked fine.

So it looks like the installer does not properly generate or copy a 01-netcfg.yaml to /etc/netplan.

Please see below the entire steps as well as a compressed file with the entire content of /var/log

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-20T10:39:28.255787+00:00

Launchpad attachments: subiquity test - focal live 19032020_s390x_LPAR.txt

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-20T10:40:11.854229+00:00

Launchpad attachments: 20032020_DASD_LPAR.tgz

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-20T12:11:35.573646+00:00

This issue, btw., does not happen to me with a z/VM guest that is not attached to a VLAN environment. In a non-VLAN env. (here on z/VM) I see this file on the installed system: /etc/netplan/50-cloud-init.yaml with proper content:

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        enc600:
            addresses:

And the network is working after the post-install reboot.

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-20T13:04:09.113213+00:00

I just noticed that this 'could' be a duplicate of: LP 1861460 https://bugs.launchpad.net/bugs/1861460

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-20T14:37:44.769731+00:00

Some additional information:

Early in the subiquity installation process (right after disk device enablement) I can see two files in /etc/netplan/:

00-installer-config.yaml
50-cloud-init.yaml.dist-subiquity

I think both are not as they should be for this VLAN environment.

After replacing them with:

network:
  version: 2
  renderer: networkd
  ethernets:
    encc000:
      dhcp4: no
      dhcp6: no
  vlans:
    encc000.2653:
      id: 2653
      link: encc000
      addresses: [ 10.245.236.15/24 ]
      gateway4: 10.245.236.1
      nameservers:
        search: [ canonical.com ]
        addresses:

I was able to bring up the network (in the subiquity shell) using netplan apply. (I also disabled/enabled 0.0.c000 - but I think it was not needed).

Unfortunately there is still no network online after the post-install reboot:

$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 16:9e:e9:36:c4:90 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::149e:e9ff:fe36:c490/64 scope link
       valid_lft forever preferred_lft forever
3: encc000.2653@encc000: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 16:9e:e9:36:c4:90 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::149e:e9ff:fe36:c490/64 scope link
       valid_lft forever preferred_lft forever

...since the following netplan yaml is in place - which is not correct:

$ cat /etc/netplan/50-cloud-init.yaml
# This file is generated from information provided by the datasource. Changes
# to it will not persist across an instance reboot. To disable cloud-init's
# network configuration capabilities, write a file
# /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg with the following:
# network: {config: disabled}
network:
    ethernets:
        encc000: {}
    version: 2
    vlans:
        encc000.2653:
            id: 2653
            link: encc000
            nameservers:
                addresses:

Replacing it again with the (known to work) yaml above makes it possible to bring the network up again (with the help of netplan).

I am adding cloud-init as an affected package and will let the maintainers decide whether this is a duplicate or not (see previous comment).

ubuntu-server-builder commented 1 year ago

Launchpad user Dan Watkins(oddbloke) wrote on 2020-03-23T16:56:44.005194+00:00

So it looks to me like the network config that cloud-init ends up with is:

{'ethernets': {'encc000': {'match': {'macaddress': 'b2:a0:38:23:63:93'}, 'nameservers': {'addresses': ['10.245.236.1']}, 'set-name': 'encc000.2653'}}, 'version': 2, 'vlans': {'encc000.2653': {'addresses': ['10.245.236.15/24'], 'gateway4': '10.245.236.1', 'id': 2653, 'link': 'encc000', 'nameservers': {'addresses': ['10.245.236.1'], 'search': ['canonical.com']}}}}

which looks incorrect because b2:a0:38:23:63:93 isn't the MAC address of any interface in the system AFAICT. (I also wonder if set-name'ing the encc000 ethernet to the name of the vlan would cause/is causing problems.)
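For illustration, the check described above (does the matched MAC belong to any interface on the system?) can be sketched in Python. This is only a sketch: `unmatched_macs` is a hypothetical helper, not part of cloud-init, and the config is abbreviated to the keys relevant here. The MAC list is taken from the first `ip a` output in this report.

```python
def unmatched_macs(network_config, system_macs):
    """Return {ethernet_key: mac} for every 'match: macaddress' entry
    that does not correspond to any MAC present on the system."""
    known = {mac.lower() for mac in system_macs}
    missing = {}
    for name, cfg in (network_config.get("ethernets") or {}).items():
        mac = ((cfg or {}).get("match") or {}).get("macaddress")
        if mac and mac.lower() not in known:
            missing[name] = mac
    return missing


# Network config as quoted in this comment (abbreviated).
config = {
    "ethernets": {
        "encc000": {
            "match": {"macaddress": "b2:a0:38:23:63:93"},
            "set-name": "encc000.2653",
        }
    },
    "version": 2,
}

# MAC addresses from the first `ip a` output in this report.
system_macs = [
    "a2:8d:91:85:12:e3",
    "82:0c:2d:0c:b8:70",
    "82:0c:2d:0c:b8:71",
    "82:0c:2d:0c:b7:00",
    "82:0c:2d:0c:b7:01",
]

print(unmatched_macs(config, system_macs))
```

A non-empty result means the `match` stanza cannot select any current interface, so the config would not be applied to encc000.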

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-23T17:22:03.618691+00:00

Hi Dan, please note that MAC addresses may change on s390x systems - on such systems they are not as unique as you know them from other platforms...

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2020-03-25T02:23:07.899930+00:00

I admit to being quite confused, but I think this is probably in some sense a duplicate of the bug Frank linked. Frank, did you configure the networking by putting vlan=$whatever on the kernel command line, or do it entirely in subiquity?

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-25T07:37:09.043938+00:00

Yes, I used vlan= in the parmfile, hence passing it over as argument to the kernel. The entire parmfile (that holds all the kernel parameters) is this:

ip=10.245.236.15::10.245.236.1:255.255.255.0:s1lp15:encc000.2653:none:10.245.236.1 vlan=encc000.2653:encc000 url=ftp://installserver:21/ubuntu-live-server-20.04/focal-live-server-s390x.iso http_proxy=http://proxyserver:3128 --- quiet

(sorry, I should have already attached it)
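As a rough illustration of what the parmfile above encodes, here is a minimal Python sketch that splits it into its parameters. It assumes the `ip=` value follows the kernel's colon-separated layout (client IP, server IP, gateway, netmask, hostname, device, autoconf, DNS servers), that only IPv4 addresses are used, and that `vlan=` has the form `<vlan-name>:<parent-device>` with the VLAN ID as the numeric suffix of the name. The function names are hypothetical and are not taken from initramfs-tools or subiquity.

```python
IP_FIELDS = ("client_ip", "server_ip", "gateway", "netmask",
             "hostname", "device", "autoconf", "dns0", "dns1")


def parse_parmfile(line):
    """Collect key=value kernel parameters that appear before '---'."""
    params = {}
    for token in line.split():
        if token == "---":
            break
        key, sep, value = token.partition("=")
        if sep:
            params[key] = value
    return params


def parse_ip(value):
    """Map the colon-separated ip= fields to names (IPv4 only)."""
    return dict(zip(IP_FIELDS, value.split(":")))


def parse_vlan(value):
    """Split vlan=<name>:<link>; the ID is the suffix after the last dot."""
    name, _, link = value.partition(":")
    return {"name": name, "link": link, "id": int(name.rsplit(".", 1)[1])}


PARMFILE = (
    "ip=10.245.236.15::10.245.236.1:255.255.255.0:s1lp15:encc000.2653:none:10.245.236.1 "
    "vlan=encc000.2653:encc000 "
    "url=ftp://installserver:21/ubuntu-live-server-20.04/focal-live-server-s390x.iso "
    "http_proxy=http://proxyserver:3128 --- quiet"
)

params = parse_parmfile(PARMFILE)
print(parse_ip(params["ip"]))
print(parse_vlan(params["vlan"]))
```

The parsed values line up with the known-working yaml from the earlier comment: address 10.245.236.15 with netmask 255.255.255.0 (/24), gateway 10.245.236.1, and VLAN 2653 on top of encc000.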

I think that this is (still) needed to make sure that the installer has (in its early phase) a working network and is able to download the ISO image. This worked in the past for me, and it obviously still works - at least the ISO can be downloaded.

I'm wondering if this network config (that is obviously working, given the successful ISO download) can just be accepted by subiquity and the network config screen populated with it?!

During the installation I can at times find two yaml files in /etc/netplan - one from the installer (I think after I passed the network dialog screen) and one from cloud-init - that's pretty confusing.

And I am not able to use the subiquity UI's network configuration screen to create a network config (yaml) that is similar to the one in comment #5.

The one from comment #5 shows a configuration that is known to work and that one is comparable to the configuration done in the parmfile.

The cloud-init config that I mentioned in comment #3 looks a bit odd to me - but I assume that netplan configs can just be done in different ways, still leading to the same result.

One concern is the use of: match: macaddress: 02:28:0b:00:00:53 I don't know why that is used (and needed?) - especially keeping in mind that MAC addresses are not necessarily unique on this platform (s390x). Sticking to the interface name (encc000, or encc000.2653 in the case of a VLAN) would be the preferable option.

ubuntu-server-builder commented 1 year ago

Launchpad user Ryan Harper(raharper) wrote on 2020-03-25T17:28:52.690705+00:00

1) The cloud-init task here is a duplicate; I'd prefer to drop the task here but I'm not sure what to do bug-wise (can we mark only the task as a duplicate?)

2) This comment

One concern is the use of: match: macaddress: 02:28:0b:00:00:53 I don't know why that is used (and needed?) - especially keeping in mind that MAC addresses are not necessarily unique on this platform (s390x).

s390x is unique in that MAC addresses are not stable. For the rest of the world, the MAC is the unique way of identifying which config is associated with a particular interface and, moreover, a way to ensure that the config is applied from boot to boot, independent of the interface name.

Please file this issue as a separate cloud-init bug; and in there we can discuss alternatives as well as on which platforms MAC is unstable. ISTR that some s390x did provide stable MAC.

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-03-26T08:37:31.292245+00:00

I guess there is no option to mark just one (affecting) entry as duplicate, but I'm happy to mark the entire Bug as duplicate, since I found/remembered LP 1861460 pretty late - after I've already opened this one (see comment #4).

A ticket was opened for IBM to get MAC addresses stable/unique (across reboots) on s390x too, and indeed the firmware was modified, but only for z14 GA 2 and newer systems - so there is still some legacy (z14 GA1 and older). Nowadays the interface names are based on their underlying physical device/address (in this case 'c000'), which makes the interface and its name already pretty unique - since it is not possible to have two devices (in one system / LPAR) with the exact same address.
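To illustrate the naming scheme mentioned here, a small Python sketch follows. It assumes the name is "enc" followed by the device bus ID with leading zeros and dots stripped, which is consistent with 0.0.c000 and encc000 in this report. The bus ID 0.0.0600 for the z/VM guest's enc600 is an assumption made for the example, and `ccw_interface_name` is a hypothetical helper, not an actual udev or systemd function.

```python
def ccw_interface_name(bus_id, prefix="en"):
    """Derive an interface name from a CCW bus ID such as '0.0.c000'.

    Assumption: leading '0' and '.' characters are dropped and the
    remainder is appended to '<prefix>c'.
    """
    return f"{prefix}c{bus_id.lstrip('0.')}"


print(ccw_interface_name("0.0.c000"))  # device used on the LPAR in this report
print(ccw_interface_name("0.0.0600"))  # assumed device ID for the z/VM guest
```

Because the name depends only on the device address, it stays the same across reboots even when the MAC address changes.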

Btw. I think with the right tooling you can even change MAC addresses on other platforms, of course the intention was always to have MAC addresses stable and unique - but things changed.

I'll mark this now as duplicate and open a separate cloud-init ticket for further discussions ...

ubuntu-server-builder commented 1 year ago

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-03-26T20:43:39.395593+00:00

1) cloud-init should be fixed for /run/netplan/* which is another bug

2) this bug is about subiquity not deleting/tearing down critical connections

ubuntu-server-builder commented 1 year ago

Launchpad user Andrew Cloke(andrew-cloke) wrote on 2020-04-01T12:07:25.449709+00:00

Thanks Dimitri, is that cloud-init bug# 1861460 ?

ubuntu-server-builder commented 1 year ago

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-04-01T16:20:56.914349+00:00

subiquity has a merge proposal to address this issue, I believe. Should be in edge channel soon.

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-04-01T16:40:11.284601+00:00

That would be fantastic - please let me know if it's in - this will make further testing and usage on LPARs much simpler (and me very happy ;-)

ubuntu-server-builder commented 1 year ago

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-04-01T17:24:19.359335+00:00

well, there are multiple pieces at stake here. I.e. has cloud-init completed successfully on the lpar boot? do you have cloud-init logs from the first boot of the system?

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-04-01T17:36:41.839117+00:00

The cloud-init log can be found in the attached tgz in comment #2: https://bugs.launchpad.net/subiquity/+bug/1868246/+attachment/5339311/+files/20032020_DASD_LPAR.tgz (it incl. entire /var/log and /var/crash)

ubuntu-server-builder commented 1 year ago

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-04-06T13:58:54.356484+00:00

This is awaiting pending changes to cloud-init, casper, initramfs-tools and livecd-rootfs all getting accepted and migrated to the release pocket, and a new image built using all of the above, before further tests/development can happen.

ubuntu-server-builder commented 1 year ago

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-04-07T12:03:39.156748+00:00

When using today's ISO, with only a single disk drive attached, and snap refreshed from edge channel, one should be able to complete the install with networking not getting interrupted, and correctly have networking in the target too.

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-04-07T19:29:08.756731+00:00

I just tried it with edge (updated subiquity manually, since I unfortunately need to set up a proxy first for my LPARs to be able to connect to the snap store, but that is due to the infrastructure).

The installation worked fine - I had network at the very first subiquity screen and I could just accept the network as it was at the network config screen (just had to select continue), used a single disk (as suggested) and was able to complete the installation - and hit Reboot at the end.

So everything was fine - except one little thing - and that is that the system came up w/o networking. At the console I found this netplan config:

ubuntu@zLin15:~$ ls -la /etc/netplan/
-rw-r--r-- 1 root root 90 Apr 10 19:17 /etc/netplan/00-installer-config.yaml
ubuntu@zlin15:~$ cat /etc/netplan/00
"# This is the network config written by 'subiquity'"
network:
  ethernets: {}
  version: 2

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2020-04-08T04:33:01.067699+00:00

Oh well one step at a time. Can you extract the /var/log/installer/subiquity-debug.log file from the installed system and attach it to this bug?

I've promoted current edge to the stable/ubuntu-20.04 channel and so testing tomorrow's ISO should be easier.

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-04-08T05:58:53.043350+00:00

I left the system in that state yesterday; I have now fixed the network and saved /var/log/installer. Please see the attached tgz.

Launchpad attachments: inst.tgz

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2020-04-08T08:04:03.624673+00:00

So the issue here is that initramfs-tools is generating this config:

https://paste.ubuntu.com/p/Cww2XkWB8J/

but this is invalid: each key in ethernets has to have some value. What is happening is that cloud-init fails to parse it, and so nothing at all gets written to any netplan directory.

I think this little patch https://paste.ubuntu.com/p/CzGww7htCQ/ to initramfs-tools should fix the issue, but I'd like to test before uploading.
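The kind of problem described here can be sketched as a small check in Python. The generated config from the paste is not reproduced in this thread, so the example input below is hypothetical. The sketch assumes that "invalid" means an interface key under ethernets that maps to nothing (null), while an explicit empty mapping such as `encc000: {}` is treated as acceptable, as in the 50-cloud-init.yaml shown earlier. `empty_ethernets` is an illustrative helper, not cloud-init's actual validation.

```python
def empty_ethernets(network_config):
    """Return the names of ethernets entries that have no value at all."""
    ethernets = network_config.get("ethernets") or {}
    return sorted(name for name, cfg in ethernets.items() if cfg is None)


# Hypothetical input: the interface key is present but has no value.
broken = {"version": 2, "ethernets": {"encc000": None}}

# An explicit empty mapping, as in the 50-cloud-init.yaml shown earlier.
accepted = {"version": 2, "ethernets": {"encc000": {}}}

print(empty_ethernets(broken))
print(empty_ethernets(accepted))
```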

ubuntu-server-builder commented 1 year ago

Launchpad user Dimitri John Ledkov(xnox) wrote on 2020-04-08T20:35:28.738187+00:00

and one more typo

ubuntu-server-builder commented 1 year ago

Launchpad user Michael Hudson-Doyle(mwhudson) wrote on 2020-04-13T20:35:08.672382+00:00

So we believe this bug is now fixed, but can you confirm, Frank?

ubuntu-server-builder commented 1 year ago

Launchpad user Frank Heimes(fheimes) wrote on 2020-04-14T13:41:55.085536+00:00

Yes, I can confirm that this is fixed now (using image from Apr 14th) - many thx!