canonical / lxd

Powerful system container and virtual machine manager
https://canonical.com/lxd
GNU Affero General Public License v3.0
4.32k stars 928 forks source link

User-defined meta-data can create malformed YAML #13853

Open holmanb opened 1 month ago

holmanb commented 1 month ago

Required information

Issue description

Problem

The user-defined meta-data key gets appended as a string to the lxd-provided meta-data. This means that duplicate keys can be added, which creates a configuration that isn't well defined. Both 1.1 and 1.2 of the YAML spec state that keys are unique, which this violates.

The configuration received by cloud-init:

{'_metadata_api_version': '1.0',
 'config': {'user.meta-data': 'instance-id: test_2'},
 'devices': {'eth0': {'hwaddr': '00:16:3e:e3:ed:2c',
                      'name': 'eth0',
                      'network': 'lxdbr0',
                      'type': 'nic'},
             'root': {'path': '/', 'pool': 'default', 'type': 'disk'}},
 'meta-data': '#cloud-config\n'
              'instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n'
              'local-hostname: oracular\n'
              'instance-id: test_2'}

Cloud-init's implementation uses PyYAML which happens to use the last defined key - which happens to produce the desired outcome (allow user to override the default meta-data), but it depends on undefined behavior of a specific library. If cloud-init were ever to move to a different YAML library this behavior could break or need to be manually worked around.

In order to preserve the current behavior while creating a path to using standard-compliant yaml while preserving backwards compatibility, we could do the following:

1) cloud-init could be updated to make values in metadata['config']['user.meta-data'] override values in metadata['meta-data']. This wouldn't change cloud-init's current behavior, which ignores the values in metadata['config']. We could optionally check for a bump to the value in _metadata_api_version before doing this, but this wouldn't be strictly required since this is functionally identical currently.

2) Once stable distributions have this update, we could update the api to no longer append user meta-data to the default metadata (and bump the meta-data api, if desired). While we're making this change, we might want to drop the #cloud-config comment too. This isn't necessary because meta-data isn't part of cloud-config.

https://github.com/canonical/cloud-init/issues/5575

Information to attach

Resources: Processes: 69 CPU usage: CPU usage (in seconds): 6 Memory usage: Memory (current): 83.53MiB Swap (current): 28.00KiB Network usage: eth0: Type: broadcast State: UP Host interface: vethd9b8b75f MAC address: 00:16:3e:9a:8b:f6 MTU: 1500 Bytes received: 115.82kB Bytes sent: 5.29kB Packets received: 454 Packets sent: 52 IP addresses: inet: 10.161.80.194/24 (global) inet6: fd42:80e2:4695:1e96:216:3eff:fe9a:8bf6/64 (global) inet6: fe80::216:3eff:fe9a:8bf6/64 (link) lo: Type: loopback State: UP MTU: 65536 Bytes received: 404B Bytes sent: 404B Packets received: 4 Packets sent: 4 IP addresses: inet: 127.0.0.1/8 (local) inet6: ::1/128 (local)

Log:

lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194243.795 ERROR attach - ../src/src/lxc/attach.c:lxc_attach_run_command:1841 - No such file or directory - Failed to exec "user.meta-data" lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing

 - [x] Container configuration (`lxc config show NAME --expanded`)
```yaml
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 20.04 LTS server (20240730)
  image.os: ubuntu
  image.release: focal
  limits.cpu.allowance: 50%
  user.meta-data: 'instance-id: test_2'
  volatile.base_image: c19cc6a8469b596aae092a3953e326ed01e1183a25bff1d26145a85a2272767e
  volatile.cloud-init.instance-id: 7d26c435-da56-405c-9b04-9ad98f550736
  volatile.eth0.host_name: vethd9b8b75f
  volatile.eth0.hwaddr: 00:16:3e:9a:8b:f6
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: a097111b-15e4-45e4-aa31-a6da707012a8
  volatile.uuid.generation: a097111b-15e4-45e4-aa31-a6da707012a8
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
tomponline commented 1 month ago

Hi @holmanb

I'm afraid im not really following what it is that LXD needs to change here?

Also, not sure if relevant, but using the user.* prefix is deprecated for cloud-init config and the current support keys start with cloud-init., see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

holmanb commented 1 month ago

Thanks for the response @tomponline!

I'm afraid im not really following what it is that LXD needs to change here?

This is the offending line. See the commit on this branch for the change that I am proposing.

I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR - I want to make sure that the proposed solution is acceptable before moving forward it.

Also, not sure if relevant, but using the user.* prefix is deprecated for cloud-init config and the current support keys start with cloud-init., see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

$ lxc launch ubuntu:noble me -c cloud-init.meta-data=instance-'id: test_1'
Creating me
Error: Failed instance creation: Failed creating instance record: Unknown configuration key: cloud-init.meta-data
$ lxc launch ubuntu:noble me -c user.meta-data=instance-'id: test_1'
Creating me
Starting me                               

similarly:

$ lxc config set me user.meta-data=instance-'id: test_1'    
$ lxc config set me cloud-init.meta-data=instance-'id: test_1'
Error: Invalid config: Unknown configuration key: cloud-init.meta-data

If you want to deprecate the user.meta-data key as well for uniformity I could potentially make cloud-init support a new cloud-init.meta-data key while making this change. Let me know.

tomponline commented 1 month ago

I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR - I want to make sure that the proposed solution is acceptable before moving forward it.

Thanks!

Will this break users of LXD guests with older versions of cloud-init?

tomponline commented 1 month ago

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

Hrm, that is curious, I wasn't expecting that, but I'd need to dig into the commit history and original pull requests to try and understand why this wasn't originally changed to have a cloud-init. prefix like the other keys, as it seems to be like it should.

tomponline commented 1 month ago

This isn't necessary because meta-data isn't part of cloud-config.

Please could you explain this statement. I'm confused why a key being used by cloud-init isn't part of cloud-config?

tomponline commented 1 month ago

In order to preserve the current behavior while creating a path to using standard-compliant yaml while preserving backwards compatibility, we could do the following:

1. cloud-init could be updated to make values in `metadata['config']['user.meta-data']` override values in `metadata['meta-data']`. This wouldn't change cloud-init's current behavior, which ignores the values in `metadata['config']`. We could optionally check for a bump to the value in `_metadata_api_version` before doing this, but this wouldn't be strictly required since this is functionally identical currently.

2. Once stable distributions have this update, we could update the api to no longer append user meta-data to the default metadata (and bump the meta-data api, if desired). While we're making this change, we might want to drop the `#cloud-config` comment too. This isn't necessary because meta-data isn't part of cloud-config.

I suspect we'll need option 1. at least, and then potentially land the proposed changed in 2. for only the 6.x series of LXD.

holmanb commented 1 month ago

Will this break users of LXD guests with older versions of cloud-init?

This would break any user that provides a custom instance-id (duplicate key) on an older version of cloud-init, since this would cause cloud-init to see the old key where it didn't before.

From a cloud-init perspective, fixes for bugs come in new releases so the typical stability / support recommendation is "upgrade to the latest version". If we want to avoid breaking old instances, I could probably update the proposal I made above to increment the api rev number.

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

Hrm, that is curious, I wasn't expecting that, but I'd need to dig into the commit history and original pull requests to try and understand why this wasn't originally changed to have a cloud-init. prefix like the other keys, as it seems to be like it should.

Agreed. Let me know if you'd like to go that route.

This isn't necessary because meta-data isn't part of cloud-config.

Please could you explain this statement. I'm confused why a key being used by cloud-init isn't part of cloud-config?

Cloud-config isn't required for any of the keys: vendor-data, user-data, or meta-data.

Cloud-config is just one of cloud-init's configuration formats. There are several configuration format options available for user-data and vendor-data, including cloud-config, and even just running a shell script:

config:
...
  user.user-data: |
    #!/usr/bin/bash
    echo hello | tee -a /tmp/example.txt

With the above example a user would see:

$ lxc exec me -- cat /tmp/example.txt
hello

User-data is provided by the user for the purpose of configuring an instance. Vendor-data is likewise intended to by provided by the cloud/vendor for the purpose of configuring an instance with cloud-specific information. Both vendor-data and user-data can be any of the multiple configuration formats mentioned above.

Meta-data doesn't follow any of the above formats, and is not intended to be a configuration format for the instance. Instead, it supposed to tell cloud-init just a few pieces of information about the instance: its instance_id, region, etc. The lines are blurred a bit because a couple of the keys that it supports overlap with cloud-config. One of the overlapping keys is local-hostname, which is used by lxd and probably adds to the confusion here. Neither key is defined in cloud-init's cloud-config schema.

I suspect we'll need option 1. at least, and then potentially land the proposed changed in 2. for only the 6.x series of LXD.

That sounds fine by me. Let me know if my responses here or further digging revealed anything new that suggest that we shouldn't go forward with this proposal. This PR is my proposal to option 1, if you'd like to take a look.

tomponline commented 3 weeks ago

@holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks