canonical / lxd

User-defined meta-data can create malformed YAML #13853

Open holmanb opened 3 months ago

holmanb commented 3 months ago

Required information

Issue description

Problem

The user-defined meta-data key gets appended as a string to the lxd-provided meta-data. This means that duplicate keys can be added, which creates a configuration that isn't well defined. Both 1.1 and 1.2 of the YAML spec state that keys are unique, which this violates.

The configuration received by cloud-init:

{'_metadata_api_version': '1.0',
 'config': {'user.meta-data': 'instance-id: test_2'},
 'devices': {'eth0': {'hwaddr': '00:16:3e:e3:ed:2c',
                      'name': 'eth0',
                      'network': 'lxdbr0',
                      'type': 'nic'},
             'root': {'path': '/', 'pool': 'default', 'type': 'disk'}},
 'meta-data': '#cloud-config\n'
              'instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n'
              'local-hostname: oracular\n'
              'instance-id: test_2'}

Cloud-init parses this with PyYAML, which keeps the last value defined for a duplicated key. That happens to produce the desired outcome (the user can override the default meta-data), but it depends on how one specific library handles input that the spec does not allow. If cloud-init ever moved to a different YAML library, this behavior could break or need a manual workaround.
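To make the ambiguity concrete, here is a minimal sketch that scans the meta-data string for repeated top-level keys. It is not a YAML parser: it assumes flat `key: value` lines like the ones shown above, and `find_duplicate_keys` is a hypothetical helper, not part of cloud-init or PyYAML.

```python
from collections import Counter


def find_duplicate_keys(meta_data: str) -> list[str]:
    """Return top-level keys that appear more than once.

    Assumes flat ``key: value`` lines (no nesting, no multi-line scalars),
    which matches the meta-data shown above but not YAML in general.
    """
    keys = []
    for line in meta_data.splitlines():
        stripped = line.strip()
        # Skip blank lines and comments such as the "#cloud-config" header.
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, _value = stripped.partition(":")
        if sep:
            keys.append(key.strip())
    counts = Counter(keys)
    return [key for key, count in counts.items() if count > 1]


meta_data = (
    "#cloud-config\n"
    "instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n"
    "local-hostname: oracular\n"
    "instance-id: test_2"
)
duplicates = find_duplicate_keys(meta_data)
```

A check like this would flag `instance-id` in the configuration above, which is the key a stricter YAML library might reject or resolve differently.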

To preserve the current behavior and backwards compatibility while creating a path to standard-compliant YAML, we could do the following:

1) cloud-init could be updated to make values in metadata['config']['user.meta-data'] override values in metadata['meta-data']. This wouldn't change cloud-init's current behavior, which ignores the values in metadata['config']. We could optionally check for a bump to the value in _metadata_api_version before doing this, but this wouldn't be strictly required since this is functionally identical currently.

2) Once stable distributions have this update, we could update the api to no longer append user meta-data to the default metadata (and bump the meta-data api, if desired). While we're making this change, we might want to drop the #cloud-config comment too. This isn't necessary because meta-data isn't part of cloud-config.
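Step 1 could look roughly like the sketch below. It is only an illustration under assumptions: `parse_flat` stands in for a real YAML loader and handles flat `key: value` lines only, and `effective_meta_data` is a hypothetical name rather than cloud-init's actual code.

```python
def parse_flat(text: str) -> dict[str, str]:
    """Parse flat ``key: value`` lines; a stand-in for a real YAML loader."""
    result = {}
    for line in text.splitlines():
        stripped = line.strip()
        # Ignore blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition(":")
        if sep:
            result[key.strip()] = value.strip()
    return result


def effective_meta_data(metadata: dict) -> dict[str, str]:
    """Let user.meta-data override the LXD-provided defaults, key by key."""
    merged = parse_flat(metadata.get("meta-data", ""))
    user_raw = metadata.get("config", {}).get("user.meta-data", "")
    merged.update(parse_flat(user_raw))
    return merged


# Assumed shape of the devlxd response after step 2, when LXD no longer
# appends the user-provided value to "meta-data".
response = {
    "_metadata_api_version": "1.0",
    "config": {"user.meta-data": "instance-id: test_2"},
    "meta-data": "instance-id: 0b6c31e2-403c-44eb-b610-ad7eafea777e\n"
                 "local-hostname: oracular\n",
}
result = effective_meta_data(response)
```

Because the override is applied explicitly, the result is the same whether or not LXD still appends the user value to `meta-data`, which is what would allow the two steps to roll out independently.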

https://github.com/canonical/cloud-init/issues/5575

Information to attach

Resources:
  Processes: 69
  CPU usage:
    CPU usage (in seconds): 6
  Memory usage:
    Memory (current): 83.53MiB
    Swap (current): 28.00KiB
  Network usage:
    eth0:
      Type: broadcast
      State: UP
      Host interface: vethd9b8b75f
      MAC address: 00:16:3e:9a:8b:f6
      MTU: 1500
      Bytes received: 115.82kB
      Bytes sent: 5.29kB
      Packets received: 454
      Packets sent: 52
      IP addresses:
        inet: 10.161.80.194/24 (global)
        inet6: fd42:80e2:4695:1e96:216:3eff:fe9a:8bf6/64 (global)
        inet6: fe80::216:3eff:fe9a:8bf6/64 (link)
    lo:
      Type: loopback
      State: UP
      MTU: 65536
      Bytes received: 404B
      Bytes sent: 404B
      Packets received: 4
      Packets sent: 4
      IP addresses:
        inet: 127.0.0.1/8 (local)
        inet6: ::1/128 (local)

Log:

lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.855 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194228.857 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.782 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194243.795 ERROR attach - ../src/src/lxc/attach.c:lxc_attach_run_command:1841 - No such file or directory - Failed to exec "user.meta-data"
lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194325.518 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801194417.803 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801195046.604 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:165 - newuidmap binary is missing
lxc cloudinit-0801-1919380a56vdl6 20240801201625.883 WARN idmap_utils - ../src/src/lxc/idmap_utils.c:lxc_map_ids:171 - newgidmap binary is missing

 - [x] Container configuration (`lxc config show NAME --expanded`)
```yaml
architecture: x86_64
config:
  image.architecture: x86_64
  image.description: Ubuntu 20.04 LTS server (20240730)
  image.os: ubuntu
  image.release: focal
  limits.cpu.allowance: 50%
  user.meta-data: 'instance-id: test_2'
  volatile.base_image: c19cc6a8469b596aae092a3953e326ed01e1183a25bff1d26145a85a2272767e
  volatile.cloud-init.instance-id: 7d26c435-da56-405c-9b04-9ad98f550736
  volatile.eth0.host_name: vethd9b8b75f
  volatile.eth0.hwaddr: 00:16:3e:9a:8b:f6
  volatile.idmap.base: "0"
  volatile.idmap.current: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.idmap.next: '[{"Isuid":true,"Isgid":false,"Hostid":1000000,"Nsid":0,"Maprange":1000000000},{"Isuid":false,"Isgid":true,"Hostid":1000000,"Nsid":0,"Maprange":1000000000}]'
  volatile.last_state.idmap: '[]'
  volatile.last_state.power: RUNNING
  volatile.last_state.ready: "false"
  volatile.uuid: a097111b-15e4-45e4-aa31-a6da707012a8
  volatile.uuid.generation: a097111b-15e4-45e4-aa31-a6da707012a8
devices:
  eth0:
    name: eth0
    network: lxdbr0
    type: nic
  root:
    path: /
    pool: default
    type: disk
ephemeral: false
profiles:
- default
stateful: false
description: ""
```

tomponline commented 3 months ago

Hi @holmanb

I'm afraid I'm not really following what it is that LXD needs to change here?

Also, not sure if relevant, but using the user.* prefix is deprecated for cloud-init config and the currently supported keys start with `cloud-init.`; see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

holmanb commented 3 months ago

Thanks for the response @tomponline!

> I'm afraid I'm not really following what it is that LXD needs to change here?

This is the offending line. See the commit on this branch for the change that I am proposing.

I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR: I want to make sure that the proposed solution is acceptable before moving forward with it.

> Also, not sure if relevant, but using the user.* prefix is deprecated for cloud-init config and the currently supported keys start with `cloud-init.`; see https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-options-cloud-init

The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

$ lxc launch ubuntu:noble me -c cloud-init.meta-data=instance-'id: test_1'
Creating me
Error: Failed instance creation: Failed creating instance record: Unknown configuration key: cloud-init.meta-data
$ lxc launch ubuntu:noble me -c user.meta-data=instance-'id: test_1'
Creating me
Starting me                               

similarly:

$ lxc config set me user.meta-data=instance-'id: test_1'    
$ lxc config set me cloud-init.meta-data=instance-'id: test_1'
Error: Invalid config: Unknown configuration key: cloud-init.meta-data

If you want to deprecate the user.meta-data key as well for uniformity I could potentially make cloud-init support a new cloud-init.meta-data key while making this change. Let me know.

tomponline commented 3 months ago

> I'm happy to submit a PR for this, but we need to release a change in cloud-init first to accommodate this expectation. This is why I filed a bug report rather than just a PR: I want to make sure that the proposed solution is acceptable before moving forward with it.

Thanks!

Will this break users of LXD guests with older versions of cloud-init?

tomponline commented 3 months ago

> The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

Hrm, that is curious, I wasn't expecting that. I'd need to dig into the commit history and original pull requests to understand why this wasn't originally changed to have a `cloud-init.` prefix like the other keys, as it seems like it should.

tomponline commented 3 months ago

> This isn't necessary because meta-data isn't part of cloud-config.

Please could you explain this statement. I'm confused why a key being used by cloud-init isn't part of cloud-config?

tomponline commented 3 months ago

> In order to preserve the current behavior while creating a path to using standard-compliant yaml while preserving backwards compatibility, we could do the following:
>
> 1. cloud-init could be updated to make values in `metadata['config']['user.meta-data']` override values in `metadata['meta-data']`. This wouldn't change cloud-init's current behavior, which ignores the values in `metadata['config']`. We could optionally check for a bump to the value in `_metadata_api_version` before doing this, but this wouldn't be strictly required since this is functionally identical currently.
>
> 2. Once stable distributions have this update, we could update the api to no longer append user meta-data to the default metadata (and bump the meta-data api, if desired). While we're making this change, we might want to drop the `#cloud-config` comment too. This isn't necessary because meta-data isn't part of cloud-config.

I suspect we'll need option 1. at least, and then potentially land the proposed change in 2. for only the 6.x series of LXD.

holmanb commented 3 months ago

> Will this break users of LXD guests with older versions of cloud-init?

This would break any user who provides a custom instance-id (duplicate key) on an older version of cloud-init: once LXD stops appending the user value, older cloud-init would see the LXD-provided key where it previously saw the override.

From a cloud-init perspective, fixes for bugs come in new releases so the typical stability / support recommendation is "upgrade to the latest version". If we want to avoid breaking old instances, I could probably update the proposal I made above to increment the api rev number.

> The relevant key is user.meta-data, which wasn't actually deprecated (despite being used for exposing information to cloud-init just like the others):

> Hrm, that is curious, I wasn't expecting that. I'd need to dig into the commit history and original pull requests to understand why this wasn't originally changed to have a `cloud-init.` prefix like the other keys, as it seems like it should.

Agreed. Let me know if you'd like to go that route.

> This isn't necessary because meta-data isn't part of cloud-config.

> Please could you explain this statement. I'm confused why a key being used by cloud-init isn't part of cloud-config?

Cloud-config isn't required for any of the keys: vendor-data, user-data, or meta-data.

Cloud-config is just one of cloud-init's configuration formats. There are several configuration format options available for user-data and vendor-data, including cloud-config, and even just running a shell script:

config:
...
  user.user-data: |
    #!/usr/bin/bash
    echo hello | tee -a /tmp/example.txt

With the above example a user would see:

$ lxc exec me -- cat /tmp/example.txt
hello

User-data is provided by the user for the purpose of configuring an instance. Vendor-data is likewise intended to be provided by the cloud/vendor for the purpose of configuring an instance with cloud-specific information. Both vendor-data and user-data can be any of the multiple configuration formats mentioned above.

Meta-data doesn't follow any of the above formats, and is not intended to be a configuration format for the instance. Instead, it is supposed to tell cloud-init just a few pieces of information about the instance: its instance_id, region, etc. The lines are blurred a bit because a couple of the keys that it supports overlap with cloud-config. One of the overlapping keys is local-hostname, which is used by lxd and probably adds to the confusion here. Neither key is defined in cloud-init's cloud-config schema.

> I suspect we'll need option 1. at least, and then potentially land the proposed change in 2. for only the 6.x series of LXD.

That sounds fine by me. Let me know if my responses here or further digging reveal anything new that suggests we shouldn't go forward with this proposal. This PR is my proposal for option 1, if you'd like to take a look.

tomponline commented 3 months ago

@holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks

holmanb commented 2 months ago

> @holmanb Hi, would you mind booking a meeting to discuss this issue? Thanks

I just saw this when checking back on the status of this. I'd be happy to.

tomponline commented 1 month ago

Thanks for the call @holmanb

As discussed, you can change the instance-id exposed to cloud-init via LXD's devlxd metadata API (https://documentation.ubuntu.com/lxd/en/latest/dev-lxd/#meta-data) by changing volatile.cloud-init.instance-id, see:

https://documentation.ubuntu.com/lxd/en/latest/reference/instance_options/#instance-volatile:volatile.cloud_init.instance-id

To change local-hostname, rename the instance.

I also think we should entirely remove the user.meta-data key from LXD's code base, as it is currently undocumented and appears to have been due to be removed in LXD 4.21 but was not, apparently due to an oversight:

> I believe there is also a user.meta-data config key which is tied to cloud-init. Did we just forget to mention it here and in the issue, or must this remain as user.meta-data?

> We will not keep that configuration key moving forward. It’s always been a very odd one with no real use cases, so it will just go away completely.

https://discuss.linuxcontainers.org/t/lxd-first-class-cloud-init-support/12559/18

See also https://discuss.linuxcontainers.org/t/lxd-4-21-has-been-released/12860#reworked-cloud-init-support-4

Removed from docs here:

> As far as I understood, it's because there's no reason for using it - the user.meta-data was originally added to set the instance name, but that isn't necessary anymore (and also doesn't work).

https://github.com/canonical/lxd/pull/11433#discussion_r1124155114

There is also an issue confirming its removal here (although there's some confusion between user.user-data and user.meta-data in that thread):

https://github.com/canonical/lxd/issues/10417

holmanb commented 1 month ago

Thanks @tomponline for discussing. The volatile key and instance rename should meet our needs.

Cloud-init has one test which I recently added which depends on setting the instance ID via the user.meta-data key. I will update that to use the volatile key later today; it is a trivial change.

I just submitted a PR against cloud-init to update cloud-init's lxd documentation per our conversation.

blackboxsw commented 1 month ago

@holmanb @tomponline we have a second use case for user.meta-data in integration testing of lxd: it allows cloud-init to inject default SSH public-key configuration into all images launched in a profile, without colliding with or being overwritten by the cloud-init.user-data provided to a system at launch. This now-undocumented feature which LXD provides in user.meta-data is reminiscent of the behavior of clouds like Azure, ec2 and openstack, which allow project owners or teams to set per-project ssh-public-keys that are authorized for SSH into those VMs. If user.meta-data goes away, then at a minimum the integration test runners for Ubuntu Pro and cloud-init will have to make the tests that require SSH use cloud-init.user-data or cloud-init.vendor-data to set up such authorized keys.

If the ability to set user.meta-data disappears in the future, I wonder whether there should instead be a feature request for an lxc config key such as public-ssh-keys within profile configuration. Such a feature would be easy to plumb through to cloud-init based images via 1.0/meta-data in devlxd, but likely complex for images without cloud-init.

holmanb commented 1 month ago

@blackboxsw thanks for catching that, I didn't catch that platform ssh keys can be provided in meta-data.

For completeness I double checked the other potential users of meta-data. Here are the references to arbitrary meta-data keys that I see in cloudinit/sources/__init__.py:

- cloud-name: allows the cloud to define its own cloud-id at runtime
- launch-index: used by clouds that need user-data filtered by launch-index (ec2)
- availability-zone / availability_zone / placement: used for setting mirrors by region (ec2 and some other clouds)

None of these appear to be used by our LXD datasource code, nor by any of our tests in cloud-init or pycloudlib, so it looks to me like public-ssh-keys is the only requirement blocking lxd from removing user.meta-data.

@tomponline My apologies, I missed this requirement. It seems that user.meta-data is still needed by cloud-init for the time being.

As @blackboxsw suggested, if lxd were to provide the ssh key some other way (such as with a new key volatile.cloud_init.public-keys similar to the volatile.cloud_init.instance-id key), then cloud-init could switch to use that and stop using the user.meta-data key.

tomponline commented 1 month ago

what form would volatile.cloud_init.public-keys take?

holmanb commented 1 month ago

> what form would volatile.cloud_init.public-keys take?

Regarding applying these settings, we would need to be able to set this value in a profile (preferred) or on an instance before launch (less preferred, but workable). If I understand correctly, both of these expectations are true for the other volatile keys.

Regarding the datatype, it would be best if this key could contain either a string or a list of strings. That will ensure that users can continue inserting either a single key or multiple public keys. If only one datatype is preferred, that would probably be fine too (as just a list of strings) but would require some changes in pycloudlib to accommodate.

Regarding the upgrade path, we could make cloud-init and pycloudlib fall back to setting user.meta-data in the event that setting volatile.cloud_init.public-keys fails. This would bridge the gap between old and new versions for seamless rollout.
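The fallback could be sketched as below. Everything here is hypothetical: `volatile.cloud_init.public-keys` does not exist in LXD, `set_config` stands in for whatever pycloudlib would use to set a config key, the assumption that an unsupported key raises an error is based on the `Unknown configuration key` failures shown earlier in this thread, and the `public-keys` layout written to user.meta-data is only one possible encoding.

```python
import json
from typing import Callable

NEW_KEY = "volatile.cloud_init.public-keys"  # proposed name, not in LXD today
OLD_KEY = "user.meta-data"


def set_public_keys(set_config: Callable[[str, str], None], keys: list[str]) -> str:
    """Try the proposed key first and fall back to user.meta-data.

    ``set_config(key, value)`` is a hypothetical callable that raises
    ValueError when LXD rejects the key. Returns the key that was used.
    """
    try:
        set_config(NEW_KEY, "\n".join(keys))
        return NEW_KEY
    except ValueError:
        # Older LXD: pass the keys through the free-form meta-data value.
        # A JSON list of strings is also a valid YAML flow sequence.
        set_config(OLD_KEY, "public-keys: " + json.dumps(keys))
        return OLD_KEY


# Simulate an LXD that does not know the new key.
store: dict[str, str] = {}


def old_lxd_set(key: str, value: str) -> None:
    if key == NEW_KEY:
        raise ValueError(f"Unknown configuration key: {key}")
    store[key] = value


used = set_public_keys(old_lxd_set, ["ssh-ed25519 AAAAexample user@host"])
```

Passing the setter in as an argument keeps the sketch independent of how the key is actually written, so the same logic could wrap a CLI call or an API client.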

blackboxsw commented 1 month ago

No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If the value were a string that contained newlines between ssh public keys, that'd work just fine and cloud-init will call splitlines() on that value.

For example, the following multi-line string value would allow cloud-init to import my two public keys:

ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDSL7uWGj8cgWyIOaspgKdVy0cKJ+UTjfv7jBOjG2H/GN8bJVXy72XAvnhM0dUM+CCs8FOf0YlPX+Frvz2hKInrmRhZVwRSL129PasD12MlI3l44u6IwS1o/W86Q+tkQYEljtqDOo0a+cOsaZkvUNzUyEXUwz/lmYa6G4hMKZH4NBj7nbAAF96wsMCoyNwbWryBnDYUr6wMbjRR1J9Pw7Xh7WRC73wy4Va2YuOgbD3V/5ZrFPLbWZW/7TFXVrql04QVbyei4aiFR5n//GvoqwQDNe58LmbzX/xvxyKJYdny2zXmdAhMxbrpFQsfpkJ9E/H5w0yOdSvnWbUoG5xNGoOB csmith@fringe # ssh-import-id lp:chad.smith
ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDvl3VfPjVXsXBsm6r2J+UneIMr4ZOJhQlXuBWTwzexbd/XugB3k5EXA18yyqjEVT+bApVwlxATY66drVUPBuZ2JMU1HuLOKhG6toZd7j042oV5b2TEvg0es9qxs9mtGzvMPf3mB3tBVY/ESall023M+J5JjGGSO4J3zM/9c+P3Hs7xyCjAoySZDN2VZzscPgSGZzck8xtyO39uPfscKXi9LJkkhDDG6SVWie5OeM8TxyH2W2eNDKeXid/qgdIxqRLSYiNnWpt9htI0SzahnFYtsw9VLkij+0cM29lBIGUr5AehN2Y6jetxODR3pZt4YqOiyC6D5NaEsVGKOb0zjIBBCso6mIseejlOwocSYUH21YnLDS2Mu31bHRmPjpRvMVTOFtnS2OkfOxYTyMNFZ5PH/a0/t3DGxZZqz74F+APxG1X0vsgSFA9yYzbBaY3fr3vNAEYsRMTeBIjF6Gx6QmX3/kw5KBid4t8qQCV4Z1l8UmWZu4qFYxV/Z0IYPZazgYy/1W0qfRm5AdvpDdH9XArIokwqe1E2Djp5/xWp4Z9dAINmfJvNZxiDJk7gQz+Hdka/1U/f3wQSds9OAjF+a94Lj+F9CmMrhpVEZG5OL8ysK4iwSOsDhW7iLeZw5AO7cVhDUWj53/p2FP4+zxin/tYkDhNTJF0Nhc2uLMLxRCOGrQ== csmith@uptown # ssh-import-id lp:chad.smith

Simplistically, I'm imagining something like this:

--- a/lxd/devlxd.go
+++ b/lxd/devlxd.go
@@ -145,7 +145,7 @@ func devlxdMetadataGetHandler(d *Daemon, inst instance.Instance, w http.Response

        value := inst.ExpandedConfig()["user.meta-data"]

-       return response.DevLxdResponse(http.StatusOK, fmt.Sprintf("#cloud-config\ninstance-id: %s\nlocal-hostname: %s\n%s", inst.CloudInitID(), inst.Name(), value), "raw", inst.Type() == instancetype.VM)
+       return response.DevLxdResponse(http.StatusOK, fmt.Sprintf("#cloud-config\ninstance-id: %s\nlocal-hostname: %s\n%s%s", inst.CloudInitID(), inst.Name(), inst.CloudInitPublicKeys(), value), "raw", inst.Type() == instancetype.VM)
 }

 var devlxdEventsGet = devLxdHandler{
diff --git a/lxd/instance/drivers/driver_common.go b/lxd/instance/drivers/driver_common.go
index 9d547032a8..adc124451e 100644
--- a/lxd/instance/drivers/driver_common.go
+++ b/lxd/instance/drivers/driver_common.go
@@ -171,6 +171,15 @@ func (d *common) CloudInitID() string {
        return d.name
 }

+// CloudInitPublicKeys returns a string containing a newline-separated list of SSH authorized keys to configure for an instance.
+func (d *common) CloudInitPublicKeys() string {
+       keys := d.LocalConfig()["volatile.cloud-init.public-keys"]
+       if keys != "" {
+               keys = fmt.Sprintf("public-keys: %s\n", keys)
+       }
+       return keys
+}
+
 // Location returns instance's location.
 func (d *common) Location() string {
        return d.node

holmanb commented 1 month ago

> No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

blackboxsw commented 1 month ago

> No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

> I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

List of strings sounds good: it can easily be validated in lxd/instance/drivers/driver_common.go and presented as the YAML list expected by cloud-init's DataSourceLXD meta-data processing of public-keys. I also note that the existing devlxdMetadataGetHandler in lxd doesn't need the leading #cloud-config (it's not cloud-config, and the comment header line is ignored for meta-data anyway).
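As a rough sketch of that presentation step, the snippet below renders a list of keys as a YAML block sequence. It is written in Python for brevity even though the LXD side would be Go, and `render_meta_data` is a hypothetical name. It leans on JSON string quoting, which is also valid YAML double-quoted scalar syntax, so that a `#` inside a key's comment field is not read as a YAML comment. The key material in the example is a placeholder, not a real key.

```python
import json


def render_meta_data(instance_id: str, hostname: str, public_keys: list[str]) -> str:
    """Render meta-data with public-keys as a YAML list of quoted strings."""
    lines = [
        f"instance-id: {json.dumps(instance_id)}",
        f"local-hostname: {json.dumps(hostname)}",
    ]
    if public_keys:
        lines.append("public-keys:")
        # One quoted list item per key keeps each key a single YAML scalar.
        lines.extend(f"  - {json.dumps(key)}" for key in public_keys)
    return "\n".join(lines) + "\n"


rendered = render_meta_data(
    "test_2",
    "oracular",
    ["ssh-rsa AAAAexample user@host # ssh-import-id lp:example"],
)
```

Emitting every value as a quoted scalar avoids duplicate keys and quoting surprises at the cost of slightly noisier output, and the sketch omits the `#cloud-config` header in line with the note above.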

holmanb commented 1 month ago

Related: I just submitted a PR because volatile.cloud_init.instance-id should actually be volatile.cloud-init.instance-id.

tomponline commented 1 month ago

> No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

> I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

> List of strings sounds good: it can easily be validated in lxd/instance/drivers/driver_common.go and presented as the YAML list expected by cloud-init's DataSourceLXD meta-data processing of public-keys. I also note that the existing devlxdMetadataGetHandler in lxd doesn't need the leading #cloud-config (it's not cloud-config, and the comment header line is ignored for meta-data anyway).

So LXD config options have no concept of "list of strings"; all config options are a single string.

However, they can contain commas, newlines etc., so depending on the expected content of the string, selecting an appropriate delimiter is important (e.g. can commas appear in SSH keys?).

If the format is well understood by LXD, then we can validate it, split it and deliver it to cloud-init in the desired format (i.e. a list of strings).

I would like to avoid having an undefined blob of data like the current meta-data setting, as it leads to the issues we've found where the format is not well understood in all situations.

> Regarding the upgrade path, we could make cloud-init and pycloudlib fall back to setting user.meta-data in the event that setting volatile.cloud_init.public-keys fails. This would bridge the gap between old and new versions for seamless rollout.

Sounds good.

> we would need to be able to set this value in a profile (preferred) or on an instance before launch (less preferred, but workable).

volatile keys can only be set on the instance, not the profile, so I would suggest adding a new proper config key, such as security.ssh-keys.

Interestingly, we've recently received a request for something similar from elsewhere in Canonical, albeit without requiring cloud-init (LXD would set up the SSH keys in the guest). So if we did add a config key like this, and LXD set up the keys directly, would this data even need to be exported to cloud-init's metadata?

Of course we could do both, but would they then potentially conflict?

cc @mionaalex

holmanb commented 1 month ago

> No need to support two format types for this content if it adds unnecessary complexity to lxd for a volatile.cloud-init.public-keys setting. cloud-init public-key processing currently supports strings, lists, dicts, sets as values for public-keys provided by cloud platform meta-data. If it were a string that contained newlines between keys, that'd work just fine and cloud-init will call splitlines() on that value.

> I prefer if we can avoid receiving structured inputs which then require additional parsing. Unnecessary parsing inevitably leads to bugs and introduces corner cases. If a single format type is preferred, I would lean slightly towards a list of strings. This would benefit not just cloud-init with a simpler implementation, but also users because of more correct validation of inputs.

> List of strings sounds good: it can easily be validated in lxd/instance/drivers/driver_common.go and presented as the YAML list expected by cloud-init's DataSourceLXD meta-data processing of public-keys. I also note that the existing devlxdMetadataGetHandler in lxd doesn't need the leading #cloud-config (it's not cloud-config, and the comment header line is ignored for meta-data anyway).

> So LXD config options have no concept of "list of strings"; all config options are a single string.

Good to know.

> However, they can contain commas, newlines etc., so depending on the expected content of the string, selecting an appropriate delimiter is important (e.g. can commas appear in SSH keys?).

> If the format is well understood by LXD, then we can validate it, split it and deliver it to cloud-init in the desired format (i.e. a list of strings).

That could work, and I think, as @blackboxsw suggested, newline-delimited would be fine. We would just want to avoid passing empty strings, which might get introduced, for example, when a user puts two newlines between keys rather than one.
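A minimal sketch of that splitting rule, assuming a newline-delimited value; `split_public_keys` is a hypothetical helper that could sit on either the LXD or the cloud-init side, and the key material is a placeholder:

```python
def split_public_keys(raw: str) -> list[str]:
    """Split a newline-delimited value into keys, dropping blank lines."""
    return [line.strip() for line in raw.splitlines() if line.strip()]


# Two keys separated by an accidental blank line, plus a trailing newline.
raw = "ssh-rsa AAAAexample1 user@one\n\nssh-rsa AAAAexample2 user@two\n"
keys = split_public_keys(raw)
```

Filtering on the stripped line also discards whitespace-only lines, so stray spaces between keys do not turn into empty entries.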

> I would like to avoid having an undefined blob of data like the current meta-data setting, as it leads to the issues we've found where the format is not well understood in all situations.

Agreed

> Regarding the upgrade path, we could make cloud-init and pycloudlib fall back to setting user.meta-data in the event that setting volatile.cloud_init.public-keys fails. This would bridge the gap between old and new versions for seamless rollout.

> Sounds good.

> we would need to be able to set this value in a profile (preferred) or on an instance before launch (less preferred, but workable).

> volatile keys can only be set on the instance, not the profile, so I would suggest adding a new proper config key, such as security.ssh-keys.

Sounds good. As described above, it wouldn't be used internally anyways so volatile doesn't make as much sense.

> Interestingly, we've recently received a request for something similar from elsewhere in Canonical, albeit without requiring cloud-init (LXD would set up the SSH keys in the guest). So if we did add a config key like this, and LXD set up the keys directly, would this data even need to be exported to cloud-init's metadata?

> Of course we could do both, but would they then potentially conflict?

> cc @mionaalex

I think that it would probably be preferred if we can exercise this code path in cloud-init using LXD, but I do think that doing both would conflict. Maybe we could get away with just testing this functionality on other clouds. @blackboxsw thoughts?

tomponline commented 1 month ago

> I think that it would probably be preferred if we can exercise this code path in cloud-init using LXD, but I do think that doing both would conflict. Maybe we could get away with just testing this functionality on other clouds. @blackboxsw thoughts?

Why are there multiple ways of setting up SSH keys in cloud-init via both meta-data and user-data?

tomponline commented 1 month ago

@blackboxsw does cloud-init apply the existing user.meta-data on every boot or only first boot?