canonical / multipass

Multipass orchestrates virtual Ubuntu instances
https://multipass.run
GNU General Public License v3.0
7.83k stars 650 forks source link

[launch] allow cloud-init file injection #1040

Open Saviq opened 5 years ago

Saviq commented 5 years ago

Some cloud-init modules (e.g. scripts) require files to be placed into the data source.

We need a --cloud-init-files or so that will allow the user to specify a file/directory tree to be included on the data source.

v1k0d3n commented 4 years ago

would love to see this implemented. where does this currently fall on the release todo list @Saviq? i can see this being extremely useful for some of our use cases.

Saviq commented 4 years ago

Hey @v1k0d3n I don't have good news about this, either ;)

It's not something on our immediate roadmap, but it should be relatively easy to add if you wanted to contribute.

The cloud-init ISO is created here, in the daemon.cpp code - it would be easy to feed more files in.

v1k0d3n commented 4 years ago

@Saviq more bad news!? agh. :) j/k

when I have some free time, I'll see if it's something I can knock out. it's at least helpful to know that it's not on product's immediate radar, but I'm surprised more users aren't asking for the feature. this feature seems like a great way to differentiate multipass vs. containerized use cases.

surahman commented 3 years ago

EDIT: It appears that the make_cloud_init_image may have been deprecated in the newer code base, is this issue resolved?

I am trying to setup a plan to resolve this, here is what I have:

  1. Add position argument --cloud-init-files to the launch command.
  2. Update the help message.
  3. Add an argument with the scripts path (default empty string) value to: https://github.com/canonical/multipass/blob/b22ff56651c225d2e9cb219703a99f8ac6a21d0c/src/daemon/daemon.cpp#L147-L148

What should the record structure be for the scripts directory for the cloud-init YAML file? What structure/how should I be adding the scripts to the ISO?

Saviq commented 3 years ago

Hi @surahman, my current idea for this would be a --cloud-init-tree argument that would take a path as the argument and that would be the entire file structure injected into the ISO.

It would be a double-edged sword as I think we wouldn't override network-config or vendor-data if it existed, so users could potentially break some of Multipass assumptions. I was thinking we'd have a way to generate a "pristine" tree for users to modify, with comments to what each of the entries are responsible for.

surahman commented 3 years ago

my current idea for this would be a --cloud-init-tree argument that would take a path as the argument and that would be the entire file structure injected into the ISO.

You would want to read in the cloud-init YAML file from disk?

It would be a double-edged sword as I think we wouldn't override network-config or vendor-data if it existed, so users could potentially break some of Multipass assumptions.

In this case, would you want to flag the YAML file as invalid with an error indicating it is overriding preset values? Or, assuming the interface to update these data structures is exposed, would you want to update them with a warning indicating preset values are being updated?

I was thinking we'd have a way to generate a "pristine" tree for users to modify, with comments to what each of the entries are responsible for.

Would you like to provide a template YAML file with the possible valid entries for initialization? Then the end-users would visit the documentation and grab the template. I am still very new to using cloud-init and am learning it now, but I do have experience with AWS instances and the user-data used to configure them - this is pretty similar.

Saviq commented 3 years ago

You would want to read in the cloud-init YAML file from disk?

We wouldn't even read it as YAML, just build the ISO with files from the folder given.

In this case, would you want to flag the YAML file as invalid with an error indicating it is overriding preset values? Or, assuming the interface to update these data structures is exposed, would you want to update them with a warning indicating preset values are being updated?

I don't want to go too deep, but rather give users the tool to do with as they please. So if they provide vendor-data, we would accept it as-is.

Would you like to provide a template YAML file with the possible valid entries for initialization? Then the end-users would visit the documentation and grab the template. I am still very new to using cloud-init and am learning it now, but I do have experience with AWS instances and the user-data used to configure them - this is pretty similar.

Our config is dynamic depending on how the instance is launched. So we'd need a special multipass command that would take the same arguments launch does, but rather than launch an instance, it would populate a folder with the metadata as generated by Multipass and allow for editing. Another option would be to have a --edit-cloud-init option or so to launch, which would generate the tree and then ask the user to modify and let the launch continue then.

surahman commented 3 years ago

Cool, let's build it 😉. Need to do some research and learning but I am game.

surahman commented 3 years ago

--cloud-init-tree

I think this would need to be a PR on its own, but it seems simple enough. It should be possible to recursively map a directory using QDir and generate a navigable list of files. I would start by mocking the QDir routines need for these operations. I found the ISO generation code in this routine: https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/src/platform/backends/shared/base_virtual_machine_factory.cpp#L27

The only problem is I do not know what the actual file structure of the ISO would be, an example would be helpful. I have found the VM description, which is what I assume are the only items that need to be populated from the file structure: https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/include/multipass/virtual_machine_description.h#L35-L53

I do not think I will need to edit anything in https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/src/iso/cloud_init_iso.cpp


Customize cloud-init I think this should be a separate PR. Once the above command is completed, I feel I would have a better understanding of the expected ISO file structure. I do not think it will be too difficult to generate the files and directories using QFile and QDir at that point. I feel that it would be better to populate a folder and let the end-user use an editor (GUI or cmdl) to edit - let them pick their poison.

Saviq commented 3 years ago

--cloud-init-tree

I think this would need to be a PR on its own, but it seems simple enough.

Totally a separate PR, yes. There is a caveat. We've been taking shortcuts in terms of client and daemon sitting on the same machine, and having access to the same files. Ideally we'd not do that here and instead stream the tree from the client to the daemon through our RPC. This would ensure that client and daemon sitting on different machines would be catered for.

It should be possible to recursively map a directory using QDir and generate a navigable list of files. I would start by mocking the QDir routines need for these operations. I found the ISO generation code in this routine:

The only problem is I do not know what the actual file structure of the ISO would be, an example would be helpful. I have found the VM description, which is what I assume are the only items that need to be populated from the file structure:

You can mount /dev/cdrom in a Multipass instance to see what we generate at the moment. cloud-init docs have more details.

I do not think I will need to edit anything in https://github.com/canonical/multipass/blob/877f5454c13fd343f8642cfbdf94752687fd8897/src/iso/cloud_init_iso.cpp

Indeed, that's generic enough. Probably shouldn't even say "cloud-init" there.

Customize cloud-init I think this should be a separate PR. Once the above command is completed, I feel I would have a better understanding of the expected ISO file structure. I do not think it will be too difficult to generate the files and directories using QFile and QDir at that point. I feel that it would be better to populate a folder and let the end-user use an editor (GUI or cmdl) to edit - let them pick their poison.

Indeed, separate PR, with a similar caveat as above - it would ideally be streamed to the client over RPC. And yeah, I envision a "You can now edit the cloud-init data under . Press [Enter] when ready". After that it would behave exactly the same as if they passed --cloud-init-tree in the first place.

surahman commented 3 years ago

I am on it :construction_worker_man:. I am going to run some tests with QT before starting work to figure out the file ops before I start to flesh things out on the daemon. I will figure out getting the data to the client via RPC.

Saviq commented 3 years ago

@surahman that's awesome, you deserve a 🥇.

Will you please write up a summary (a spec of sorts) of the work you're planning to do? This way we can iron out any kinks as early as possible.

surahman commented 3 years ago

@surahman that's awesome, you deserve a 🥇. Happy to contribute :wink:

Will you please write up a summary (a spec of sorts) of the work you're planning to do? This way we can iron out any kinks as early as possible.

I will get this done as soon as possible and before I write any actual code.

surahman commented 3 years ago

RFC: Expected plan of action:

Feature: --cloud-init-tree

Enhancement of the launch command on the client which takes a path as a parameter. The path points to a directory that contains a nested directory-file structure that will comprise the ISO's image to use for cloud-init.

CLI

Add the positional argument cloud-init-tree to the parse_args routine in launch. Ensure there is a value for the path passed and that cloud-init is not set.

The checked items above should be able to reuse code in the current setup code in platform using mp::BaseVirtualMachineFactory::configure as a template to generate the ISO. The unchecked items are a little tricky, and it might be better to get the currently supported items working first. Most of the unsupported items are flat files and can be transferred without conversion as bytes arrays (max size 232 bytes).

RPC LaunchRequest

Transferring data for some of the checked items above to the daemon will require extending the LauchRequest RPC. The unchecked items above will require entending the ISO generation in BaseVirtualMachineFactory.

We could use a map<string, bytes> in gRPC to store the directory/path/to/filename as the key and file contents as the value. This data would then be unpacked and populated on the platform backend into the ISO. This would be dynamic with regards to the files available.

Mocking fileops

If we are sticking to the currently supported user-data, meta-data, vendor-data, and network-config, I do not think we will need to be extending the mocking framework for file ops. The above file names would be placed in a vector and cycled through with attempts to open. Upon open failure (not found or cannot open) they will be skipped.

If there is a need to recursively explore the directory tree we will need mocks for QDirIterator methods to list the contents and check for files.

Questions

Why convert any of these files to YAML, etc. after reading? Why not just transport the file contents read in as a bytes array, which will retain the formatting of the file, and deliver it via RPC to the backend. Just dump the file contents to the ISO as they are provided. The onus of providing a correctly formatted file is on the end-user.

Saviq commented 3 years ago

Hi @surahman, sorry for the late reply, needed to digest what you wrote :)

All in all, I don't think you should go into that much detail about the contents of the ISO. I mean, it doesn't matter for the implementation. We should just take the contents of the tree as-is and only add the meta, network and vendor files if they were not provided.

I'd look into gRPC streaming mode for the file transfer, as some of those may well be large, so we need to transfer them in chunks.

Why convert any of these files to YAML, etc. after reading?

We did that so that we could inject extra bits necessary for Multipass to function. We shouldn't do that in this case.

surahman commented 3 years ago

@Saviq thanks, I will get started on this as time permits.

surahman commented 3 years ago

@Saviq in the event that there is no provided meta, network, or vendor data you would like to provide defaults - would you like this data hardcoded or as bundled files? If defaults are provided as files it would allow the end-user to customize them once and use many, but if hardcoded this would not be possible. Where should I be placing the code/files for the defaults, is there an appropriate header or location for these?

Saviq commented 3 years ago

@Saviq in the event that there is no provided meta, network, or vendor data you would like to provide defaults

Those exist today, and are generated dynamically for each instance. The same needs to happen when those don't exist in --cloud-init-tree. So if the files are missing from the tree, we need to add them on the fly.