Increase size of Ignition embed area in Live ISO

travier commented 11 months ago

Describe the bug

The current size is about 256KB (https://github.com/coreos/coreos-assembler/blob/main/src/cmd-buildextend-live#L113C16-L113C16) and some use cases require more (see https://issues.redhat.com/browse/OCPBUGS-20177).

Should we increase the size of this embed Ignition file or should we suggest they use something else?

Reproduction steps

Embed a "bigger" file (1MB) in the LiveISO.

Expected behavior

It works for 1MB Ignition configs.

Actual behavior

It fails for 1MB Ignition configs.

System details

LiveISO

Butane or Ignition config

No response

Additional information

No response

jlebon commented 11 months ago

I think we may have discussed this in the past and we landed on not increasing it because any reasonable limit we choose will not be enough for some users. Users are encouraged to chain with other external Ignition configs and resources. I guess in this case they're hitting a network bootstrapping issue?

zaneb commented 11 months ago

In this case (agent-based installer) there's nowhere external to chain to. It's not so much a network bootstrapping issue as these manifests are only used to set up the SDN overlay once the cluster is up. It's that we have to get all of the configuration files for the cluster provided by the user into the live ISO that can then be carried into a disconnected environment.

travier commented 11 months ago

Maybe we should add a mechanism to embed arbitrarily sized files at the end of the LiveISO so that we do not rely on a fixed sized blob.

andfasano commented 11 months ago

> Maybe we should add a mechanism to embed arbitrarily sized files at the end of the LiveISO so that we do not rely on a fixed sized blob.

That would be helpful in general, and it would allow users to apply their own customizations.

jlebon commented 11 months ago

> It's that we have to get all of the configuration files for the cluster provided by the user into the live ISO that can then be carried into a disconnected environment.

Can you clarify what these config files are? Looking at the linked Jira ticket, I only found installer manifests and didn't quite follow why those need to be part of the live ISO. In this disconnected environment, is e.g. one node selected as "the installer node" that the other nodes connect to via the agent?

andfasano commented 11 months ago

> Can you clarify what these config files are? Looking at the linked Jira ticket, I only found installer manifests and didn't quite follow why those need to be part of the live ISO.

The agent-based installer supports the OpenShift installer cluster customization, so that the user can specify a number of additional manifests that will be included during the initial installation (day1). Since the agent-based installer produces an ISO, those extra manifests need to be included in it. Sean may provide more details about the specific case, but in general this approach could be used for day1 customizations.

> In this disconnected environment, is e.g. one node selected as "the installer node" that the other nodes connect to via the agent?

Yes. In both connected/disconnected environments, the rendezvous node is the ephemeral orchestrator node that will manage the cluster installation.

andfasano commented 11 months ago

cc @seanmerrow

jlebon commented 11 months ago

Hmm OK, so we're trying to fit possibly numerous cluster object definitions into the live ISO. ISTM that bumping the pre-allocated space to 1M would be more of a stopgap solution, would you agree?

Would it make sense to consume the manifests as a container image instead? Then the installer (or more likely, the code that orchestrates it) could pull it down and unpack it. Even in a disconnected install, the nodes must have access to an image registry containing the mirrored release payload images and the user's own workload images, right?

andfasano commented 11 months ago

I don't think that consuming the manifests from a container will work, at least from the agent-based installer's point of view. In such a workflow, the user prepares a (single) live ISO by running the openshift-install agent create image command, and the ISO will contain all the necessary elements (in particular, a set of specialized services) to orchestrate the installation when booted (including any extra manifests specified by the user). Note that the ISO could be prepared in an environment completely different from the one where it will be applied. The mechanism suggested by @travier, embedding arbitrarily sized files at the end of the LiveISO, looks to me like a better fit for this use case.

lpbinh commented 11 months ago

IMHO, what @travier suggested is the best solution among those we have discussed. Bumping up the pre-allocated size would not be a long-term solution. For example, we initially discussed increasing it from 256K to 1M because the Calico manifests were 512K, but 1M wouldn't work either: Juniper CN2 CNI is already over 1.25M when we tried it with the agent-based installer.

------quoted----
DEBUG trying iso9660 with physical block size 0
ERROR failed to write asset (Agent Installer ISO) to disk: cannot generate ISO image due to configuration errors
FATAL failed to fetch Agent Installer ISO: failed to generate asset "Agent Installer ISO": failed to create overwrite reader for ignition: content length (1312018) exceeds embed area size (262144)
[root@b1s7-node3 agent-based-installer]#
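
For reference, the numbers in the error above can be checked with a few lines of Python. This is only an illustrative sketch of the size comparison, not the installer's actual code; the function names are made up for the example, and it assumes the reported content length is simply compared against the fixed 262144-byte (256 KiB) embed area.

```python
# Illustrative only: mirrors the comparison reported in the error message.
# EMBED_AREA_SIZE matches the "embed area size (262144)" value from the log;
# the helper names below are hypothetical.
EMBED_AREA_SIZE = 256 * 1024  # 262144 bytes


def fits_embed_area(content_length: int, area_size: int = EMBED_AREA_SIZE) -> bool:
    """Return True if content of the given length fits in the embed area."""
    return content_length <= area_size


def overflow_bytes(content_length: int, area_size: int = EMBED_AREA_SIZE) -> int:
    """Return how many bytes the content exceeds the embed area by (0 if it fits)."""
    return max(0, content_length - area_size)


if __name__ == "__main__":
    reported_length = 1312018  # "content length (1312018)" from the log
    print(fits_embed_area(reported_length))
    print(overflow_bytes(reported_length))
    print(round(reported_length / (1024 * 1024), 2))  # size in MiB
```

The reported content is roughly 1.25 MiB, so even a 1M embed area would be too small for it.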

dustymabe commented 10 months ago

We discussed this during the community meeting today:

12:52:22     dustymabe | #agreed We consider our ISO to already be a
                       | fragile piece of our architecture and would
                       | prefer to limit changes to it. We will try
                       | to meet with the Assisted Installer
                       | (OpenShift) team to understand the use
                       | cases more to see if there are alternative 
                       | solutions to this problem.

@travier has agreed to organize this meeting.

cgwalters commented 10 months ago

Ultimately I think the flow for nontrivial ISO things should be the same as layering: build a bootable container image, and then pass it to a tool like osbuild which makes a custom ISO from it.

zaneb commented 9 months ago

Another limitation of the agent-based installer is that it is part of the OpenShift installer: a single statically-linked binary with ideally no dependencies, which runs on any flavour of Linux and also on macOS. Vendoring in something like skopeo would be painful. Depending on external Python tools like osbuild is a non-starter.

lpbinh commented 7 months ago

Is there an update on whether a solution has been proposed or decided? Thank you.

jlebon commented 4 months ago

> In such a workflow, the user prepares a (single) live ISO by running the openshift-install agent create image command, and the ISO will contain all the necessary elements (in particular, a set of specialized services) to orchestrate the installation when booted (including the extra manifests eventually specified by the user). Note that the ISO could be prepared into an environment completely different from the one where it will be applied.

One low-tech solution here is to have openshift-install agent create image take a --remote-ignition 'http://...' switch which tells the code to embed in the ISO an Ignition config that fetches from the given URL. It then also spits out the full Ignition config that the user must host at that URL.

The installer could detect the condition when the Ignition config is too large and give an error message that suggests using --remote-ignition.
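
A rough sketch of that idea in Python, under stated assumptions: --remote-ignition is only a proposal here, so the function names, the 256 KiB limit constant, and the error text are hypothetical. The small "pointer" config uses the Ignition spec 3.x `ignition.config.merge` field to pull in the full config from the URL.

```python
import json
from typing import Optional

# Assumed limit, taken from the 262144-byte embed area discussed in this issue.
EMBED_AREA_SIZE = 256 * 1024


def pointer_config(url: str, spec_version: str = "3.2.0") -> dict:
    """Build a minimal Ignition config that merges the config hosted at `url`."""
    return {
        "ignition": {
            "version": spec_version,
            "config": {"merge": [{"source": url}]},
        }
    }


def choose_embedded_config(
    full_config: bytes,
    remote_url: Optional[str] = None,
    area_size: int = EMBED_AREA_SIZE,
) -> bytes:
    """Return the bytes to embed in the ISO.

    If the full config fits, embed it directly. If it does not fit and a
    remote URL was given, embed a small pointer config instead; the caller
    is then responsible for hosting `full_config` at that URL. Otherwise,
    fail with a message suggesting the (hypothetical) --remote-ignition switch.
    """
    if len(full_config) <= area_size:
        return full_config
    if remote_url is None:
        raise ValueError(
            f"Ignition config ({len(full_config)} bytes) exceeds embed area "
            f"({area_size} bytes); consider using --remote-ignition"
        )
    return json.dumps(pointer_config(remote_url)).encode("utf-8")
```

With this shape, the ISO stays within the existing embed area regardless of how large the user's manifests are, at the cost of requiring a reachable URL at boot time.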

jlebon commented 4 months ago

This is analogous to coreos-installer iso extract minimal-iso which takes a --rootfs-url URL and an --output-rootfs PATH; it takes out the rootfs from the ISO and writes it to PATH and adds a coreos.live.rootfs_url karg to the minimal ISO pointing at URL. It's the user's responsibility to have the given rootfs hosted at that URL.

coreos / fedora-coreos-tracker