coreos / fedora-coreos-tracker

Issue tracker for Fedora CoreOS
https://fedoraproject.org/coreos/

Ignition: Fails to connect to external server to fetch other config to replace itself #474

Open vboufleur opened 4 years ago

vboufleur commented 4 years ago

Hi all!

I'm working on a bash script that will take over cloud Ubuntu 16.04 instances and install CoreOS on top of them. I'm booting the CoreOS Live ISO on a VPS (OVH) with (1) a base Ignition config embedded. That config contains a bash script that calls coreos-installer and passes it another (2) Ignition config, which in turn sources (3) an external Ignition config.

But the second config fails to fetch the third, external one over the network. At first I had it source the file by IP address:

variant: fcos
version: 1.0.0
ignition:
  config:
    replace:
      source: http://54.39.179.16/ignition.network.json

The link works: http://54.39.179.16/ignition.network.json

But this failed: [screenshot of the error output, 2020-05-05 15:50, not reproduced]

Then I tried with DNS:

variant: fcos
version: 1.0.0
ignition:
  config:
    replace:
      source: http://devops.ipbdev.com/ignition.network.json

It fails too: [screenshot of the error output, 2020-05-05 16:36, not reproduced]

Here's the full source file: the base config (1) embedded in the Live ISO, which contains config (2):

variant: fcos
version: 1.0.0
systemd:
  units:
  - name: run-coreos-installer.service
    enabled: true
    contents: |
      [Unit]
      After=network-online.target
      Wants=network-online.target
      Before=systemd-user-sessions.service
      OnFailure=emergency.target
      OnFailureJobMode=replace-irreversibly
      [Service]
      RemainAfterExit=yes
      Type=oneshot
      ExecStart=/usr/local/bin/run-coreos-installer
      ExecStartPost=/usr/bin/systemctl --no-block reboot
      StandardOutput=kmsg+console
      StandardError=kmsg+console
      [Install]
      WantedBy=multi-user.target
storage:
  files:
    - path: /home/core/config.ign
      # A basic Ignition config that will replace itself with our network ignition file
      contents:
        inline: |
          {
            "ignition": {
              "config": {
                "replace": {
                  "source": "http://devops.ipbdev.com/ignition.network.json",
                  "verification": {}
                }
              },
              "security": {
                "tls": {}
              },
              "timeouts": {},
              "version": "3.0.0"
            },
            "passwd": {},
            "storage": {},
            "systemd": {}
          }
    - path: /usr/local/bin/run-coreos-installer
      mode: 0755
      contents:
        inline: |
          #!/usr/bin/bash
          set -x
          main() {
            # Some custom arguments for firstboot
            firstboot_args="console=tty0"

            ignition_file="/home/core/config.ign"

            # TODO: Instead of the 'stable' stream, install a defined image that we host, like below.
            # image_url="https://54.39.179.16/modified.iso"

            # Dynamically detect which device to install to.
            # This represents something an admin may want to do to share the
            # same installer automation across various hardware.
            # TODO: For the takeover script, this value would need to be set dynamically to the disk where / is mounted.
            # NVMe disks appear as /dev/nvme0n1; /dev/nvme0 is the controller's
            # character device, so a [ -b /dev/nvme0 ] test is never true.
            if [ -b /dev/sda ]; then
              install_device='/dev/sda'
            elif [ -b /dev/nvme0n1 ]; then
              install_device='/dev/nvme0n1'
            else
              echo "Can't find appropriate device to install to" 1>&2
              echo 'failure'
              return 1
            fi

            # Call out to the installer
            cmd="coreos-installer install --firstboot-args=${firstboot_args}"
            cmd+=" --stream=stable --ignition=${ignition_file}"
            cmd+=" ${install_device}"
            if $cmd; then
              echo "Install Succeeded!"
              echo 'success'
              return 0
            else
              echo "Install Failed!"
              echo 'failure'
              return 1
            fi
          }
          main
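
For the TODO about picking the disk that holds /, one possible approach is to find the root partition (for example with `findmnt -n -o SOURCE /`) and map it to its parent disk. This would presumably have to run on the Ubuntu instance before rebooting into the Live ISO, since inside the live environment / is not on the target disk. The helper below is a hypothetical sketch that only handles simple partition names; `lsblk -no PKNAME <partition>` is a more robust way to ask the kernel for the parent device, and LVM, RAID, or device-mapper roots are not covered at all.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of coreos-installer): map a partition path
# such as the one holding / to its parent disk.
# Assumes simple naming only: /dev/sdXN, /dev/vdXN, /dev/xvdXN,
# /dev/nvmeXnYpZ and /dev/mmcblkXpY. LVM/RAID/device-mapper roots are NOT handled.
disk_for_partition() {
  local part="$1"
  case "$part" in
    /dev/nvme*n*p[0-9]* | /dev/mmcblk*p[0-9]*)
      # NVMe and MMC partitions put a 'p' before the partition number:
      # drop the shortest suffix starting with 'p' (e.g. "p2").
      echo "${part%p*}"
      ;;
    /dev/sd* | /dev/vd* | /dev/xvd*)
      # SCSI/virtio/Xen partitions append digits directly to the disk name:
      # drop the longest suffix starting with a digit (e.g. "15").
      echo "${part%%[0-9]*}"
      ;;
    *)
      echo "unsupported device: $part" 1>&2
      return 1
      ;;
  esac
}

# Example use on the instance being taken over (assumes findmnt is available):
# install_device="$(disk_for_partition "$(findmnt -n -o SOURCE /)")"
```

The result could then be written into the generated config instead of probing /dev/sda and /dev/nvme0n1 in the live environment.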

Any help would be greatly appreciated.

Shoutout to @dustymabe, who wrote this wonderful article that inspired the script above: https://dustymabe.com/2020/04/04/automating-a-custom-install-of-fedora-coreos/

vboufleur commented 4 years ago

I'm serving the files over the Web with Nginx, default settings.

jlebon commented 4 years ago

Hmm, does the output show whether NetworkManager tries to bring up networking? What version of the FCOS live ISO are you using? Might be a regression from https://github.com/coreos/fedora-coreos-config/pull/326. One sanity check (if you have access to the kernel cmdline) is to add rd.neednet=1 and see if it works.
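
When trying that sanity check, it can help to confirm that the argument actually reached the kernel by looking for it as a whole word in /proc/cmdline. The function below is a hypothetical helper, not an existing FCOS tool; it splits the command line on whitespace without glob expansion and compares each argument exactly, so rd.neednet=10 does not count as rd.neednet=1.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeed if the exact kernel argument $1 appears
# in the command line string $2 (e.g. "$(cat /proc/cmdline)").
has_karg() {
  local wanted="$1" arg
  local -a args
  # Split on whitespace into an array without glob expansion.
  read -r -a args <<< "$2"
  for arg in "${args[@]}"; do
    if [ "$arg" = "$wanted" ]; then
      return 0
    fi
  done
  return 1
}

# Example on a booted system:
# has_karg rd.neednet=1 "$(cat /proc/cmdline)" && echo "rd.neednet=1 is set"
```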

vboufleur commented 4 years ago

Adding rd.neednet=1 to the kernel command line solved it for me. Thanks!

A doc page detailing all possible kernel command line options for Fedora CoreOS would be great. It would help other people who stumble on this issue.

jlebon commented 4 years ago

Re-opening. We need to double check that one doesn't have to add rd.neednet=1 if an Ignition config is embedded.

dustymabe commented 4 years ago

hey @vboufleur - what version of the LiveISO are you using? A filename should suffice.

vboufleur commented 4 years ago

@dustymabe this is the ISO version: fedora-coreos-31.20200407.3.0-live.x86_64.iso

ingobecker commented 4 years ago

I'm having a similar problem. I have embedded an Ignition config that looks like this into an image:

variant: fcos
version: 1.0.0
ignition:
  config:
    replace:
      source: http://169.254.169.254/hetzner/v1/user-data

In my case rd.neednet=1 is present in the kernel cmdline. I'm not sure whether it's possible to use an IPv4 link-local (IPv4LL) address here, but this pattern would make it possible to use Hetzner's user_data endpoint without modifying the Ignition code. The errors are similar to those @vboufleur saw.

ingobecker commented 4 years ago

OK, I debugged my problem. It was just a typo in the source URL (it should end with userdata instead of user-data). Sorry about that.

dustymabe commented 4 years ago

Replying to @jlebon: "Re-opening. We need to double check that one doesn't have to add rd.neednet=1 if an Ignition config is embedded."

OK, I looked at this a bit today (sorry for the delay). From what I understand, it isn't actually the install boot that needs the network, but rather the subsequent first boot (the Ignition boot) of the installed system. I think the tricky part here is that passing any --firstboot-args to coreos-installer overwrites the default networking kargs (ip=dhcp,dhcp6 rd.neednet=1). We need to decide whether this is a bug or not, though I'll note the problem will probably go away once we implement https://github.com/coreos/fedora-coreos-tracker/issues/460 .
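
The override behavior described in that comment can be modeled in a few lines. This is a hypothetical sketch of the semantics as described, not coreos-installer's actual source: the default first-boot kargs apply only when no --firstboot-args value is given, and any value replaces them entirely, which is why the script's console=tty0 silently drops the networking kargs.

```bash
#!/usr/bin/env bash
# Hypothetical model (NOT coreos-installer's code) of how --firstboot-args
# interacts with the default first-boot kargs described in the thread.
DEFAULT_FIRSTBOOT_KARGS="ip=dhcp,dhcp6 rd.neednet=1"

effective_firstboot_kargs() {
  local user_args="$1"
  if [ -n "$user_args" ]; then
    # Any user-supplied value replaces the defaults; nothing is merged.
    echo "$user_args"
  else
    echo "$DEFAULT_FIRSTBOOT_KARGS"
  fi
}

effective_firstboot_kargs ""              # prints: ip=dhcp,dhcp6 rd.neednet=1
effective_firstboot_kargs "console=tty0"  # prints: console=tty0
```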

@vboufleur, a workaround for now is to add ip=dhcp,dhcp6 rd.neednet=1 to your --firstboot-args so they still get applied. Be careful doing that in the script embedded in the fcct config from my blog post, though, as the quoting gets tricky in bash. I probably should have used more than one arg in that example.
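
One way around the quoting problem is to build the installer command as a bash array instead of a string. With the string form (cmd="..."; $cmd), word splitting would turn "ip=dhcp,dhcp6 rd.neednet=1 console=tty0" into three separate arguments, and coreos-installer would most likely reject or misread the extra ones. The sketch below reuses the flag names, Ignition path, and device from the script above; it prints the arguments for a dry run rather than executing them.

```bash
#!/usr/bin/env bash
# Sketch: keep a multi-word --firstboot-args value as ONE argument by
# building the command as an array. Flags, path and device are the ones
# used in the script above.
firstboot_args="ip=dhcp,dhcp6 rd.neednet=1 console=tty0"
ignition_file="/home/core/config.ign"
install_device="/dev/sda"

cmd=(coreos-installer install
  "--firstboot-args=${firstboot_args}"
  --stream=stable
  "--ignition=${ignition_file}"
  "${install_device}")

# Dry run: print each argument on its own line.
# In the real script, execute it with: "${cmd[@]}"
printf '%s\n' "${cmd[@]}"
```

Inside the embedded script, the `if $cmd; then` test would become `if "${cmd[@]}"; then`.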