coreos / tectonic-forum

Apache License 2.0

Bare metal installation with tectonic_1.7.5-tectonic.1 installer fails after first reboot #224

Open nusx opened 6 years ago

nusx commented 6 years ago

Tectonic Version

1.7.5-tectonic.1

Environment

What hardware/cloud provider/hypervisor is being used with Tectonic? KVM

Expected Behavior

Automatic provisioning of coreos to the nodes, installation of k8s and tectonic components resulting in a working k8s cluster with tectonic console and services.

Actual Behavior

After the installer performs terraform apply and the nodes are powered on, the process initially works as expected (and as experienced with the previous 1.7.3-tectonic version in the same environment). The nodes iPXE boot into coreos, then reboot and fail to continue with the installation. The console on each node shows a prompt to press Return to enter an emergency console, along with a countdown that leads to an automatic reboot after a couple of minutes. Watching the matchbox service via journalctl -f -u matchbox from the provisioner VM shows an error message some time after the nodes reboot for the first time. The error log reports a failure to render a template due to a missing key:

level=info msg="HTTP GET /boot.ipxe"
level=info msg="HTTP GET /ipxe?uuid=f11818fb-b5ab-4d6b-84cb-5eec641cdbd5&mac=52-54-00-aa-24-70&domain=inhouse.datenautomaten.nu&hostname=&serial="
level=info msg="HTTP GET /assets/coreos/1465.8.0/coreos_production_pxe.vmlinuz"
level=info msg="HTTP GET /assets/coreos/1465.8.0/coreos_production_pxe_image.cpio.gz"
level=info msg="HTTP GET /ignition?uuid=f11818fb-b5ab-4d6b-84cb-5eec641cdbd5&mac=52-54-00-aa-24-70"
level=info msg="HTTP GET /ignition?uuid=f11818fb-b5ab-4d6b-84cb-5eec641cdbd5&mac=52-54-00-aa-24-70&os=installed"
level=error msg="error rendering template: template: :31:22: executing \"\" at <.ign_docker_dropin_j...>: map has no entry for key \"ign_docker_dropin_json\""

Reproduction Steps

  1. Prepare cluster nodes and matchbox server as described in section Environment.
  2. Execute the darwin installer from tectonic_1.7.5-tectonic.1.tar.gz, complete the wizard and submit the configuration.
  3. Watch the matchbox service via journalctl -f -u matchbox from the provisioning machine. Watch the text console of the cluster nodes being provisioned.

kbrwn commented 6 years ago

@nusx Did you modify the ignition files produced by terraform during the install?

nusx commented 6 years ago

We did not modify any files generated by terraform. We did modify the coreos-install.yaml.tmpl located under /tectonic/platforms/metal/cl prior to launching the installer.

---
systemd:
  units:
    - name: installer.service
      enable: true
      contents: |
        [Unit]
        Requires=network-online.target
        After=network-online.target
        [Service]
        Type=simple
        ExecStart=/opt/installer
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /opt/installer
      filesystem: root
      mode: 0500
      contents:
        inline: |
          #!/bin/bash -ex
          curl "{{.ignition_endpoint}}?{{.request.raw_query}}&os=installed" -o ignition.json
          coreos-install -d /dev/sda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}}
          udevadm settle
          systemctl reboot
passwd:
  users:
    # intentionally not creating 'core' user so terraform does not SSH during install
    - name: debug
      create:
        groups:
          - sudo
          - docker
      ssh_authorized_keys:
        - {{.ssh_authorized_key}}

We modified line 24:

coreos-install -d /dev/sda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}}

to

coreos-install -d /dev/vda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}}
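
For anyone repeating this edit after unpacking a new installer, the change can be scripted. The following is a minimal sketch in Python, not part of the Tectonic installer; the helper name set_install_device is hypothetical, and it assumes the template contains a coreos-install invocation with the -d argument directly after the command, as in the template quoted above.

```python
import re


def set_install_device(template_text, device):
    """Return template_text with the -d argument of coreos-install set to device.

    Hypothetical helper: it only rewrites 'coreos-install -d <something>' and
    leaves the rest of the template, including {{...}} actions, untouched.
    """
    pattern = re.compile(r"(coreos-install\s+-d\s+)\S+")
    # A function replacement avoids any escaping issues in the device path.
    new_text, count = pattern.subn(lambda m: m.group(1) + device, template_text)
    if count == 0:
        raise ValueError("no 'coreos-install -d <device>' invocation found")
    return new_text
```

Reading coreos-install.yaml.tmpl, passing its text through set_install_device(text, "/dev/vda") and writing the result back would reproduce the manual edit described here.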

nusx commented 6 years ago

... actually, that's what we did when installing 1.7.3. It's possible we forgot to change this after unpacking the 1.7.5 installer. Need to verify this wasn't the cause for the failure.

kbrwn commented 6 years ago

@nusx did you verify if this change caused this issue?

nusx commented 6 years ago

@kbrwn The modification was not the source of the issue. It was still needed to install on KVM, because the vanilla template has a hardcoded /dev/sda for bare-metal installs. The installation was failing due to configuration files in /var/lib/matchbox on the provisioning machine, which were not updated or replaced by the tectonic-installer. The resolution was to manually clean out the folders of the matchbox service (except /var/lib/matchbox/assets) prior to launching the tectonic-installer.
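
The cleanup described above could be sketched roughly as follows. This is a minimal Python sketch under stated assumptions, not something shipped with Tectonic or Matchbox: the function name clean_matchbox_data is hypothetical, it simply removes every entry in the given directory except those listed in keep, and it defaults to a dry run so nothing is deleted by accident.

```python
import shutil
from pathlib import Path


def clean_matchbox_data(data_dir, keep=("assets",), dry_run=True):
    """Remove every entry directly under data_dir except those named in keep.

    Returns the sorted names that were removed (or would be, when dry_run
    is True). Hypothetical helper; run it only with the installer stopped.
    """
    root = Path(data_dir)
    removed = []
    for entry in sorted(root.iterdir()):
        if entry.name in keep:
            continue  # e.g. keep the downloaded OS images under assets
        removed.append(entry.name)
        if dry_run:
            continue
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return removed
```

Calling clean_matchbox_data("/var/lib/matchbox") first lists what would be removed; passing dry_run=False performs the deletion. It needs permission to write to that directory.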

kbrwn commented 6 years ago

@nusx Thanks for sharing your resolution.

dghubble commented 6 years ago

This error is fairly indicative: level=error msg="error rendering template: template: :31:22: executing \"\" at <.ign_docker_dropin_j...>: map has no entry for key \"ign_docker_dropin_json\"".

Tectonic Installer writes Ignition templates and metadata (to fill those templates) to Matchbox. Matchbox complains because a template contains a variable that's not defined anywhere, so either Tectonic Installer was writing mismatched templates/metadata at some point, or a customization/edit changed it.

Going between Tectonic Installer releases, it's possible the variable name was changed and there is no component that knows to clean up the old template/metadata, unless you had terraform delete those resources from Matchbox beforehand.
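
One way to spot such a mismatch before booting any nodes is to compare the variables a template references with the keys present in its metadata. The sketch below is a rough Python heuristic, not Matchbox's own logic: the function name missing_keys is hypothetical, the regular expressions only approximate Go template syntax (guarded lookups such as index . "baseurl" are not counted), and request is ignored by default on the assumption that Matchbox supplies it at request time.

```python
import re

# Text between {{ and }}; a rough stand-in for a real Go template parser.
_ACTION = re.compile(r"\{\{(.*?)\}\}", re.S)
# A ".name" that starts a field chain, so ".request.raw_query" yields "request".
_FIELD = re.compile(r"(?<![\w.\)])\.([A-Za-z_]\w*)")


def missing_keys(template_text, metadata, ignore=("request",)):
    """Return sorted top-level template variables that metadata does not define."""
    referenced = set()
    for action in _ACTION.findall(template_text):
        referenced.update(_FIELD.findall(action))
    return sorted(k for k in referenced if k not in metadata and k not in ignore)
```

Run against a stale template and the metadata written by a newer installer, a key such as ign_docker_dropin_json would show up in the returned list, which matches the symptom in the log line quoted above.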