Open nusx opened 6 years ago
@nusx Did you modify the ignition files produced by terraform during the install?
We did not modify any files generated by terraform. We did modify the coreos-install.yaml.tmpl located under /tectonic/platforms/metal/cl prior to launching the installer.
---
systemd:
  units:
    - name: installer.service
      enable: true
      contents: |
        [Unit]
        Requires=network-online.target
        After=network-online.target
        [Service]
        Type=simple
        ExecStart=/opt/installer
        [Install]
        WantedBy=multi-user.target
storage:
  files:
    - path: /opt/installer
      filesystem: root
      mode: 0500
      contents:
        inline: |
          #!/bin/bash -ex
          curl "{{.ignition_endpoint}}?{{.request.raw_query}}&os=installed" -o ignition.json
          coreos-install -d /dev/sda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}}
          udevadm settle
          systemctl reboot
passwd:
  users:
    # intentionally not creating 'core' user so terraform does not SSH during install
    - name: debug
      create:
        groups:
          - sudo
          - docker
      ssh_authorized_keys:
        - {{.ssh_authorized_key}}
We modified line 24:
coreos-install -d /dev/sda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}}
to
coreos-install -d /dev/vda -C {{.coreos_channel}} -V {{.coreos_version}} -i ignition.json {{if index . "baseurl"}}-b {{.baseurl}}{{end}}
... actually, that's what we did when installing 1.7.3. It's possible we forgot to make this change after unpacking the 1.7.5 installer. We need to verify this wasn't the cause of the failure.
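Because this edit has to be repeated every time a new installer is unpacked, it may help to script it. The following is only a sketch: set_install_disk is a hypothetical helper (not part of the Tectonic installer), and it assumes GNU sed and that the template contains the coreos-install line quoted above.

```shell
#!/bin/bash
# Hypothetical helper: rewrite the target disk passed to coreos-install
# in a coreos-install.yaml.tmpl file. Assumes GNU sed (for -i).
set_install_disk() {
  local template="$1" disk="$2"
  [ -f "$template" ] || { echo "template not found: $template" >&2; return 1; }
  # Only the device following "coreos-install -d" is replaced;
  # the rest of the line (channel, version, baseurl) is left as-is.
  sed -i "s|coreos-install -d /dev/[A-Za-z0-9]*|coreos-install -d ${disk}|" "$template"
  # Succeed only if the template now references the requested disk.
  grep -q -- "coreos-install -d ${disk} " "$template"
}
```

For a KVM guest with virtio disks this would be called as set_install_disk coreos-install.yaml.tmpl /dev/vda from the directory that holds the template. A non-zero exit status means the expected line was not found.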
@nusx did you verify if this change caused this issue?
@kbrwn The modification was not the source of the issue. It was still needed to install on KVM, because the vanilla template hardcodes /dev/sda for bare-metal installs. The installation was failing because of configuration files in /var/lib/matchbox on the provisioning machine that were not updated or replaced by the tectonic-installer. The resolution was to manually clean out the folders of the matchbox service (except /var/lib/matchbox/assets) prior to launching the tectonic-installer.
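For anyone who wants to script that cleanup, here is a minimal sketch. clean_matchbox_dir is a hypothetical helper, not something shipped with matchbox or Tectonic. It assumes the data directory is passed as an argument and that everything at its top level other than assets can safely be regenerated by the installer.

```shell
#!/bin/bash
# Hypothetical helper: remove every top-level entry of a matchbox data
# directory except "assets", which holds the downloaded OS images.
clean_matchbox_dir() {
  local data_dir="$1"
  [ -d "$data_dir" ] || { echo "not a directory: $data_dir" >&2; return 1; }
  # -mindepth 1 / -maxdepth 1 restricts find to the direct children of
  # data_dir, so nothing inside assets/ is ever visited or deleted.
  find "$data_dir" -mindepth 1 -maxdepth 1 ! -name assets -exec rm -rf -- {} +
}
```

On the provisioning machine this would be run with sufficient privileges as clean_matchbox_dir /var/lib/matchbox before launching the tectonic-installer. Taking a backup of the directory first is a cheap precaution.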
@nusx Thanks for sharing your resolution.
This error is fairly indicative:
level=error msg="error rendering template: template: :31:22: executing \"\" at <.ign_docker_dropin_j...>: map has no entry for key \"ign_docker_dropin_json\""
Tectonic Installer writes Ignition templates, and the metadata used to fill those templates, to Matchbox. Matchbox complains because a template contains a variable that is not defined in the metadata. So either Tectonic Installer wrote mismatched templates and metadata at some point, or a customization or edit changed one of them.
When moving between Tectonic Installer releases, it's possible that the variable name was changed. There is no component that knows to clean up the old templates and metadata, unless you had terraform delete those resources from Matchbox beforehand.
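A rough way to spot this kind of mismatch before booting nodes is to compare the variables a template references against the keys present in the metadata. The sketch below is a crude text match, not a real template parser. list_template_vars and missing_template_vars are hypothetical helpers. It assumes the metadata lives in a JSON file (for example a group definition in the matchbox data directory) and that request is supplied by Matchbox per request rather than by the metadata, as the .request.raw_query reference in the template above suggests.

```shell
#!/bin/bash
# Hypothetical helper: print the unique top-level names referenced as
# {{.name}} or {{.name.sub}} inside template actions.
list_template_vars() {
  grep -oE '\{\{[^}]*\}\}' "$1" \
    | grep -oE '[^A-Za-z0-9_.]\.[A-Za-z_][A-Za-z0-9_]*' \
    | sed 's/^.\.//' \
    | sort -u
}

# Hypothetical helper: print each template variable that does not appear
# as a JSON key in the metadata file. Returns 1 if any are missing.
missing_template_vars() {
  local template="$1" metadata="$2" name status=0
  for name in $(list_template_vars "$template"); do
    # Assumption: "request" is filled in by Matchbox, not by metadata.
    [ "$name" = "request" ] && continue
    if ! grep -q "\"$name\"[[:space:]]*:" "$metadata"; then
      echo "$name"
      status=1
    fi
  done
  return "$status"
}
```

Called as missing_template_vars <template file> <metadata JSON file>, it prints one missing name per line. Since it only looks for the quoted key anywhere in the file, a clean result does not prove the template will render; a reported name is a useful pointer to a stale template or stale metadata.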
Tectonic Version
1.7.5-tectonic.1
Environment
What hardware/cloud provider/hypervisor is being used with Tectonic? KVM
Expected Behavior
Automatic provisioning of coreos to the nodes, installation of k8s and tectonic components resulting in a working k8s cluster with tectonic console and services.
Actual Behavior
After the installer performs terraform apply and the nodes are powered on, the process initially seems to work as expected (as it did with the previous 1.7.3-tectonic version in the same environment). The nodes iPXE boot CoreOS, then reboot and fail to continue with the installation. The console on each node shows a prompt to press Return to enter an emergency console, along with a countdown leading to an automatic reboot after a couple of minutes. Watching the matchbox service via
journalctl -f -u matchbox
from the provisioner VM shows an error message at some point after the nodes reboot the first time. The log reports a failure to render a template due to a missing key.
Reproduction Steps
Run
journalctl -f -u matchbox
on the provisioning machine and watch the text console of the cluster nodes being provisioned.