Azure / AKS

Azure Kubernetes Service
https://azure.github.io/AKS/
1.97k stars 308 forks source link

Node VMs (1.9.6) "NotReady" after restart: update-grub corrupts /boot/grub/grub.cfg #412

Closed svollath closed 5 years ago

svollath commented 6 years ago

When deploying a 3 node (kubernetes 1.9.6) cluster with "az aks create ..." we need to modify and reboot every kubernetes node for enabling "swapaccount=1". When doing that every single node will enter "NotReady" state - my suspision is that VMs got stuck at GRUB bootloader not booting.

I'm able to work around that, with just modifying /boot/grub/grub.cfg and restarting the VMs. So it's likely there's an issue within "update-grub".

Here's what we do to modify and restart each VM (worked ~4 weeks before):

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo sed -i 's/GRUB_CMDLINE_LINUX_DEFAULT=\"console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300\"/GRUB_CMDLINE_LINUX_DEFAULT=\"console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1\"/g' /etc/default/grub.d/50-cloudimg-settings.cfg"

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo update-grub"

> az vm restart -g <RGNAME> -n <VMNAME>

Here's the resulting /boot/grub/grub.cfg:

opio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1018-azure ...'
                linux   /boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
        }
        menuentry 'Ubuntu, with Linux 4.13.0-1018-azure (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.13.0-1018-azure-recovery-dfb884db-c6ad-4c7a-8075-98cf640a2892' {
                recordfail
                load_video
                insmod gzio
                if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1018-azure ...'
                linux   /boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro recovery nomodeset
        }
        menuentry 'Ubuntu, with Linux 4.13.0-1016-azure' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.13.0-1016-azure-advanced-dfb884db-c6ad-4c7a-8075-98cf640a2892' {
                recordfail
                load_video
                gfxmode $linux_gfx_mode
                insmod gzio
                if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1016-azure ...'
                linux   /boot/vmlinuz-4.13.0-1016-azure root=UUID=dfb884db-c6ad-4c7a-8075-98cf640a2892 ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
                echo    'Loading initial ramdisk ...'
                initrd  /boot/initrd.img-4.13.0-1016-azure
        }
        menuentry 'Ubuntu, with Linux 4.13.0-1016-azure (recovery mode)' --class ubuntu --class gnu-linux --class gnu --class os $menuentry_id_option 'gnulinux-4.13.0-1016-azure-recovery-dfb884db-c6ad-4c7a-8075-98cf640a2892' {
                recordfail
                load_video
                insmod gzio
                if [ x$grub_platform = xxen ]; then insmod xzio; insmod lzopio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  dfb884db-c6ad-4c7a-8075-98cf640a2892
                else
                  search --no-floppy --fs-uuid --set=root dfb884db-c6ad-4c7a-8075-98cf640a2892
                fi
                echo    'Loading Linux 4.13.0-1016-azure ...'
                linux   /boot/vmlinuz-4.13.0-1016-azure root=UUID=dfb884db-c6ad-4c7a-8075-98cf640a2892 ro recovery nomodeset
                echo    'Loading initial ramdisk ...'
                initrd  /boot/initrd.img-4.13.0-1016-azure
        }
}

### END /etc/grub.d/10_linux ###

### BEGIN /etc/grub.d/20_linux_xen ###
### END /etc/grub.d/20_linux_xen ###

### BEGIN /etc/grub.d/30_os-prober ###
### END /etc/grub.d/30_os-prober ###

### BEGIN /etc/grub.d/30_uefi-firmware ###
### END /etc/grub.d/30_uefi-firmware ###

### BEGIN /etc/grub.d/40_custom ###
# This file provides an easy way to add custom menu entries.  Simply type the
# menu entries you want to add after this comment.  Be careful not to change
# the 'exec tail' line above.
### END /etc/grub.d/40_custom ###

### BEGIN /etc/grub.d/41_custom ###
if [ -f  ${config_directory}/custom.cfg ]; then
  source ${config_directory}/custom.cfg
elif [ -z \"${config_directory}\" -a -f  $prefix/custom.cfg ]; then
  source $prefix/custom.cfg;
fi
### END /etc/grub.d/41_custom ###
tomconte commented 6 years ago

Below is the boot log from serial console after having modified the 50-cloudimg-settings.cfg file and running update-grub as described by Sebastian.

In the end it seems the eth0 network interface is not found, which means the VM is unreachable, services don't start, etc.

[FAILED] Failed to start Raise network interfaces.

Full log:

[    0.000000] random: get_random_bytes called from start_kernel+0x42/0x507 with crng_init
=0
[    0.000000] Linux version 4.13.0-1018-azure (buildd@lcy01-amd64-014) (gcc version 5.4.0
 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.9)) #21-Ubuntu SMP Thu May 17 13:58:38 UTC 2018 (Ub
untu 4.13.0-1018.21-azure 4.13.16)
[    0.000000] Command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro
console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
[    0.000000] KERNEL supported cpus:
[    0.000000]   Intel GenuineIntel
[    0.000000]   AMD AuthenticAMD
[    0.000000]   Centaur CentaurHauls
[    0.000000] x86/fpu: Supporting XSAVE feature 0x001: 'x87 floating point registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x002: 'SSE registers'
[    0.000000] x86/fpu: Supporting XSAVE feature 0x004: 'AVX registers'
[    0.000000] x86/fpu: xstate_offset[2]:  576, xstate_sizes[2]:  256
[    0.000000] x86/fpu: Enabled xstate features 0x7, context size is 832 bytes, using 'sta
ndard' format.
[    0.000000] e820: BIOS-provided physical RAM map:
[    0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009fbff] usable
[    0.000000] BIOS-e820: [mem 0x000000000009fc00-0x000000000009ffff] reserved
[    0.000000] BIOS-e820: [mem 0x00000000000e0000-0x00000000000fffff] reserved
[    0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000001ffeffff] usable
[    0.000000] BIOS-e820: [mem 0x000000001fff0000-0x000000001fffefff] ACPI data
[    0.000000] BIOS-e820: [mem 0x000000001ffff000-0x000000001fffffff] ACPI NVS
[    0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000029fffffff] usable
[    0.000000] bootconsole [earlyser0] enabled
[    0.000000] NX (Execute Disable) protection: active
[    0.000000] random: fast init done
[    0.000000] SMBIOS 2.3 present.
[    0.000000] DMI: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06
/02/2017
[    0.000000] Hypervisor detected: Microsoft Hyper-V
[    0.000000] Hyper-V: features 0x2e7f, hints 0xc2c
[    0.000000] Hyper-V Host Build:14393-10.0-0-0.230
[    0.000000] Hyper-V: LAPIC Timer Frequency: 0xc3500
[    0.000000] tsc: Marking TSC unstable due to running on Hyper-V
[    0.000000] Hyper-V: Using ext hypercall for remote TLB flush
[    0.000000] e820: last_pfn = 0x2a0000 max_arch_pfn = 0x400000000
[    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
Memory KASLR using RDRAND RDTSC...
[    0.000000] e820: last_pfn = 0x1fff0 max_arch_pfn = 0x400000000
[    0.000000] found SMP MP-table at [mem 0x000ff780-0x000ff78f] mapped at [ffff94ca000ff7
80]
[    0.000000] Scanning 1 areas for low memory corruption
[    0.000000] Using GB pages for direct mapping
[    0.000000] ACPI: Early table checksum verification disabled
[    0.000000] ACPI: RSDP 0x00000000000F5BF0 000014 (v00 ACPIAM)
[    0.000000] ACPI: RSDT 0x000000001FFF0000 000040 (v01 VRTUAL MICROSFT 06001702 MSFT 000
00097)
[    0.000000] ACPI: FACP 0x000000001FFF0200 000081 (v02 VRTUAL MICROSFT 06001702 MSFT 000
00097)
[    0.000000] ACPI: DSDT 0x000000001FFF1D24 003CBE (v01 MSFTVM MSFTVM02 00000002 INTL 020
02026)
[    0.000000] ACPI: FACS 0x000000001FFFF000 000040
[    0.000000] ACPI: WAET 0x000000001FFF1A80 000028 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: SLIC 0x000000001FFF1AC0 000176 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: OEM0 0x000000001FFF1CC0 000064 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: SRAT 0x000000001FFF0800 000140 (v02 VRTUAL MICROSFT 00000001 MSFT 00000001)
[    0.000000] ACPI: APIC 0x000000001FFF0300 000452 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] ACPI: OEMB 0x000000001FFFF040 000064 (v01 VRTUAL MICROSFT 06001702 MSFT 00000097)
[    0.000000] SRAT: PXM 0 -> APIC 0x00 -> Node 0
[    0.000000] SRAT: PXM 0 -> APIC 0x01 -> Node 0
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x1fffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x29fffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x2a0200000-0xfdfffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x1000000000-0xffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x10000200000-0x1ffffffffff] hotplug
[    0.000000] ACPI: SRAT: Node 0 PXM 0 [mem 0x20000200000-0x3ffffffffff] hotplug
[    0.000000] NUMA: Node 0 [mem 0x00000000-0x1fffffff] + [mem 0x100000000-0x29fffffff] -> [mem 0x00000000-0x29fffffff]
[    0.000000] NODE_DATA(0) allocated [mem 0x29ffd5000-0x29fffffff]
[    0.000000] Zone ranges:
[    0.000000]   DMA      [mem 0x0000000000001000-0x0000000000ffffff]
[    0.000000]   DMA32    [mem 0x0000000001000000-0x00000000ffffffff]
[    0.000000]   Normal   [mem 0x0000000100000000-0x000000029fffffff]
[    0.000000]   Device   empty
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[    0.000000]   node   0: [mem 0x0000000000100000-0x000000001ffeffff]
[    0.000000]   node   0: [mem 0x0000000100000000-0x000000029fffffff]
[    0.000000] Initmem setup node 0 [mem 0x0000000000001000-0x000000029fffffff]
[    0.000000] ACPI: PM-Timer IO Port: 0x408
[    0.000000] ACPI: LAPIC_NMI (acpi_id[0xff] dfl dfl lint[0x1])
[    0.000000] IOAPIC[0]: apic_id 0, version 17, address 0xfec00000, GSI 0-23
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 0 global_irq 2 dfl dfl)
[    0.000000] ACPI: INT_SRC_OVR (bus 0 bus_irq 9 global_irq 9 high level)
[    0.000000] Using ACPI (MADT) for SMP configuration information
[    0.000000] smpboot: Allowing 128 CPUs, 126 hotplug CPUs
[    0.000000] PM: Registered nosave memory: [mem 0x00000000-0x00000fff]
[    0.000000] PM: Registered nosave memory: [mem 0x0009f000-0x0009ffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000a0000-0x000dffff]
[    0.000000] PM: Registered nosave memory: [mem 0x000e0000-0x000fffff]
[    0.000000] PM: Registered nosave memory: [mem 0x1fff0000-0x1fffefff]
[    0.000000] PM: Registered nosave memory: [mem 0x1ffff000-0x1fffffff]
[    0.000000] PM: Registered nosave memory: [mem 0x20000000-0xffffffff]
[    0.000000] e820: [mem 0x20000000-0xffffffff] available for PCI devices
[    0.000000] Booting paravirtualized kernel on bare hardware
[    0.000000] clocksource: refined-jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645519600211568 ns
[    0.000000] setup_percpu: NR_CPUS:8192 nr_cpumask_bits:128 nr_cpu_ids:128 nr_node_ids:1
[    0.000000] percpu: Embedded 44 pages/cpu @ffff94cc96600000 s141848 r8192 d30184 u262144
[    0.000000] Built 1 zonelists in Node order, mobility grouping on.  Total pages: 1806201
[    0.000000] Policy zone: Normal
[    0.000000] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-4.13.0-1018-azure root=/dev/sda1 ro console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
[    0.000000] log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] log_buf_len total cpu_extra contributions: 520192 bytes
[    0.000000] log_buf_len min size: 262144 bytes
[    0.000000] log_buf_len: 1048576 bytes
[    0.000000] early log buf free: 252048(96%)
[    0.000000] PID hash table entries: 4096 (order: 3, 32768 bytes)
[    0.000000] Memory: 7109508K/7339576K available (12300K kernel code, 2317K rwdata, 3588K rodata, 2204K init, 2340K bss, 230068K reserved, 0K cma-reserved)
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=128, Nodes=1
[    0.000000] Kernel/User page tables isolation: enabled
[    0.000000] ftrace: allocating 34707 entries in 136 pages
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU restricting CPUs from NR_CPUS=8192 to nr_cpu_ids=128.
[    0.000000]  Tasks RCU enabled.
[    0.000000] RCU: Adjusting geometry for rcu_fanout_leaf=16, nr_cpu_ids=128
[    0.000000] NR_IRQS: 524544, nr_irqs: 1448, preallocated irqs: 16
[    0.000000] Console: colour VGA+ 80x25
[    0.000000] console [tty1] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] console [ttyS0] enabled
[    0.000000] bootconsole [earlyser0] disabled
[    0.000000] bootconsole [earlyser0] disabled
[    0.000000] tsc: Detected 2294.684 MHz processor
[    0.004000] Calibrating delay loop (skipped), value calculated using timer frequency..4589.36 BogoMIPS (lpj=9178736)
[    0.008027] pid_max: default: 131072 minimum: 1024
[    0.012073] ACPI: Core revision 20170531
[    0.021454] ACPI: 1 ACPI AML tables successfully acquired and loaded
[    0.024178] Security Framework initialized
[    0.028028] Yama: becoming mindful.
[    0.032194] AppArmor: AppArmor initialized
[    0.037732] Dentry cache hash table entries: 1048576 (order: 11, 8388608 bytes)
[    0.040828] Inode-cache hash table entries: 524288 (order: 10, 4194304 bytes)
[    0.044100] Mount-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.048058] Mountpoint-cache hash table entries: 16384 (order: 5, 131072 bytes)
[    0.052494] CPU: Physical Processor ID: 0
[    0.060025] CPU: Processor Core ID: 0
[    0.064098] FEATURE SPEC_CTRL Not Present
[    0.068037] mce: CPU supports 1 MCE banks
[    0.072129] Last level iTLB entries: 4KB 64, 2MB 8, 4MB 8
[    0.076032] Last level dTLB entries: 4KB 64, 2MB 0, 4MB 0, 1GB 4
[    0.080025] Spectre V2 mitigation: Mitigation: Full generic retpoline
[    0.084025] Spectre V2 mitigation: Speculation control IBPB not-supported IBRS not-supported
[    0.084026] Speculative Store Bypass: Vulnerable
[    0.092261] Freeing SMP alternatives memory: 32K
[    0.100981] smpboot: Max logical packages: 64
[    0.104127] Switched APIC routing to physical flat.
[    0.108053] clocksource: hyperv_clocksource_tsc_page: mask: 0xffffffffffffffff max_cycles: 0x24e6a1710, max_idle_ns: 440795202120 ns
[    0.143285] ..TIMER: vector=0x30 apic1=0 pin1=2 apic2=-1 pin2=-1
[    0.144153] smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2673 v4 @ 2.30GHz (family: 0x6, model: 0x4f, stepping: 0x1)
[    0.148187] Performance Events: unsupported p6 CPU model 79 no PMU driver, software events only.
[    0.152079] Hierarchical SRCU implementation.
[    0.161528] NMI watchdog: disabled (cpu0): hardware events not enabled
[    0.168006] NMI watchdog: Shutting down hard lockup detector on all cpus
[    0.172018] smp: Bringing up secondary CPUs ...
[    0.176138] x86: Booting SMP configuration:
[    0.180003] .... node  #0, CPUs:          #1
[    0.180591] smp: Brought up 1 node, 2 CPUs
[    0.188004] smpboot: Total of 2 processors activated (9178.73 BogoMIPS)
[    0.198691] devtmpfs: initialized
[    0.200060] x86/mm: Memory block size: 128MB
[    0.208469] evm: security.selinux
[    0.212002] evm: security.SMACK64
[    0.216001] evm: security.SMACK64EXEC
[    0.220006] evm: security.SMACK64TRANSMUTE
[    0.224002] evm: security.SMACK64MMAP
[    0.228004] evm: security.ima
[    0.232002] evm: security.capability
[    0.236137] PM: Registering ACPI NVS region [mem 0x1ffff000-0x1fffffff] (4096 bytes)
[    0.240137] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 7645041785100000 ns
[    0.244095] futex hash table entries: 32768 (order: 9, 2097152 bytes)
[    0.248741] pinctrl core: initialized pinctrl subsystem
[    0.273897] RTC time: 14:43:21, date: 06/06/18
[    0.276244] NET: Registered protocol family 16
[    0.280230] cpuidle: using governor ladder
[    0.284015] cpuidle: using governor menu
[    0.288006] PCCT header not found.
[    0.292097] ACPI: bus type PCI registered
[    0.296002] acpiphp: ACPI Hot Plug PCI Controller Driver version: 0.5
[    0.300635] PCI: Using configuration type 1 for base access
[    0.305115] HugeTLB registered 1.00 GiB page size, pre-allocated 0 pages
[    0.308004] HugeTLB registered 2.00 MiB page size, pre-allocated 0 pages
[    0.312217] ACPI: Added _OSI(Module Device)
[    0.316007] ACPI: Added _OSI(Processor Device)
[    0.320003] ACPI: Added _OSI(3.0 _SCP Extensions)
[    0.324008] ACPI: Added _OSI(Processor Aggregator Device)
[    0.335168] ACPI: Interpreter enabled
[    0.336018] ACPI: (supports S0 S5)
[    0.340002] ACPI: Using IOAPIC for interrupt routing
[    0.344028] PCI: Using host bridge windows from ACPI; if necessary, use "pci=nocrs" and report a bug
[    0.348194] ACPI: Enabled 1 GPEs in block 00 to 0F
[    0.374344] ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-ff])
[    0.376010] acpi PNP0A03:00: _OSC: OS supports [ASPM ClockPM Segments MSI]
[    0.380009] acpi PNP0A03:00: _OSC failed (AE_NOT_FOUND); disabling ASPM
[    0.384012] acpi PNP0A03:00: fail to add MMCONFIG information, can't access extended PCI configuration space under this bridge.
[    0.392154] PCI host bridge to bus 0000:00
[    0.396005] pci_bus 0000:00: root bus resource [mem 0xfe0000000-0xfffffffff window]
[    0.400007] pci_bus 0000:00: root bus resource [io  0x0000-0x0cf7 window]
[    0.404006] pci_bus 0000:00: root bus resource [io  0x0d00-0xffff window]
[    0.408008] pci_bus 0000:00: root bus resource [mem 0x000a0000-0x000bffff window]
[    0.412007] pci_bus 0000:00: root bus resource [mem 0x20000000-0xfffbffff window]
[    0.416010] pci_bus 0000:00: root bus resource [bus 00-ff]
[    0.430060] pci 0000:00:07.1: legacy IDE quirk: reg 0x10: [io  0x01f0-0x01f7]
[    0.432006] pci 0000:00:07.1: legacy IDE quirk: reg 0x14: [io  0x03f6]
[    0.436058] pci 0000:00:07.1: legacy IDE quirk: reg 0x18: [io  0x0170-0x0177]
[    0.440008] pci 0000:00:07.1: legacy IDE quirk: reg 0x1c: [io  0x0376]
[    0.444568] * Found PM-Timer Bug on the chipset. Due to workarounds for a bug,
[    0.444568] * this clock source is slow. Consider trying other clock sources
[    0.453638] pci 0000:00:07.3: quirk: [io  0x0400-0x043f] claimed by PIIX4 ACPI
[    0.477626] ACPI: PCI Interrupt Link [LNKA] (IRQs 3 4 5 7 9 10 *11 12 14 15)
[    0.480250] ACPI: PCI Interrupt Link [LNKB] (IRQs 3 4 5 7 9 10 11 12 14 15) *0, disabled.
[    0.484272] ACPI: PCI Interrupt Link [LNKC] (IRQs 3 4 5 7 9 10 11 12 14 15) *0, disabled.
[    0.488329] ACPI: PCI Interrupt Link [LNKD] (IRQs 3 4 5 7 9 10 11 12 14 15) *0, disabled.
[    0.492502] SCSI subsystem initialized
[    0.496099] pci 0000:00:08.0: vgaarb: setting as boot VGA device
[    0.500000] pci 0000:00:08.0: vgaarb: VGA device added: decodes=io+mem,owns=io+mem,locks=none
[    0.500007] pci 0000:00:08.0: vgaarb: bridge control possible
[    0.504001] vgaarb: loaded
[    0.508114] EDAC MC: Ver: 3.0.0
[    0.512544] hv_vmbus: Vmbus version:4.0
[    0.516598] PCI: Using ACPI for IRQ routing
[    0.520769] NetLabel: Initializing
[    0.524004] NetLabel:  domain hash size = 128
[    0.528009] NetLabel:  protocols = UNLABELED CIPSOv4 CALIPSO
[    0.532025] NetLabel:  unlabeled traffic allowed by default
[    0.536115] clocksource: Switched to clocksource hyperv_clocksource_tsc_page
[    0.588480] VFS: Disk quotas dquot_6.6.0
[    0.611378] VFS: Dquot-cache hash table entries: 512 (order 0, 4096 bytes)
[    0.648303] AppArmor: AppArmor Filesystem Enabled
[    0.675846] pnp: PnP ACPI init
[    0.697297] system 00:06: [io  0x01e0-0x01ef] has been reserved
[    0.731394] system 00:06: [io  0x0160-0x016f] has been reserved
[    0.766462] system 00:06: [io  0x0278-0x027f] has been reserved
[    0.800494] system 00:06: [io  0x0378-0x037f] has been reserved
[    0.835379] system 00:06: [io  0x0678-0x067f] has been reserved
[    0.869626] system 00:06: [io  0x0778-0x077f] has been reserved
[    0.899377] system 00:06: [io  0x04d0-0x04d1] has been reserved
[    0.933407] system 00:07: [io  0x0400-0x043f] has been reserved
[    0.967892] system 00:07: [io  0x0370-0x0371] has been reserved
[    1.002574] system 00:07: [io  0x0440-0x044f] has been reserved
[    1.037272] system 00:07: [mem 0xfec00000-0xfec00fff] could not be reserved
[    1.077732] system 00:07: [mem 0xfee00000-0xfee00fff] has been reserved
[    1.118139] system 00:08: [mem 0x00000000-0x0009ffff] could not be reserved
[    1.159370] system 00:08: [mem 0x000c0000-0x000dffff] could not be reserved
[    1.199200] system 00:08: [mem 0x000e0000-0x000fffff] could not be reserved
[    1.239406] system 00:08: [mem 0x00100000-0x1fffffff] could not be reserved
[    1.281371] system 00:08: [mem 0xfffc0000-0xffffffff] has been reserved
[    1.319632] pnp: PnP ACPI: found 9 devices
[    1.349720] clocksource: acpi_pm: mask: 0xffffff max_cycles: 0xffffff, max_idle_ns: 2085701024 ns
[    1.401047] NET: Registered protocol family 2
[    1.427030] TCP established hash table entries: 65536 (order: 7, 524288 bytes)
[    1.468735] TCP bind hash table entries: 65536 (order: 8, 1048576 bytes)
[    1.508119] TCP: Hash tables configured (established 65536 bind 65536)
[    1.546623] UDP hash table entries: 4096 (order: 5, 131072 bytes)
[    1.581707] UDP-Lite hash table entries: 4096 (order: 5, 131072 bytes)
[    1.619218] NET: Registered protocol family 1
[    1.644064] pci 0000:00:00.0: Limiting direct PCI/PCI transfers
[    1.676813] pci 0000:00:08.0: Video device with shadowed ROM at [mem 0x000c0000-0x000dffff]
[    1.721288] PCI-DMA: Using software bounce buffering for IO (SWIOTLB)
[    1.755352] software IO TLB [mem 0x1bff0000-0x1fff0000] (64MB) mapped at [ffff94ca1bff0000-ffff94ca1ffeffff]
[    1.808312] clocksource: tsc: mask: 0xffffffffffffffff max_cycles: 0x21139776891, max_idle_ns: 440795250948 ns
[    1.816033] Scanning for low memory corruption every 60 seconds
[    1.818737] audit: initializing netlink subsys (disabled)
[    1.819189] Initialise system trusted keyrings
[    1.819197] Key type blacklist registered
[    1.819212] audit: type=2000 audit(1528296200.818:1): state=initialized audit_enabled=0 res=1
[    1.819224] workingset: timestamp_bits=36 max_order=21 bucket_order=0
[    1.821634] zbud: loaded
[    1.823113] squashfs: version 4.0 (2009/01/31) Phillip Lougher
[    1.823534] fuse init (API version 7.26)
[    1.826952] Key type asymmetric registered
[    1.826953] Asymmetric key parser 'x509' registered
[    1.827110] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 247)
[    1.827189] io scheduler noop registered (default)
[    2.253191] input: Power Button as /devices/LNXSYSTM:00/LNXPWRBN:00/input/input0
[    2.294826] ACPI: Power Button [PWRF]
[    2.316588] Serial: 8250/16550 driver, 32 ports, IRQ sharing enabled
[    2.445156] 00:03: ttyS0 at I/O 0x3f8 (irq = 4, base_baud = 115200) is a 16550A
[    2.579185] 00:04: ttyS1 at I/O 0x2f8 (irq = 3, base_baud = 115200) is a 16550A
[    2.623177] Linux agpgart interface v0.103
[    2.649070] loop: module loaded
[    2.667836] hv_vmbus: registering driver hv_storvsc
[    2.696202] scsi host0: storvsc_host_t
[    2.721798] scsi 0:0:0:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI:5
[    2.771507] scsi host1: storvsc_host_t
[    2.798824] scsi 1:0:1:0: Direct-Access     Msft     Virtual Disk     1.0  PQ: 0 ANSI:5
[    2.848324] scsi host2: storvsc_host_t
[    2.877604] scsi host3: storvsc_host_t
[    2.905681] sd 0:0:0:0: Attached scsi generic sg0 type 0
[    2.913625] sd 0:0:0:0: [sda] 62914560 512-byte logical blocks: (32.2 GB/30.0 GiB)
[    2.913627] sd 0:0:0:0: [sda] 4096-byte physical blocks
[    2.913666] sd 0:0:0:0: [sda] Write Protect is off
[    2.913832] sd 0:0:0:0: [sda] Write cache: enabled, read cache: enabled, supports DPO and FUA
[    2.915351]  sda: sda1
[    2.916614] sd 0:0:0:0: [sda] Attached SCSI disk
[    3.124672] sd 1:0:1:0: Attached scsi generic sg1 type 0
[    3.154492] sd 1:0:1:0: [sdb] 29360128 512-byte logical blocks: (15.0 GB/14.0 GiB)
[    3.154794] ata_piix 0000:00:07.1: Hyper-V Virtual Machine detected, ATA device ignoreset
[    3.199239] scsi host4: ata_piix
[    3.199325] scsi host5: ata_piix
[    3.199437] ata1: PATA max UDMA/33 cmd 0x1f0 ctl 0x3f6 bmdma 0xffa0 irq 14
[    3.199438] ata2: PATA max UDMA/33 cmd 0x170 ctl 0x376 bmdma 0xffa8 irq 15
[    3.199539] libphy: Fixed MDIO Bus: probed
[    3.199540] tun: Universal TUN/TAP device driver, 1.6
[    3.249865] PPP generic driver version 2.4.2
[    3.249917] i8042: PNP: PS/2 Controller [PNP0303:PS2K,PNP0f03:PS2M] at 0x60,0x64 irq 1,12
[    3.275939] serio: i8042 KBD port at 0x60,0x64 irq 1
[    3.276059] serio: i8042 AUX port at 0x60,0x64 irq 12
[    3.276148] mousedev: PS/2 mouse device common for all mice
[    3.276917] rtc_cmos 00:00: RTC can wake from S4
[    3.299681] rtc_cmos 00:00: rtc core: registered rtc_cmos as rtc0
[    3.301667] rtc_cmos 00:00: alarms up to one month, 114 bytes nvram
[    3.301746] device-mapper: uevent: version 1.0.3
[    3.301952] device-mapper: ioctl: 4.37.0-ioctl (2017-09-20) initialised: dm-devel@redhat.com
[    3.302653] NET: Registered protocol family 10
[    3.329222] Segment Routing with IPv6
[    3.329235] NET: Registered protocol family 17
[    3.329243] Key type dns_resolver registered
[    3.871524] input: AT Translated Set 2 keyboard as /devices/platform/i8042/serio0/input/input1
[    3.873136] ata2.00: ATAPI: Virtual CD, , max MWDMA2
[    3.877374] ata2.00: configured for MWDMA2
[    3.991979] sd 1:0:1:0: [sdb] 4096-byte physical blocks
[    4.020389] RAS: Correctable Errors collector initialized.
[    4.021087] sd 1:0:1:0: [sdb] Write Protect is off
[    4.081183] registered taskstats version 1
[    4.081407] sd 1:0:1:0: [sdb] Write cache: disabled, read cache: enabled, supports DPOand FUA
[    4.157232] Loading compiled-in X.509 certificates
[    4.186891] Loaded X.509 cert 'Build time autogenerated kernel key: 7273dfcd51ce17c3a0acc6c926ef1c75fa43a7eb'
[    4.245176] zswap: loaded using pool lzo/zbud
[    4.273162] Key type big_key registered
[    4.300113] Key type trusted registered
[    4.324545]  sdb: sdb1
[    4.340055] Key type encrypted registered
[    4.364670] AppArmor: AppArmor sha1 policy hashing enabled
[    4.398302] ima: No TPM chip found, activating TPM-bypass! (rc=-19)
[    4.435807] evm: HMAC attrs: 0x1
[    4.457283]   Magic number: 6:618:737
[    4.480269] acpi LNXCPU:66: hash matches
[    4.504411] sd 1:0:1:0: [sdb] Attached SCSI disk
[    4.532445] rtc_cmos 00:00: setting system clock to 2018-06-06 14:43:27 UTC (1528296207)
[    4.532588] BIOS EDD facility v0.16 2004-Jun-25, 0 devices found
[    4.532588] EDD information not available.
[    4.647109] Waiting 300 sec before mounting root device...
[    4.698924] scsi 5:0:0:0: CD-ROM            Msft     Virtual CD/ROM   1.0  PQ: 0 ANSI:5
[    4.762577] sr 5:0:0:0: [sr0] scsi3-mmc drive: 0x/0x tray
[    4.796251] cdrom: Uniform CD-ROM driver Revision: 3.20
[    4.828656] sr 5:0:0:0: Attached scsi generic sg2 type 5
[   65.149404] random: crng init done

It stays stuck there for a long time and then starts again:

[  242.652053] INFO: task swapper/0:1 blocked for more than 120 seconds.
[  242.692515]       Not tainted 4.13.0-1018-azure #21-Ubuntu
[  242.726189] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[  242.773035] swapper/0       D    0     1      0 0x80000000
[  242.804670] Call Trace:
[  242.819731]  __schedule+0x3d6/0x8b0
[  242.840169]  schedule+0x36/0x80
[  242.858942]  schedule_timeout+0x17a/0x360
[  242.882503]  ? call_timer_fn+0x140/0x140
[  242.905375]  ? printk+0x52/0x6e
[  242.924433]  msleep+0x2e/0x40
[  242.942405]  ? msleep+0x2e/0x40
[  242.961225]  prepare_namespace+0x2f/0x194
[  242.984905]  kernel_init_freeable+0x212/0x23b
[  243.010324]  ? rest_init+0xc0/0xc0
[  243.030406]  kernel_init+0xe/0x100
[  243.050270]  ret_from_fork+0x35/0x40
[  316.380120] md: Waiting for all devices to be available before autodetect
[  316.420720] md: If you don't use raid, use raid=noautodetect
[  316.455898] md: Autodetecting RAID arrays.
[  316.480434] md: autorun ...
[  316.497985] md: ... autorun DONE.
[  316.518345] EXT4-fs (sda1): couldn't mount as ext3 due to feature incompatibilities
[  316.561965] EXT4-fs (sda1): couldn't mount as ext2 due to feature incompatibilities
[  316.607491] EXT4-fs (sda1): mounted filesystem with ordered data mode. Opts: (null)
[  316.650540] VFS: Mounted root (ext4 filesystem) readonly on device 8:1.
[  316.688668] devtmpfs: mounted
[  316.709313] Freeing unused kernel memory: 2204K
[  316.736945] Write protecting the kernel read-only data: 18432k
[  316.771968] Freeing unused kernel memory: 2024K
[  316.800604] Freeing unused kernel memory: 508K
[  316.828909] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[  316.866924] x86/mm: Checking user space page tables
[  316.896878] x86/mm: Checked W+X mappings: passed, no W+X pages found.
[  317.011150] systemd[1]: Failed to insert module 'autofs4': No such file or directory
[  317.071828] systemd[1]: systemd 229 running in system mode. (+PAM +AUDIT +SELINUX +IMA+APPARMOR +SMACK +SYSVINIT +UTMP +LIBCRYPTSETUP +GCRYPT +GNUTLS +ACL +XZ -LZ4 +SECCOMP +BLKID +ELFUTILS +KMOD -IDN)
[  317.176700] systemd[1]: Detected virtualization microsoft.
[  317.209778] systemd[1]: Detected architecture x86-64.

Welcome to Ubuntu 16.04.4 LTS!

[  317.257767] systemd[1]: Set hostname to <aks-nodepool1-39245783-0>.
[  317.535817] systemd[1]: Created slice System Slice.
[  OK  ] Created slice System Slice.
[  317.589463] systemd[1]: Listening on Device-mapper event daemon FIFOs.
[  OK  ] Listening on Device-mapper event daemon FIFOs.
[  317.654823] systemd[1]: Reached target User and Group Name Lookups.
[  OK  ] Reached target User and Group Name Lookups.
[  317.718808] systemd[1]: Starting of Arbitrary Executable File Formats File System Automount Point not supported.
[UNSUPP] Starting of Arbitrary Executable Fi...m Automount Point not supported.
[  317.812413] systemd[1]: Reached target Encrypted Volumes.
[  OK  ] Reached target Encrypted Volumes.
[  317.866436] systemd[1]: Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on fsck to fsckd communication Socket.
[  OK  ] Listening on Journal Socket.
         Starting Set console keymap...
         Mounting Huge Pages File System...
[  OK  ] Listening on udev Kernel Socket.
[  OK  ] Listening on Syslog Socket.
         Starting Remount Root and Kernel File Systems...
[  OK  ] Started Trigger resolvconf update for networkd DNS.
         Mounting Debug File System...
         Starting Create Static Device Nodes in /dev...
         Starting Uncomplicated firewall...
[  OK  ] Listening on udev Control Socket.[  318.214815] EXT4-fs (sda1): re-mounted. Opts: discard

         Starting Load Kernel Modules...
         Starting Nameserver information manager...
[  OK  ] Created slice system-systemd\x2dfsck.slice.
         Mounting POSIX Message Queue File System...
[  OK  ] Listening on LVM2 metadata daemon socket.
         Starting Monitoring of LVM2 mirrors... dmeventd or progress polling...
[  OK  ] Listening on Journal Audit Socket.
[  OK  ] Reached target Swap.
[  OK  ] Listening on Journal Socket (/dev/log).
[  OK  ] Created slice system-serial\x2dgetty.slice.
[  OK  ] Created slice User and Session Slice.
[  OK  ] Reached target Slices.
[  OK  ] Listening on /dev/initctl Compatibility Named Pipe.
[  OK  ] Listening on LVM2 poll daemon socket.
[  OK  ] Started Forward Password Requests to Wall Directory Watch.
         Starting Journal Service...
[  OK  ] Mounted Huge Pages File System.
[  OK  ] Mounted Debug File System.
[  OK  ] Mounted POSIX Message Queue File System.
[  OK  ] Started Set console keymap.
[  OK  ] Started Remount Root and Kernel File Systems.
[  OK  ] Started Create Static Device Nodes in /dev.
[  OK  ] Started Uncomplicated firewall.
[FAILED] Failed to start Load Kernel Modules.
See 'systemctl status systemd-modules-load.service' for details.
[  OK  ] Started Journal Service.
[  OK  ] Started Nameserver information manager.
[  OK  ] Started LVM2 metadata daemon.
         Starting Apply Kernel Variables...
         Mounting Configuration File System...
         Mounting FUSE Control File System...
         Starting udev Kernel Device Manager...
         Starting Load/Save Random Seed...
         Starting udev Coldplug all Devices...
         Starting Initial cloud-init job (pre-networking)...
         Starting Flush Journal to Persistent Storage...
[  OK  ] Mounted Configuration File System.
[  OK  ] Mounted FUSE Control File System.
[  OK  ] Started udev Kernel Device Manager.
[  OK  ] Started Apply Kernel Variables.
[  OK  ] Started Load/Save Random Seed.
[  OK  ] Started Monitoring of LVM2 mirrors,...ng dmeventd or progress polling.
[  OK  ] Reached target Local File Systems (Pre).
[  OK  ] Reached target Local File Systems.
         Starting Set console font and keymap...
         Starting Tell Plymouth To Write Out Runtime Data...
         Starting LSB: ebtables ruleset management...
         Starting LSB: AppArmor initialization...
[  OK  ] Started udev Coldplug all Devices.
[  OK  ] Started Dispatch Password Requests to Console Directory Watch.
[  OK  ] Started Tell Plymouth To Write Out Runtime Data.
[  OK  ] Started Flush Journal to Persistent Storage.
         Starting Create Volatile Files and Directories...
[  321.610606] cloud-init[223]: Cloud-init v. 18.2 running 'init-local' at Wed, 06 Jun 2018 14:48:43 +0000. Up 320.80 seconds.
[  OK  [  342.565578] ] Started Create Volatile Files and Directories.cloud-init[223]: 2018-06-06 14:48:44,583 - stages.py[WARNING]: Failed to rename devices: [nic not present] Cannot rename mac=00:0d:3a:2a:15:be to eth0, not available.

[  OK  ] Started Initial cloud-init job (pre-networking).
[  OK  ] Started LSB: ebtables ruleset management.
[  OK  ] Started LSB: AppArmor initialization.
[  OK  ] Found device /dev/ttyS0.
[  OK  ] Found device Virtual_Disk 1.
         Starting File System Check on /dev/disk/cloud/azure_resource-part1...
[  OK  ] Listening on Load/Save RF Kill Switch Status /dev/rfkill Watch.
[  OK  ] Reached target Network (Pre).
         Starting Raise network interfaces...
         Starting Update UTMP about System Boot/Shutdown...
         Starting Network Time Synchronization...
[  OK  ] Started Network Time Synchronization.
[  OK  ] Started Set console font and keymap.
[  OK  ] Started File System Check on /dev/disk/cloud/azure_resource-part1.
[  OK  ] Started File System Check Daemon to report status.
[  OK  ] Created slice system-getty.slice.
[  OK  ] Reached target System Time Synchronized.
[  OK  ] Started Update UTMP about System Boot/Shutdown.
[FAILED] Failed to start Raise network interfaces.
See 'systemctl status networking.service' for details.
[  OK  ] Reached target Network.
         Starting Initial cloud-init job (metadata service crawler)...
[  371.248038] cloud-init[704]: Cloud-init v. 18.2 running 'init' at Wed, 06 Jun 2018 14:49:34 +0000. Up 371.04 seconds.
[  371.296939] cloud-init[704]: ci-info: +++++++++++++++++++++++Net device info++++++++++++++++++++++++
[  371.335293] cloud-init[704]: ci-info: +--------+------+-----------+-----------+-------+------------+
[  371.377439] cloud-init[704]: ci-info: | Device |  Up  |  Address  |    Mask   | Scope | Hw-Address |
[  371.417594] cloud-init[704]: ci-info: +--------+------+-----------+-----------+-------+------------+
[  371.455284] cloud-init[704]: ci-info: |   lo   | True | 127.0.0.1 | 255.0.0.0 |   .   |     .      |
[  371.497938] cloud-init[704]: ci-info: |   lo   | True |  ::1/128  |     .     |  host |     .      |
[  371.535779] cloud-init[704]: ci-info: +--------+------+-----------+-----------+-------+------------+
[  371.574200] cloud-init[704]: 2018-06-06 14:49:34,221 - stages.py[WARNING]: Failed to rename devices: [nic not present] Cannot rename mac=00:0d:3a:2a:15:be to eth0, not available.
[  OK  ] Started Initial cloud-init job (metadata service crawler).
[  OK  ] Reached target Network is Online.
         Starting iSCSI initiator daemon (iscsid)...
[  OK  ] Reached target System Initialization.
         Starting Docker Socket for the API.
         Starting Socket activation for snappy daemon.
[  OK  ] Started Daily Cleanup of Temporary Directories.
[  OK  ] Started Daily apt download activities.
[  OK  ] Listening on ACPID Listen Socket.
[  OK  ] Started Daily apt upgrade and clean activities.
[  OK  ] Reached target Timers.
[  OK  ] Started ACPI Events Check.
[  OK  ] Reached target Paths.
[  OK  ] Listening on D-Bus System Message Bus Socket.
         Starting LXD - unix socket.
[  OK  ] Listening on UUID daemon activation socket.
[  OK  ] Reached target Cloud-config availability.
         Mounting /mnt...
[  OK  ] Listening on Docker Socket for the API.
[  OK  ] Listening on Socket activation for snappy daemon.
[  OK  ] Listening on LXD - unix socket.
[  OK  ] Reached target Sockets.
[  OK  ] Reached target Basic System.
[  OK  ] Started D-Bus System Message Bus.
[  OK  ] Started Hyper-V VSS Protocol Daemon.
         Starting Snappy daemon...
[  OK  ] Started Regular background program processing daemon.
[  OK  ] Started Deferred execution scheduler.
         Starting OMI CIM Server...
[  OK  ] Started Name Service Cache Daemon.
         Starting LSB: Record successful boot for GRUB...
         Starting LSB: MD monitoring daemon...
         Starting Accounts Service...
[  OK  ] Started ACPI event daemon.
         Starting System Logging Service...
         Starting /etc/rc.local Compatibility...[  373.008043]
         Starting Login Service...
         Starting Docker Application Container Engine...
rc.local[928]:          Starting OpenBSD Secure Shell server.../etc/rc.local: 13: /etc/rc.local: cannot create /sys/bus/vmbus/drivers/hv_util/unbind: Directory nonexistent

[  OK  ] Started Hyper-V File Copy Protocol Daemon.
         Starting LXD - container startup/shutdown...
[  OK  ] Started Azure Linux Agent.
[  OK  ] Started Azure Linux Diagnostic Extension.
[  OK  ] Started Unattended Upgrades Shutdown.
         Starting Kubelet...
[  OK  ] Started FUSE filesystem for LXC.
         Starting Apply the settings specified in cloud-config...
[  OK  ] Started Hyper-V KVP Protocol Daemon.
[  OK  ] Mounted /mnt.
[  OK  ] Started System Logging Service.
[  OK  ] Started OpenBSD Secure Shell server.
[  OK  ] Started Snappy daemon.
[FAILED] Failed to start iSCSI initiator daemon (iscsid).
See 'systemctl status iscsid.service' for details.
[FAILED] Failed to start /etc/rc.local Compatibility.
See 'systemctl status rc-local.service' for details.
[  OK  ] Started OMI CIM Server.
[  OK  ] Started Login Service.
[  OK  ] Started LXD - container startup/shutdown.
         Starting Authenticate and Authorize Users to Run Privileged Tasks...
         Starting Login to default iSCSI targets...
[FAILED] Failed to start Login to default iSCSI targets.
See 'systemctl status open-iscsi.service' for details.
[  OK  ] Reached target Remote File Systems (Pre).
[  OK  ] Reached target Remote File Systems.
         Starting LSB: automatic crash report generation...
         Starting LSB: daemon to balance interrupts for SMP systems...
         Starting LSB: Set the CPU Frequency Scaling governor to "ondemand"...
         Starting Permit User Sessions...
[  OK  ] Started LSB: Set the CPU Frequency Scaling governor to "ondemand".
[  OK  ] Started Permit User Sessions.
[  OK  ] Started LSB: MD monitoring daemon.
[  OK  ] Started LSB: Record successful boot for GRUB.
[  OK  ] Started LSB: daemon to balance interrupts for SMP systems.
         Starting Terminate Plymouth Boot Screen...
         Starting Hold until boot process finishes up...
[  OK  ] Started Terminate Plymouth Boot Screen.
[  OK  ] Started Hold until boot process finishes up.
[  OK  ] Started Kubelet.
[  OK  ] Started LSB: automatic crash report generation.
[  OK  ] Started Authenticate and Authorize Users to Run Privileged Tasks.
[  OK  ] Started Accounts Service.
[  OK  ] Started Serial Getty on ttyS0.
         Starting Set console scheme...
[  OK  ] Started Getty on tty1.
[  OK  ] Reached target Login Prompts.
[  OK  ] Started Set console scheme.
2018/06/06 14:49:38.968673 INFO Azure Linux Agent Version:2.2.20
2018/06/06 14:49:38.994008 INFO OS: ubuntu 16.04
2018/06/06 14:49:39.019525 INFO Python: 3.5.2
2018/06/06 14:49:39.039976 INFO Run daemon
2018/06/06 14:49:39.063405 INFO Clean protocol
2018/06/06 14:49:39.091408 INFO Provisioning already completed, skipping.
2018/06/06 14:49:39.125127 INFO RDMA capabilities are not enabled, skipping
2018/06/06 14:49:39.170998 INFO Determined Agent WALinuxAgent-2.2.25 to be the latest agent
[  376.366044] cloud-init[998]: Cloud-init v. 18.2 running 'modules:config' at Wed, 06 Jun 2018 14:49:38 +0000. Up 375.38 seconds.
[  OK  ] Started Apply the settings specified in cloud-config.
         Starting Write warning to Azure ephemeral disk...
[  OK  ] Started Write warning to Azure ephemeral disk.
[  OK  ] Closed Syslog Socket.
         Stopping System Logging Service...
2018/06/06 14:49:39.825355 INFO Agent WALinuxAgent-2.2.25 is running as the goal state agent
2018/06/06 14:49:39.877901 INFO Detect protocol endpoints
2018/06/06 14:49:39.904565 INFO Clean protocol
2018/06/06 14:49:39.924020 INFO WireServer endpoint is not found. Rerun dhcp handler
2018/06/06 14:49:39.956053 INFO Test for route to 168.63.129.16
2018/06/06 14:49:39.993075 WARNING No route exists to 168.63.129.16
2018/06/06 14:49:40.019687 INFO Checking for dhcp lease cache
2018/06/06 14:49:40.046583 INFO looking for leases in path [/var/lib/dhcp/dhclient.*.leases]
2018/06/06 14:49:40.083278 INFO dhcp entry:168.63.129.16, 245:True, expired:True
2018/06/06 14:49:40.111679 INFO cached endpoint not found
2018/06/06 14:49:40.132472 INFO Cache exists [False]
2018/06/06 14:49:40.152060 INFO Send dhcp request
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
[DEPEND] Dependency failed for kubectl and kubelet extraction.
2018/06/06 14:49:40.329898 INFO Examine /proc/net/route for primary interface
2018/06/06 14:49:40.359284 WARNING Could not determine primary interface, please ensure /proc/net/route is correct
2018/06/06 14:49:40.398490 WARNING Contents of /proc/net/route:
Iface   Destination     Gateway         Flags   RefCnt  Use     Metric  Mask            MTU       Window  IRTT

2018/06/06 14:49:40.465284 WARNING Primary interface examination will retry silently
[  OK  ] Started Kubernetes systemd probe.
[  OK  ] Stopped System Logging Service.
         Starting Docker Application Container Engine...
[  OK  ] Stopped System Logging Service.
         Starting System Logging Service...
[  OK  ] Started System Logging Service.
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
         Starting Docker Application Container Engine...
[FAILED] Failed to start Docker Application Container Engine.
See 'systemctl status docker.service' for details.
         Starting Docker Application Container Engine...

Ubuntu 16.04.4 LTS aks-nodepool1-39245783-0 ttyS0

aks-nodepool1-39245783-0 login: [  382.958632] cloud-init[3467]: Cloud-init v. 18.2 running 'modules:final' at Wed, 06 Jun 2018 14:49:45 +0000. Up 382.66 seconds.
[  382.976037] cloud-init[3467]: Cloud-init v. 18.2 finished at Wed, 06 Jun 2018 14:49:45+0000. Datasource DataSourceAzure [seed=/dev/sr0].  Up 382.95 seconds

Ubuntu 16.04.4 LTS aks-nodepool1-39245783-0 ttyS0
tomconte commented 6 years ago

I tried upgrading the kernel:

apt-get update
apt-get upgrade -f linux-azure

Then it re-boots OK. Not sure why :-| Will try the steps on another node.

tomconte commented 6 years ago

I just tried again with a new AKS cluster in eastus and did not see the problem anymore. I think the issue might have been due to some changes in the underlying Ubuntu image.

svollath commented 6 years ago

I also recreated an AKS cluster yesterday, and try to reproduce it. In fact it turned out again that esp. running update-grub isn't stable. I ended up with Node-0 being NotReady after restart, while it worked for Node-1 - the output of the resulting /boot/grub/grub.cfg was different for the two nodes, which is really strange:

First lines of grub.cfg on Node-0 (getting stuck):

opio; fi
                insmod part_msdos
                insmod ext2
                set root='hd0,msdos1'
                if [ x$feature_platform_search_hint = xy ]; then
                  search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1  84ce5b56-30b1-4479-a864-7296bb549cec
                else
                  search --no-floppy --fs-uuid --set=root 84ce5b56-30b1-4479-a864-7296bb549cec
                fi
                echo    'Loading Linux 4.15.0-1013-azure ...'
                linux   /boot/vmlinuz-4.15.0-1013-azure root=/dev/sda1 ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
        }

vs. first lines of grub.cfg on Node-1 (getting Ready):

ci0,msdos1  84ce5b56-30b1-4479-a864-7296bb549cec
                else
                  search --no-floppy --fs-uuid --set=root 84ce5b56-30b1-4479-a864-7296bb549cec
                fi
                echo    'Loading Linux 4.15.0-1013-azure ...'
                linux   /boot/vmlinuz-4.15.0-1013-azure root=UUID=84ce5b56-30b1-4479-a864-7296bb549cec ro  console=tty1 console=ttyS0 earlyprintk=ttyS0 rootdelay=300 swapaccount=1
                echo    'Loading initial ramdisk ...'
                initrd  /boot/initrd.img-4.15.0-1013-azure
        }

However, both files seem to be malformed (I read them with az vm run-command invoke -g $MCRGNAME -n aks-nodepool1-65006794-1 --command-id RunShellScript --scripts "sudo cat /boot/grub/grub.cfg").

svollath commented 6 years ago

When I tried the initial described commands yesterday (that was with kubernetes 1.9.9) they worked for me. I would like to change our documentation back to the safe solution of modifying "/etc/default/grub.d/50-cloudimg-settings.cfg" and run "update-grub" - however I'd somehow prefer a confirmation that images are considered fixed (from xy on) before.

Btw. here's a more safe way to add "swapaccount=1":

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo sed -i -r 's|^(GRUB_CMDLINE_LINUX_DEFAULT=)\"(.*.)\"|\1\"\2 swapaccount=1\"|' /etc/default/grub.d/50-cloudimg-settings.cfg"

> az vm run-command invoke -g <RGNAME> -n <VMNAME> --command-id RunShellScript --scripts "sudo update-grub"

> az vm restart -g <RGNAME> -n <VMNAME>
jnoller commented 5 years ago

Closing as stale.