Open danpawlik opened 1 year ago
@danpawlik have you been able to reproduce it consistently?
Unfortunately yes.
Also, I can see Running CRC on: VM,
which means you are using a nested virtualization setup, which we don't test. Can you use the https://github.com/crc-org/crc/wiki/Debugging-guide, ssh into the VM, and check that the /var/lib/kubelet/config.json
file exists with your pull secret content? (That is what the check which is failing for you does.)
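For reference, checking it yourself looks roughly like this (a minimal sketch assuming the default Linux/libvirt key path and VM address; adjust per the Debugging guide for your setup):

# Assumed defaults: the SSH key generated by crc (id_rsa on older releases) and the default libvirt VM IP.
ssh -i ~/.crc/machines/crc/id_ecdsa core@192.168.130.11
# Inside the VM, this file should contain the pull secret, not an empty "{}".
cat /var/lib/kubelet/config.json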
Thanks @praveenkumar.
So on the VM, the file contains:
{}
so it was not copied.
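For context, the pull secret normally comes from the host side and gets copied into the VM at start; a hedged reminder of the usual ways to supply it (not specific to this bug, paths are just examples):

# Point crc at the pull secret file on the host...
crc config set pull-secret-file ~/pull-secret.json
# ...or pass it directly when starting:
crc start -p ~/pull-secret.json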
I see a lot of tracebacks in dmesg:
[ 880.704548] RIP: 0033:0x7f8b19a3ec6b
[ 880.704736] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[ 880.705596] RSP: 002b:00007f88b7ffe4a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 880.705966] RAX: ffffffffffffffda RBX: 00007f88c4ff8e50 RCX: 00007f8b19a3ec6b
[ 880.706317] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001c
[ 880.706684] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
[ 880.707052] R10: 00007f88b0051580 R11: 0000000000000246 R12: 000055b9750bf620
[ 880.707412] R13: 00007f88c4ff8ff0 R14: 9ebdce38c25acb00 R15: 00007f88c4ff8e48
[ 880.707754] </TASK>
[ 880.707887] Call Trace:
[ 880.708010] <TASK>
[ 880.708116] x86_pmu_stop+0x50/0xb0
[ 880.708289] x86_pmu_del+0x73/0x190
[ 880.708463] event_sched_out.part.0+0x7a/0x1f0
[ 880.708679] group_sched_out.part.0+0x93/0xf0
[ 880.708898] ctx_sched_out+0x124/0x2a0
[ 880.709083] perf_event_context_sched_out+0x1a5/0x460
[ 880.709329] __perf_event_task_sched_out+0x50/0x170
[ 880.709572] ? pick_next_task+0x51/0x940
[ 880.709766] prepare_task_switch+0xbd/0x2a0
[ 880.709997] __schedule+0x1cb/0x620
[ 880.710172] schedule+0x5a/0xc0
[ 880.710331] xfer_to_guest_mode_handle_work+0xac/0xe0
[ 880.710578] vcpu_run+0x1f5/0x250 [kvm]
[ 880.710801] kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
[ 880.711079] kvm_vcpu_ioctl+0x271/0x670 [kvm]
[ 1898.648039] RIP: 0033:0x7f8b19a3ec6b
[ 1898.648213] Code: 73 01 c3 48 8b 0d b5 b1 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0
ff ff 73 01 c3 48 8b 0d 85 b1 1b 00 f7 d8 64 89 01 48
[ 1898.649089] RSP: 002b:00007f88c57f84a8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
[ 1898.649448] RAX: ffffffffffffffda RBX: 00007f88c5ffae50 RCX: 00007f8b19a3ec6b
[ 1898.649790] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000001b
[ 1898.650141] RBP: 0000000000000000 R08: 0000000000000000 R09: 00000000000000ff
[ 1898.650481] R10: 00007f88b0051580 R11: 0000000000000246 R12: 000055b9750af730
[ 1898.650815] R13: 00007f88c5ffaff0 R14: 9ebdce38c25acb00 R15: 00007f88c5ffae48
[ 1898.651154] </TASK>
[ 1898.651824] Call Trace:
[ 1898.652155] <TASK>
[ 1898.652401] amd_pmu_enable_all+0x44/0x60
[ 1898.652851] __perf_install_in_context+0x16c/0x220
[ 1898.653372] remote_function+0x47/0x50
[ 1898.653781] generic_exec_single+0x78/0xb0
[ 1898.654254] smp_call_function_single+0xeb/0x130
[ 1898.654569] ? sw_perf_event_destroy+0x60/0x60
[ 1898.654871] perf_install_in_context+0xcf/0x200
[ 1898.655173] ? ctx_resched+0xe0/0xe0
[ 1898.655416] perf_event_create_kernel_counter+0x114/0x180
[ 1898.655776] pmc_reprogram_counter.constprop.0+0xec/0x220 [kvm]
[ 1898.656230] amd_pmu_set_msr+0x106/0x170 [kvm_amd]
[ 1898.656562] ? __svm_vcpu_run+0x67/0x110 [kvm_amd]
[ 1898.656898] ? get_gp_pmc_amd+0x129/0x200 [kvm_amd]
[ 1898.657235] __kvm_set_msr+0x7f/0x1c0 [kvm]
[ 1898.657567] kvm_emulate_wrmsr+0x52/0x1b0 [kvm]
[ 1898.657923] vcpu_enter_guest+0x667/0x1010 [kvm]
[ 1898.658277] ? kvm_get_rflags+0xe/0x30 [kvm]
[ 1898.658606] ? svm_get_if_flag+0x1d/0x50 [kvm_amd]
[ 1898.658931] ? kvm_apic_has_interrupt+0x32/0x90 [kvm]
[ 1898.659311] ? kvm_cpu_has_interrupt+0x60/0x80 [kvm]
[ 1898.659681] vcpu_run+0x33/0x250 [kvm]
[ 1898.659977] kvm_arch_vcpu_ioctl_run+0x104/0x620 [kvm]
[ 1898.660365] kvm_vcpu_ioctl+0x271/0x670 [kvm]
[ 1898.660702] ? __seccomp_filter+0x45/0x470
The odd thing here is that on CentOS 8 Stream it works normally (same hypervisor with the AMD CPU, the instance was just rebuilt).
I have done one more test on a different cloud provider with the same image and the result is... it works normally (but that host had an Intel CPU).
I will try to dig more into what is breaking crc start there. Maybe it will be helpful to others who hit the same issue.
Workaround, ansible-playbook:
---
# This playbook deploys crc and prepares the VM to make a snapshot that later
# can be deployed in CI.
- hosts: crc.dev
  become: true
  tasks:
    - name: Install packages
      yum:
        name:
          - qemu-kvm-common
        state: present

    - name: Ensure CentOS runs with selinux permissive
      selinux:
        policy: targeted
        state: permissive

    - name: Enable nested virtualization
      lineinfile:
        path: /etc/modprobe.d/kvm.conf
        regexp: '^#options kvm_amd nested=1'
        line: 'options kvm_amd nested=1'

    # From https://lore.kernel.org/lkml/20220830235537.4004585-8-seanjc@google.com/T/
    - name: Disable ept
      shell: |
        sed -i 's/net.ifnames=0/net.ifnames=0 ept=0/g' /etc/default/grub

    - name: Regenerate grub
      shell: |
        grub2-mkconfig -o /boot/grub2/grub.cfg

# REBOOT HOST.
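A quick sanity check after the reboot (a sketch, assuming an AMD host as in this report) to confirm nested virtualization is actually enabled:

# A non-zero value (or "Y") means nested virtualization is enabled for kvm_amd.
cat /sys/module/kvm_amd/parameters/nested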
After applying the playbook, it seems to move forward. IMO the issue is not on the crc side, but in kvm/libvirt.
Still, it does not deploy CRC. Created a bug for kvm: https://bugzilla.redhat.com/show_bug.cgi?id=2151878
For what it's worth, there have been multiple similar reports in the past: https://github.com/crc-org/crc/issues/3366#issuecomment-1264304842 https://github.com/crc-org/crc/issues/1830
(searching closed issues for "AMD Intel" might give more results)
Marking as unsupported. This is not something we can resolve, as it is related to nested virtualization and the 'incompatibility' (read existing bugs: https://marc.info/?l=kvm&m=166886061623174&w=2) with some AMD Ryzen/Epyc CPUs.
Workaround on CentOS 9 Stream: install a kernel from elrepo.org.
Steps:
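Roughly, the usual ELRepo mainline kernel install on CentOS 9 Stream looks like this (a sketch, not necessarily the exact steps used here; check elrepo.org for the current instructions):

# Import the ELRepo signing key and enable the repository on CentOS 9 Stream.
rpm --import https://www.elrepo.org/RPM-GPG-KEY-elrepo.org
dnf install -y https://www.elrepo.org/elrepo-release-9.el9.elrepo.noarch.rpm
# Install the mainline kernel and reboot into it.
dnf --enablerepo=elrepo-kernel install -y kernel-ml
reboot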
General information
crc setup before starting it (Yes/No)? - Yes
CRC version
CRC config
Host Operating System
Steps to reproduce
Expected
CRC will start normally.
Actual
Logs
https://gist.github.com/danpawlik/608fd45ce9e8642ce43baace625575d4
Before gathering the logs, try the following to see if it fixes your issue
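(presumably the standard reset-and-retry sequence; a sketch, not the exact list from the template:)

# Typical full reset before retrying:
crc delete -f
crc cleanup
crc setup
crc start --log-level debug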
Also it does not help.