TritonDataCenter / illumos-kvm

KVM driver for illumos

System freezes after loading kvm module #1

Closed: jasonbking closed this issue 12 years ago

jasonbking commented 13 years ago

The CPU is a Core i5-2400 (Sandy Bridge).

Once, prior to a freeze, I saw 'kvm: NOTICE: unhanded wrmsr: 0x0 data 3000000018' on the console, but I have not seen it since. I tried setting a breakpoint in kvm_set_msr_common, and it does not appear to be reached in subsequent lockups.

Disabling kvm leaves the system stable; running 'rem_drv kvm; add_drv kvm' causes it to lock up shortly thereafter.

This is on a stock illumos debug build (source as of 8/26).

I also experienced similar issues with SmartOS live, though I was never able to narrow them down.

bcantrill commented 13 years ago

Interesting. What guest? (Or does it hang without any guest at all?) Do you have a dump? And can you do this on the running system:

echo "vmcs_config::print" | mdb -k

jasonbking commented 13 years ago

No guests are running; it's just a regular boot. It doesn't generate a dump, I cannot drop into kmdb, and forcing a panic with "dtrace -wn 'tick-1m { panic(); }'" doesn't work either.

If I boot with -B disable-kvm=true, things are stable. However, when I run 'rem_drv kvm; add_drv kvm', it freezes shortly thereafter (just as when I boot the BE normally), and I cannot drop into kmdb (this is also a DEBUG kernel).

Because of all that, I set a breakpoint in setup_vmcs_config; the output below was captured immediately before it returns (hopefully that is sufficient; if not, let me know a more useful point at which to print the value):

    {
        size = 0x400
        order = 0
        revision_id = 0x10
        pin_based_exec_ctrl = 0x3f
        cpu_based_exec_ctrl = 0xb6a065fa
        cpu_based_2nd_exec_ctrl = 0xeb
        vmexit_ctrl = 0xf6fff
        vmentry_ctrl = 0x51ff
    }
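To make the control fields in this dump easier to read, here is a small decoder for cpu_based_exec_ctrl and cpu_based_2nd_exec_ctrl. It is a sketch, not part of illumos-kvm: struct ctl_bit, decode_ctls, and the tables are hypothetical names, and the bit assignments are taken from the Intel SDM's tables of processor-based VM-execution controls (worth re-checking against the SDM revision you use).

```c
#include <assert.h>
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* One named bit in a VMX control field (hypothetical helper type). */
struct ctl_bit {
	unsigned bit;
	const char *name;
};

/* Primary processor-based VM-execution controls, per the Intel SDM. */
static const struct ctl_bit primary_ctls[] = {
	{ 2, "interrupt-window exiting" },   { 3, "use TSC offsetting" },
	{ 7, "HLT exiting" },                { 9, "INVLPG exiting" },
	{ 10, "MWAIT exiting" },             { 11, "RDPMC exiting" },
	{ 12, "RDTSC exiting" },             { 15, "CR3-load exiting" },
	{ 16, "CR3-store exiting" },         { 19, "CR8-load exiting" },
	{ 20, "CR8-store exiting" },         { 21, "use TPR shadow" },
	{ 22, "NMI-window exiting" },        { 23, "MOV-DR exiting" },
	{ 24, "unconditional I/O exiting" }, { 25, "use I/O bitmaps" },
	{ 27, "monitor trap flag" },         { 28, "use MSR bitmaps" },
	{ 29, "MONITOR exiting" },           { 30, "PAUSE exiting" },
	{ 31, "activate secondary controls" },
};

/* Low byte of the secondary processor-based controls. */
static const struct ctl_bit secondary_ctls[] = {
	{ 0, "virtualize APIC accesses" }, { 1, "enable EPT" },
	{ 2, "descriptor-table exiting" }, { 3, "enable RDTSCP" },
	{ 4, "virtualize x2APIC mode" },   { 5, "enable VPID" },
	{ 6, "WBINVD exiting" },           { 7, "unrestricted guest" },
};

#define NELEMS(a) (sizeof (a) / sizeof ((a)[0]))

/*
 * Print each named control that is set in value, and return the set bits
 * that the table does not name (for the primary controls: reserved bits).
 */
static uint32_t
decode_ctls(const char *field, uint32_t value, const struct ctl_bit *tbl,
    size_t n)
{
	uint32_t named = 0;

	printf("%s = 0x%" PRIx32 "\n", field, value);
	for (size_t i = 0; i < n; i++) {
		uint32_t mask = UINT32_C(1) << tbl[i].bit;

		named |= mask;
		if (value & mask)
			printf("  bit %2u: %s\n", tbl[i].bit, tbl[i].name);
	}
	return (value & ~named);
}
```

For cpu_based_exec_ctrl = 0xb6a065fa this reports TSC offsetting, HLT, MWAIT, MONITOR and MOV-DR exiting, a TPR shadow, I/O and MSR bitmaps, and activated secondary controls; the returned leftover, 0x4006172 (bits 1, 4-6, 8, 13, 14 and 26), consists of bits the SDM treats as reserved default-1 bits. For cpu_based_2nd_exec_ctrl = 0xeb it reports APIC-access virtualization, EPT, RDTSCP, VPID, WBINVD exiting and unrestricted guest, with nothing left over.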

jasonbking commented 13 years ago

Additional data points: I set breakpoints on kvmkvm_{open,close,ioctl,devmap,segmap}, and none of them are hit before the system locks up. I also set a breakpoint on kvmkvm_attach; attach succeeds without any issue.

jasonbking commented 13 years ago

...and it appears that during boot the system is trying to unload the kvm module: a breakpoint set on kvm_detach gets triggered.

I stepped over each instruction. After (or perhaps during) the call to kvm_arch_hardware_unsetup, kmdb reports 'single-step stop on miscellaneous trap' with the pc inside xc_serv, and ::stack shows it called as xc_serv(0, 0). Continuing with :c drops back into xc_serv with the same message; after doing this several times, it returns to the OS.

At this point the system no longer locks up. An uneducated guess: could the lockup be a nasty interrupt deadlock triggered by kvm_arch_hardware_unsetup?

rmustacc commented 13 years ago

We finally have a box on hand to test this against. Our investigation shows that while the kvm driver is inducing it, the problem lies much deeper in the system: taking a spin lock in cross-call context can lead to the behavior you're seeing. As a workaround on a Sandy Bridge system, consider setting apix_enable=0, either in /etc/system (set apix_enable=0) or via mdb -kd. The issue is likely in the apix module, which was taken in a not-quite-refined state when the source was closed. We're going to do further work to determine what's going on there, but it will be some time before we get to it.
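The hazard described above can be illustrated with a simplified user-space analogy. This is only a sketch of one way such a deadlock can arise, not a model of apix, interrupt priority levels, or the real illumos xc_* code: two pthreads stand in for two CPUs, and struct xcall_demo, xcall_handler, run_xcall, and SPIN_LIMIT are hypothetical names. The initiator takes a spin lock and then issues a synchronous cross-call whose handler needs the same lock, so the handler cannot make progress while the initiator waits on it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical user-space analogy: two threads play the role of two CPUs. */
struct xcall_demo {
	atomic_bool lock;           /* spin lock both "CPUs" want */
	atomic_bool pending;        /* initiator has posted the cross-call */
	atomic_bool completed;      /* handler got the lock and did its work */
	atomic_bool handler_exited; /* handler returned, successfully or not */
};

/* Bound on the handler's spin so the demo always terminates. */
#define SPIN_LIMIT 1000000L

/* Target "CPU": the cross-call handler needs the same spin lock. */
static void *
xcall_handler(void *arg)
{
	struct xcall_demo *d = arg;

	while (!atomic_load(&d->pending))
		; /* wait for the cross-call to arrive */
	for (long i = 0; i < SPIN_LIMIT; i++) {
		if (!atomic_exchange(&d->lock, true)) { /* lock acquired */
			atomic_store(&d->completed, true);
			atomic_store(&d->lock, false);
			break;
		}
	}
	atomic_store(&d->handler_exited, true);
	return (NULL);
}

/*
 * Initiating "CPU": issues a synchronous cross-call, optionally while
 * holding the spin lock. Returns true if the handler completed.
 */
static bool
run_xcall(bool hold_lock_across_xcall)
{
	struct xcall_demo d;
	pthread_t target;

	atomic_init(&d.lock, false);
	atomic_init(&d.pending, false);
	atomic_init(&d.completed, false);
	atomic_init(&d.handler_exited, false);
	if (pthread_create(&target, NULL, xcall_handler, &d) != 0)
		return (false);

	if (hold_lock_across_xcall)
		while (atomic_exchange(&d.lock, true))
			; /* take the spin lock before posting the cross-call */
	atomic_store(&d.pending, true);

	/*
	 * A real initiator would spin on d.completed forever here. Waiting
	 * for the handler to give up instead keeps the demo finite.
	 */
	while (!atomic_load(&d.handler_exited))
		;
	if (hold_lock_across_xcall)
		atomic_store(&d.lock, false);
	(void) pthread_join(target, NULL);
	return (atomic_load(&d.completed));
}
```

Compiled with something like cc -std=c11 -pthread, run_xcall(true) returns false because the handler gives up after SPIN_LIMIT attempts, while run_xcall(false) returns true. In the kernel neither side has such a bound, so the CPUs involved simply keep spinning.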

rmustacc commented 12 years ago

This has been resolved in illumos-joyent. See https://github.com/joyent/illumos-joyent/commit/4d86fb7f59410be72e467483b74e2eebff6052b2 for the fix.