TritonDataCenter / illumos-kvm

KVM driver for illumos

kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012 #13

Open sirhelly opened 11 years ago

sirhelly commented 11 years ago

Issues: a) kernel warnings are emitted at full speed to the system console and syslogd, so the log files fill up; b) dmesg hangs.

SmartOS Version: SunOS 00-25-90-77-43-ac 5.11 joyent_20130307T214308Z i86pc i386 i86pc

Background info: The system is a build and test server running 52 VMs (KVM and zones) with a variety of guest operating systems, including: FreeBSD 8 (32-bit), FreeBSD 9 (32- and 64-bit), OpenBSD (32- and 64-bit), NetBSD (32- and 64-bit), ReactOS, Haiku, Ubuntu, Oracle Solaris, OpenIndiana, OpenSolaris (last), Windows XP, Windows 7 (64-bit), Windows Vista (64-bit), Linux (32- and 64-bit), and some OS zones.

Maybe related: Avi Kivity in 2007: http://www.mail-archive.com/kvm-devel@lists.sourceforge.net/msg07044.html

If I can help in any way (commands, further information, etc.), please do not hesitate to contact me by mail: helmut dot hartl at firmos dot at.

Messages for reference:

2013-06-07T15:04:02.293908+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
2013-06-07T15:04:02.297239+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
2013-06-07T15:04:02.300573+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
2013-06-07T15:04:02.303907+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
2013-06-07T15:04:02.307243+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
2013-06-07T15:04:02.310578+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
2013-06-07T15:04:02.313911+00:00 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
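
To put a number on "full speed", the rate can be estimated from the timestamps in the excerpt above. The following is only a rough sketch: the helper name `message_rate` is made up for illustration, and it assumes every log line starts with an ISO 8601 timestamp followed by a space, as the lines quoted here do.

```python
from datetime import datetime

def message_rate(lines, needle="emulating exchange as write"):
    """Return (count, messages_per_second) for lines containing `needle`.

    Assumption: each line begins with an ISO 8601 timestamp and a space.
    """
    stamps = []
    for line in lines:
        if needle not in line:
            continue
        first_field = line.split(None, 1)[0]
        stamps.append(datetime.fromisoformat(first_field))
    if len(stamps) < 2:
        return len(stamps), 0.0
    span = (max(stamps) - min(stamps)).total_seconds()
    if span == 0:
        return len(stamps), 0.0
    # N messages cover N - 1 intervals.
    return len(stamps), (len(stamps) - 1) / span

# Timestamps copied from the messages quoted in this report.
SUFFIX = (" 00-25-90-77-43-ac kvm: [ID 177374 kern.warning] "
          "WARNING: kvm: emulating exchange as write#012")
sample = [
    "2013-06-07T15:04:02." + frac + "+00:00" + SUFFIX
    for frac in ("293908", "297239", "300573", "303907",
                 "307243", "310578", "313911")
]

count, rate = message_rate(sample)
print(f"{count} warnings, about {rate:.0f} per second")
```

For the seven quoted lines this works out to roughly 300 warnings per second, which is consistent with the log files filling up quickly.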

ingenthr commented 8 years ago

I've just encountered this with 20160304T005100Z. System seems to hang pretty easily as well. Nothing in /var/crash.

kfr- commented 8 years ago

Just ran into this with joyent_20151104T185720Z. System seems to hang. Nothing in /var/crash.

rmustacc commented 8 years ago

On 3/28/16 18:30, kfr- wrote:

> Just ran into this with joyent_20151104T185720Z. System seems to hang. Nothing in /var/crash.

Did the host hang or the VM?

kfr- commented 8 years ago

Seems to hang the host.

I see:
2016-03-28T19:28:41.000251+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
2016-03-28T19:28:41.000310+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
2016-03-28T19:28:41.000889+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
2016-03-28T19:28:41.000937+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
2016-03-28T19:28:41.001028+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
(the above repeats many times)

Then I see:
2016-03-28T19:28:41.854469+00:00 kam-srv1 kvm: [ID 177374 kern.warning] WARNING: kvm: emulating exchange as write#012
Then more:
2016-03-28T19:28:41.854998+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
2016-03-28T19:28:41.855015+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
2016-03-28T19:28:41.855938+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
2016-03-28T19:28:41.855956+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x530076
2016-03-28T19:28:41.856927+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x130076
(the above repeats many times)

Then, before it hangs, I see one last:
2016-03-28T19:43:36.678921+00:00 kam-srv1 kvm: [ID 987709 kern.info] unimplemented perfctr wrmsr: 0xc0010000 data 0x530076

I think I have a proper filter set up in /opt/custom/etc/rsyslog.d/kvm.conf:
:msg, contains, "unimplemented perfctr wrmsr" /var/adm/messages
& ~

kfr- commented 8 years ago

If the filter is set up correctly, I should no longer see "unimplemented perfctr wrmsr" messages.
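
The filter above acts at the syslog level. For a log that has already been written, the same separation can be done offline so that the "emulating exchange as write" warnings are easier to spot. This is only a sketch: `split_noise` is a hypothetical helper, not part of rsyslog or SmartOS, and it matches on the same substring the filter uses.

```python
NOISE = "unimplemented perfctr wrmsr"

def split_noise(lines, noise=NOISE):
    """Return (kept_lines, dropped_count) after removing lines containing `noise`."""
    kept = [line for line in lines if noise not in line]
    return kept, len(lines) - len(kept)

# Three lines copied from the log excerpts earlier in this thread.
sample = [
    "2016-03-28T19:28:41.001028+00:00 kam-srv1 kvm: [ID 987709 kern.info] "
    "unimplemented perfctr wrmsr: 0xc0010000 data 0x130076",
    "2016-03-28T19:28:41.854469+00:00 kam-srv1 kvm: [ID 177374 kern.warning] "
    "WARNING: kvm: emulating exchange as write#012",
    "2016-03-28T19:28:41.854998+00:00 kam-srv1 kvm: [ID 987709 kern.info] "
    "unimplemented perfctr wrmsr: 0xc0010000 data 0x130076",
]

kept, dropped = split_noise(sample)
for line in kept:
    print(line)
print(f"dropped {dropped} perfctr lines")
```

To run it against a real file, something like `split_noise(open("/var/adm/messages").read().splitlines())` should work, assuming the log is readable text at that path.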

rmustacc commented 8 years ago

Well, if it's causing the host to hang, can you generate an NMI when that happens, and verify beforehand that you can't ping the host or use the console?

kfr- commented 8 years ago

I know I can't ping the console and I can't ssh into the host.

kfr- commented 8 years ago

Going to generate an NMI as per https://wiki.smartos.org/pages/viewpage.action?pageId=754743 the next time it happens. I will report back.

ingenthr commented 8 years ago

The host was hung in my case as well. I moved to lx brand VMs for now, but may need those KVMs again at some point. I can try to get to another hang if it'd be helpful. My host doesn't have a BMC, so I'd need to see if it has a hardware NMI to get a crash dump, or else set a breakpoint somewhere useful. I've done this before but it's been a while, so I'm glad to get more data if it's useful.

davefinster commented 8 years ago

In case it's useful: I just observed this on an old box running 20150123T200224Z whose disks are attached via a JBOD array. The box was suffering from very low available DRAM (it was mistakenly almost over-committed, so the free list showed <1GB available) and from what appeared to be faulty SLOG SSDs exhibiting extremely high I/O latency. These SSDs are SATA Intel devices residing in a JBOD.

After a reboot, as VMs were being started, the host would eventually hang - keyboard I/O on consoles would still be accepted and be displayed, but nothing would make forward progress.

Removing the SLOG SSDs from service resolved the problem. Unfortunately I did not get a chance to inject an NMI/obtain a dump but if it happens again I'll be sure to give it a try.