ShabbyX / RTAI

(NO LONGER MAINTAINED) Clone of RTAI from https://www.rtai.org

crash in ipipe_timer_name() with 3.4.53 #12

Closed SebKuzminsky closed 10 years ago

SebKuzminsky commented 11 years ago

A user reported this crash in private email:

[image: rtai-crash]

He's using the 3.4.53-1-rtai kernel I compiled; the debs for it live here: http://highlab.com/~seb/linuxcnc/rtai-for-3.4-prerelease

The linux kernel config is this: http://highlab.com/~seb/linuxcnc/rtai-for-3.4-prerelease/config-3.4.53-1-rtai

I've asked him for dmesg and /proc/cpuinfo and will update this issue when they come in.

The OP admits to having gone through a bit of a wonky install process, but in the end managed to get the kernel booted and had this crash as soon as he started realtime.

NTULINUX commented 11 years ago

Do you (or anyone) have the same problem with that kernel config? Has anyone else tested that config? I might be responsible for this bug. If it's an issue for other users, I might be able to fix it.

NTULINUX commented 11 years ago

I remember sending you a better RTAI kernel config to use with LinuxCNC, as opposed to the one based directly on Ubuntu's. Please use that one instead, then see if the issue goes away.

SebKuzminsky commented 11 years ago

I have successfully run this kernel on a single-core P4 and a newer dual-core machine.

steffenmauch commented 11 years ago

@NTULINUX could you add such a (better) RTAI kernel config for 3.4 or even 3.8?

ShabbyX commented 11 years ago

I had a similar oops with 3.8, which I described in a recent email to the mailing list. For me, the problem arose when I enabled 3D acceleration in the virtual machine. In my case, however, there was also an IPIPE error about the high-resolution clock not working. My suspicion is that when there is an error with the timers, RTAI hits a division by zero or a similar fault.

I have yet to test disabling the No APIC option of the virtual machine. Either way, it would be interesting to have that test against division by zero. But it's just a hunch; the problem may well be somewhere else.

Exact message was (notice that RTAI reports clock frequencies to be zero):

[   18.548499] I-pipe: high-resolution clock not working
[   18.548499] I-pipe: head domain RTAI registered.
[   18.548499] RTAI[hal]: compiled with gcc version 4.6.3 (Ubuntu/Linaro 4.6.3-1ubuntu5) .
[   18.552450] RTAI[hal]: mounted (IPIPE-NOTHREADS, IMMEDIATE (INTERNAL IRQs VECTORED), ISOL_CPUS_MASK: 0).
[   18.552450] SYSINFO: CPUs 2, LINUX APIC IRQ -1, TIM_FREQ 0, CLK_FREQ 0, CPU_FREQ 0
[   18.552450] RTAI_APIC_TIMER_IPI: RTAI DEFINED 2306, VECTOR 2306; LINUX_APIC_TIMER_IPI: RTAI DEFINED 2304, VECTOR 2304
[   18.552450] BUG: unable to handle kernel NULL pointer dereference at 00000014
[   18.552450] IP: [<c10cc0c7>] ipipe_timer_name+0x17/0x20
[   18.552450] *pdpt = 000000003438a001 *pde = 0000000000000000
[   18.552450] Oops: 0000 [#1] SMP
[   18.552450] Modules linked in: rtai_hal(O+) ppdev parport_pc microcode psmouse vboxguest(O) i2c_piix4 serio_raw lp parport e1000
[   18.552450] Pid: 931, comm: insmod Tainted: G        W  O 3.8.13-rtai #3 innotek GmbH VirtualBox/VirtualBox
[   18.564845] EIP: 0060:[<c10cc0c7>] EFLAGS: 00010282 CPU: 1
[   18.564845] EIP is at ipipe_timer_name+0x17/0x20
[   18.564845] EAX: 00000000 EBX: 00000000 ECX: f4be3dcc EDX: 35b0b000
[   18.564845] ESI: 00000000 EDI: 03a982ee EBP: f4be3e58 ESP: f4be3df0
[   18.564845]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[   18.564845] CR0: 8005003b CR2: b7c1d7a8 CR3: 343aa000 CR4: 000006f0
[   18.564845] DR0: 00000000 DR1: 00000000 DR2: 00000000 DR3: 00000000
[   18.564845] DR6: ffff0ff0 DR7: 00000400
[   18.564845] I-pipe domain Linux
[   18.564845] Process insmod (pid: 931, ti=f4be2000 task=f3e70cd0 task.ti=f4be2000)
[   18.564845] Stack:
[   18.564845]  f9ad433a f9ad581c 00000902 00000902 00000900 00000900 00000000 00000000
[   18.564845]  00000000 00000000 00000002 ffffffff 00000000 00000000 00000000 00000000
[   18.564845]  00000000 00000000 9ac15d93 f9ad5492 ffffffff f9ad2280 00000000 00003afc
[   18.588053] Call Trace:
[   18.588053]  [<f9ad433a>] ? __rtai_hal_init+0x2da/0x380 [rtai_hal]
[   18.588053]  [<f9ad2280>] ? rt_printk+0x60/0x60 [rtai_hal]
[   18.588053]  [<c1001144>] do_one_initcall+0x34/0x180
[   18.588053]  [<f9ad4060>] ? ack_bad_irq+0x40/0x40 [rtai_hal]
[   18.588053]  [<c109d7be>] load_module+0x1c1e/0x2320
[   18.588053]  [<c109df46>] sys_init_module+0x86/0xa0
[   18.588053]  [<c157d396>] sysenter_do_call+0x12/0x16
[   18.588053] Code: 8b 75 f8 8b 7d fc 89 ec 5d c3 8d 76 00 8d bc 27 00 00 00 00 55 89 e5 66 66 66 66 90 8b 15 80 e0 81 c1 b8 38 d2 8c c1 5d 8b 04 10 <8b> 40 14 c3 90 8d 74 26 00 55 89 e5 83 ec 18 89 5d f4 89 75 f8
[   18.588053] EIP: [<c10cc0c7>] ipipe_timer_name+0x17/0x20 SS:ESP 0068:f4be3df0
[   18.588053] CR2: 0000000000000014
[   18.590126] ---[ end trace 5f98583cd35a704c ]---
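
A note on reading the oops above: the marked bytes in the Code line, `<8b> 40 14`, decode to `mov 0x14(%eax),%eax`, and EAX is 00000000 in the register dump. So ipipe_timer_name() is reading a member at offset 0x14 through a NULL pointer, which matches the reported fault address 00000014. The sketch below only models that arithmetic. The struct layout (five 4-byte members followed by a name pointer) and the names `timer_model32` and `fault_address` are assumptions made for illustration; they are not copied from the I-pipe sources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical 32-bit model of a timer descriptor: five 4-byte members
 * followed by a name pointer. Fixed-width fields keep the offsets the
 * same on any host. This layout is an assumption, not the real struct.
 */
struct timer_model32 {
    uint32_t irq;
    uint32_t request;
    uint32_t set;
    uint32_t ack;
    uint32_t release;
    uint32_t name;      /* lands at offset 0x14 in this model */
};

/* Address a load touches when reading a member through a base pointer. */
static uint32_t fault_address(uint32_t base, size_t member_offset)
{
    return base + (uint32_t)member_offset;
}
```

With a NULL base, `fault_address(0, offsetof(struct timer_model32, name))` gives 0x14 in this model, which is the address the kernel reported.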

ShabbyX commented 11 years ago

@SebKuzminsky tell that user not to load the RTAI modules at startup, but to load them manually instead. That way, before loading the modules, he can look at the kernel messages to see if there were similar reports from IPIPE.

ShabbyX commented 11 years ago

@NTULINUX I committed a test on the timer inside __rtai_hal_init. The commit can be found in the dev branch. Can you please test it? It compiles all right and here it works fine. I have to get home to test it against the setup which shows this oops.

@SebKuzminsky in the meantime, you could also get your user to test the dev branch. In that case, RTAI would probably refuse to load, complaining that there are no clocks available. If the oops remains, the error is probably somewhere else.
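
For readers following along, the kind of test on the timer described above can be sketched in user-space C roughly as follows. This is not the commit from the dev branch. The names `timer_info` and `hal_init_check` are hypothetical, and the choice of error codes is an assumption. The idea is only that initialization bails out with an error when the timer is missing or reports a zero frequency (as in the TIM_FREQ 0, CLK_FREQ 0, CPU_FREQ 0 line of the log), instead of dereferencing or dividing by it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the timer data read at init time. */
struct timer_info {
    const char *name;
    unsigned long freq_hz;
};

/*
 * Model of an init-time guard: refuse to continue when the timer is
 * missing or reports a zero frequency, rather than using it blindly.
 * Returns 0 on success or a negative errno value.
 */
static int hal_init_check(const struct timer_info *timer)
{
    if (timer == NULL || timer->name == NULL)
        return -ENODEV;     /* no usable timer was registered */
    if (timer->freq_hz == 0)
        return -EINVAL;     /* dividing by this frequency would fault */
    return 0;
}
```

A caller would check the return value and abort module initialization with a message when it is negative, which avoids the oops but, as noted in this thread, does not make the missing timer appear.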

Seb-LineRate commented 11 years ago

Here are some files from the user's system.

/proc/cpuinfo: http://pastebin.ca/2428439
dmesg: http://pastebin.ca/2428440
kern.log: http://pastebin.ca/2428444

It's a 4-cpu system (according to cpuinfo & dmesg), and rtai emits this error before crashing: "I-pipe: could not find timer for cpu #2"

I'm out for the next couple of weeks, so I will probably not be able to help the user with testing this until the end of the month.

ShabbyX commented 11 years ago

OK, then the new commit would definitely avoid the oops, since it would make RTAI complain that timers are missing. It wouldn't solve the original problem, though: the timer is still missing!

NTULINUX commented 11 years ago

@steffenmauch Please refer to the README.INSTALL file.

ShabbyX commented 10 years ago

I'm closing this issue. I can apply the patch that checks for missing timers to avoid the oops (and fail with a message), but in the end it doesn't solve the real problem, and RTAI won't work anyway. Furthermore, it's rather ambiguous what should be done if only some of the timers don't exist, and I don't know which of them would be really problematic and which not.

So for now, unless this becomes a more common issue, I'd leave it as it is.
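
To make the ambiguity about partially missing timers concrete, here is a minimal user-space sketch of one possible policy, a strict one that refuses to load if any CPU lacks a timer. The helper names `missing_timer_mask` and `can_load_strict` are hypothetical, and the `has_timer` table is an assumed input rather than anything RTAI or I-pipe exposes. Whether a more lenient policy would be safe is exactly the open question in this issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-CPU table: has_timer[cpu] is true where a timer was
 * found. Returns a bitmask with one bit set per CPU lacking a timer.
 * CPUs beyond the width of the mask are ignored in this sketch.
 */
static unsigned missing_timer_mask(const bool *has_timer, size_t ncpus)
{
    unsigned mask = 0;

    for (size_t cpu = 0; cpu < ncpus && cpu < sizeof(mask) * 8; cpu++)
        if (!has_timer[cpu])
            mask |= 1u << cpu;
    return mask;
}

/* Strict policy (an assumption): any missing timer blocks loading. */
static bool can_load_strict(const bool *has_timer, size_t ncpus)
{
    return missing_timer_mask(has_timer, ncpus) == 0;
}
```

For a 4-CPU system where only CPU 2 (counting from zero) has no timer, as in the report above, the mask would have just bit 2 set and the strict policy would refuse to load.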