beagleboard / kernel

Kernel for the beagleboard.org boards

BeBoPr cape DT overlay compiled in kernel is not fixed for new adc mapping. #55

Open modmaker opened 11 years ago

modmaker commented 11 years ago

The ADC mapping is still in the old format (adc-channels with a count), and this causes a kernel null pointer dereference during boot.

koenkooi commented 11 years ago

send a patch :)

modmaker commented 11 years ago

Just remove your patch "3.8: add BeBoPr support" for now. Once I think it's ready, I'll release it for inclusion. For now it's just causing trouble to have this compiled into the kernel. Loading the overlay from the /lib/firmware directory worked fine before.

koenkooi commented 11 years ago

yes, send a patch for that.

modmaker commented 11 years ago

Ah, that's the way it works. I'm getting the picture. I thought it would be easy for the person who created the mess to clean it up. How do we prevent this situation in the future? Isn't it obvious that I'm the maintainer of BeBoPr-related code and that I sign off patches for kernel inclusion? That would prevent both of us from wasting our time. In the meantime I'll think about a proper solution and create a patch.

ZubairLK commented 11 years ago

"ti,adc-channels = < 8 >;"

won't work.

It's

"ti,adc-channels = < 0 1 2 3 4 5 6 7 >;"

now.
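
For anyone with overlays still in the old form, the change can be sketched as a small text rewrite. This is only an illustration, assuming a count of N maps to channels 0 to N-1 as the two snippets above suggest. The helper name `expand_adc_channels` is hypothetical and not part of any kernel tooling. Note that a single-cell value is ambiguous (an old count or a new one-channel list), so it should only be run on overlays known to use the old format.

```python
import re

# Hypothetical helper, not part of any kernel tooling.
# Assumption: the old form "ti,adc-channels = <N>;" means channels 0..N-1.
# A single-cell value is ambiguous between an old count and a new
# one-channel list, so only run this on overlays known to be old-format.
_OLD_FORM = re.compile(r"ti,adc-channels\s*=\s*<\s*(\d+)\s*>\s*;")


def expand_adc_channels(dts_text):
    """Rewrite the old count form into an explicit channel list."""
    def _expand(match):
        count = int(match.group(1))
        if count == 0:
            # Nothing sensible to expand; leave the property untouched.
            return match.group(0)
        channels = " ".join(str(ch) for ch in range(count))
        return "ti,adc-channels = <{}>;".format(channels)

    return _OLD_FORM.sub(_expand, dts_text)


if __name__ == "__main__":
    print(expand_adc_channels("ti,adc-channels = < 8 >;"))
    # prints: ti,adc-channels = <0 1 2 3 4 5 6 7>;
```

Properties that already list several channels do not match the pattern and pass through unchanged.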

modmaker commented 11 years ago

For some 3.8.13 kernels it is, for others it isn't :-( . The problem seems to have been introduced by a change in the ADC driver. Now there's a versioning / maintenance problem that I have to find a proper solution for. But the problem is not just with this overlay. How do we prevent a future driver change (or error) from causing havoc with older overlays? I think this needs to be prevented somehow. Thus far overlays seemed rather kernel-version independent. Maybe this is no longer safe. A long time ago, Linus refused to accept loadable kernel modules for exactly this reason. That's why the module versioning scheme was added.

ZubairLK commented 11 years ago

A set of patches was accepted in the mainline kernel that made this change.

This change was to make this tree closer to mainline to avoid a headache later on..

modmaker commented 11 years ago

Thanks for the explanation, that makes my headache almost bearable ;-)

But with this case as an example, future headaches are not avoided, they're only redirected to other people. IMHO some serious thinking has to be done about how to make this overlay feature future-proof!

ZubairLK commented 11 years ago

The fine folk in the mainline understand that..

Even a typo is scrutinized. :-1:

http://lkml.indiana.edu/hypermail/linux/kernel/1307.2/03826.html

modmaker commented 11 years ago

:-)

You seem familiar with the driver code: If I fix the overlay the kernel does not trap anymore (on a BBB). But I see that the same code (kernel and overlay) still traps on a BeagleBone (white). Instead of the incorrect PIN number, the cause now seems to be a minimum clock requirement not being met. The 3.2.24 kernel emits the same error during boot, also resulting in "probe of tsc failed with error -22" but does not oops. Any idea on how to fix this?

ZubairLK commented 11 years ago

I am unaware of BBWhite and what is in 3.2.24 (without DT, I think).

BBB and 3.8.13 are different (with DT). I wouldn't mix them up. I am unaware how 3.8.13 functions on BBWhite.

The minimum clock requirement check was removed because it wasn't needed...

See if this patch has been applied to your code or not..

http://arago-project.org/git/projects/?p=linux-am33x.git;a=commitdiff;h=333f3460f855abd02c7e7eefcb692740ecd9c662;hp=1be244a840f9a162e3c832d7c1686100a6096a7c

koenkooi commented 11 years ago

I don't get why people insist on the nonsense that BBW means 3.2.x. The same kernel (3.8, 3.11) runs on all colours of the beaglebone.

modmaker commented 11 years ago

Koen, I completely agree, but read my post: that's why I also tested your (3.8) kernel on a BBW. And because that same kernel did oops on the white, I also tested 3.2.24 on that white. I found the same error in the log, but no oops. So it looks like the problem/feature/bug resulting in "probe of tsc failed with error -22" has been present since 3.2.24. But the DT kernel seems less forgiving and generates a kernel trap, while the 3.2 kernel continues with a (semi?) operational ADC.

I'll build my own kernel later today and make sure to disable the check. To be continued...