OpenLightingProject / ola

The Open Lighting Architecture - The Travel Adaptor for the Lighting Industry
https://www.openlighting.org/ola/
Other
647 stars 205 forks source link

ola keeps messing with unrelated ttyUSB devices no matter how much I tell it not to #801

Closed DanielG closed 9 years ago

DanielG commented 9 years ago

In it's default configuration ola 0.9.1 from Debian jessie periodically takes over tty devices I use for debugging (I'm developing an RDM responder atm.). I had to disable all plugins except the one I'm using for my DMX interface (ENTTEC DMX USB PRO Mk2) and mess with udev to get it to stop.

The symptom of ola's hostile takeover is that the picocom running on my debug tty just closes on me every so often and I have to restart it, I don't remember the error off hand but I can revert my config and reproduce it if it's important.

For reference my /etc/udev/rules.d/90-enttec.rules

SUBSYSTEM=="tty", ENV{ID_VENDOR}=="ENTTEC", GROUP:="plugdev", MODE:="0660", SYMLINK+="enttec_%n"

and /etc/ola/ola-usbserial.conf

device_dir = /dev
device_prefix = enttec_
enabled = true
pro_fps_limit = 190
tri_use_raw_rdm = false
ultra_fps_limit = 40

Now this is all nice and well and but now I have another even stranger problem. Whenever I run ola_rdm_discover ola seems to somehow change the baud rate on a non enttec ttyUSB device!

I have to run stty <baud-rate> --file=/dev/ttyUSBX to get it to go back to normal.

This is quite annoying as I'm using that tty to debug a crash on my device that happens during discovery :/

DanielG commented 9 years ago

Of course now that I try again today I can't reproduce the second issue, so don't quote me on that but I am pretty sure the first one is an actual issue ;)

EDIT: Never mind that, I just managed to reproduce both of those issues.

nomis52 commented 9 years ago

Have you tried running the version in the git repo? There is support for UUCP lockfiles which should address this problem.

peternewman commented 9 years ago

@DanielG can you pastebin or similar an olad -l 4 of the discovery baud rate bug. If that is happening I doubt our UUCP locks will fix that, given its in theory a device you aren't opening.

DanielG commented 9 years ago

Sure I'll just leave that running in the background with some logging and report back when it happens again, yesterday I was doing while true; do ola_rdm_discover -u0 -f; done which seemed to trigger it every few minutes but now it seems to run fine. Maybe ola needs to have been running for a while to trigger it?

If that is happening I doubt our UUCP locks will fix that, given its in theory a device you aren't opening.

I am opening the device with picocom (which does UUCP locking AFAIK) so I guess that would fix it.

nomis52 commented 9 years ago

olad checks for new devices every 20s at which point it will open the device and perform an ioctl(). I suspect that's what you're seeing - it'll appear in the logs as well.

Simon

DanielG commented 9 years ago

I have seen this happen twice now. Looks like the baud rate glitch happens when ola crashes, so maybe a well meaning destructor somewhere that "resets" the baud rate of the tty even though it's not owned by ola.

My responder isn't exactly well behaved yet so It's probably not ola's fault for crashing ;)

[...]
SelectPoller.cpp:160: ss process time was 0.000000
DiscoveryAgent.cpp:223: Hit failure limit for (ff00:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB fe00:00000000 - feff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: fe00:00000000 - feff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (fe00:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB fc00:00000000 - fdff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: fc00:00000000 - fdff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (fc00:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB f800:00000000 - fbff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: f800:00000000 - fbff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (f800:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB f000:00000000 - f7ff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: f000:00000000 - f7ff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (f000:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB e000:00000000 - efff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: e000:00000000 - efff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (e000:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB c000:00000000 - dfff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: c000:00000000 - dfff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (c000:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB 8000:00000000 - bfff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: 8000:00000000 - bfff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000001
DiscoveryAgent.cpp:223: Hit failure limit for (8000:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:230: DUB 0000:00000000 - 7fff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
EnttecUsbProWidget.cpp:343: Sending DUB packet: 0000:00000000 - 7fff:ffffffff
SelectPoller.cpp:160: ss process time was 0.000002
DiscoveryAgent.cpp:223: Hit failure limit for (0000:00000000, ffff:ffffffff)
DiscoveryAgent.cpp:414: top of tree is corrupted
EnttecUsbProWidget.cpp:605: Enttec Pro discovery complete: 
Received SIGSEGV or SIGBUS
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(+0x8bb23)[0x7f710eaa8b23]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7f710d7a58d0]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola3rpc10RpcChannel7SendMsgEPNS0_10RpcMessageE+0x32)[0x7f710eb6f9d2]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola3rpc10RpcChannel15RequestCompleteEPNS0_18OutstandingRequestE+0x101)[0x7f710eb714e1]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(+0x9e741)[0x7f710eabb741]
/usr/lib/x86_64-linux-gnu/libolaserver.so.0(_ZN3ola8Universe17DiscoveryCompleteEPNS_13BaseCallback1IvRKNS_3rdm6UIDSetEEE+0xe8)[0x7f710ee3ea68]
/usr/lib/x86_64-linux-gnu/libolaserver.so.0(+0x54232)[0x7f710ee28232]
/usr/lib/x86_64-linux-gnu/libolaserver.so.0(+0x7107e)[0x7f710ee4507e]
/usr/lib/x86_64-linux-gnu/libolaserver.so.0(+0x5e3a2)[0x7f710ee323a2]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola3rdm33DiscoverableQueueingRDMController17DiscoveryCompleteERKNS0_6UIDSetE+0xa2)[0x7f710eb39b72]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(+0x11de71)[0x7f710eb3ae71]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(_ZN3ola6plugin6usbpro14EnttecPortImpl17DiscoveryCompleteEPNS_13BaseCallback1IvRKNS_3rdm6UIDSetEEEbS7_+0x56)[0x7f710ae89bb6]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(+0x3fd81)[0x7f710ae8bd81]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola3rdm14DiscoveryAgent13SendDiscoveryEv+0x1432)[0x7f710eb0bdf2]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola3rdm14DiscoveryAgent13SendDiscoveryEv+0xe2)[0x7f710eb0aaa2]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola3rdm14DiscoveryAgent14BranchCompleteEPKhj+0x13d)[0x7f710eb1088d]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(_ZN3ola6plugin6usbpro14EnttecPortImpl16HandleRDMTimeoutEj+0x16a)[0x7f710ae8830a]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(_ZN3ola6plugin6usbpro22EnttecUsbProWidgetImpl11HandleLabelEPNS1_14EnttecPortImplERKNS1_15OperationLabelsEhPKhj+0x118)[0x7f710ae89a38]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(_ZN3ola6plugin6usbpro22EnttecUsbProWidgetImpl13HandleMessageEhPKhj+0x60)[0x7f710ae89ae0]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(_ZN3ola6plugin6usbpro16BaseUsbProWidget14ReceiveMessageEv+0x26f)[0x7f710ae7ef5f]
/usr/lib/x86_64-linux-gnu/olad/libolausbpro.so.0(_ZN3ola6plugin6usbpro16BaseUsbProWidget15DescriptorReadyEv+0x18)[0x7f710ae7ef88]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola2io27BidirectionalFileDescriptor11PerformReadEv+0xe2)[0x7f710eab54c2]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola2io12SelectPoller16CheckDescriptorsEP6fd_setS3_+0x984)[0x7f710eabea84]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola2io12SelectPoller4PollEPNS0_14TimeoutManagerERKNS_12TimeIntervalE+0x17e)[0x7f710eabf4be]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola2io12SelectServer14CheckForEventsERKNS_12TimeIntervalE+0xe8)[0x7f710eaba968]
/usr/lib/x86_64-linux-gnu/libolacommon.so.0(_ZN3ola2io12SelectServer3RunEv+0xab)[0x7f710eabaa4b]
olad(main+0x691)[0x4032e1]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7f710d40eb45]
olad[0x403858]
nomis52 commented 9 years ago

It shouldn't crash, regardless of what the responder does.

Can you please try the latest code from the git repo. Some of these issues may have already been fixed.

DanielG commented 9 years ago

Sure, I'll try it tomorrow.

DanielG commented 9 years ago

Still happening with git master and the baud rate is mangled again.

plugins/usbpro/EnttecUsbProWidget.cpp:841: Enttec received label 12, length 0
common/rdm/DiscoveryAgent.cpp:211: BranchComplete, got 0
common/rdm/DiscoveryAgent.cpp:190: Hit failure limit for (c000:00000000, ffff:ffffffff)
common/rdm/DiscoveryAgent.cpp:197: DUB 8000:00000000 - bfff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
plugins/usbpro/EnttecUsbProWidget.cpp:331: Sending DUB packet: 8000:00000000 - bfff:ffffffff
common/io/EPoller.cpp:306: ss process time was 0.000002
plugins/usbpro/EnttecUsbProWidget.cpp:841: Enttec received label 12, length 0
common/rdm/DiscoveryAgent.cpp:211: BranchComplete, got 0
common/rdm/DiscoveryAgent.cpp:190: Hit failure limit for (8000:00000000, ffff:ffffffff)
common/rdm/DiscoveryAgent.cpp:197: DUB 0000:00000000 - 7fff:ffffffff, attempt 1, uids found: 0, failures 0, corrupted 0
plugins/usbpro/EnttecUsbProWidget.cpp:331: Sending DUB packet: 0000:00000000 - 7fff:ffffffff
common/io/EPoller.cpp:306: ss process time was 0.000002
plugins/usbpro/EnttecUsbProWidget.cpp:841: Enttec received label 12, length 0
common/rdm/DiscoveryAgent.cpp:211: BranchComplete, got 0
common/rdm/DiscoveryAgent.cpp:190: Hit failure limit for (0000:00000000, ffff:ffffffff)
common/rdm/DiscoveryAgent.cpp:387: Discovery tree is corrupted
plugins/usbpro/EnttecUsbProWidget.cpp:583: Enttec Pro discovery complete: 
Received Segmentation fault
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(+0x97ba3)[0x7fde6c3bdba3]
/lib/x86_64-linux-gnu/libpthread.so.0(+0xf8d0)[0x7fde6b0ac8d0]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola3rpc10RpcChannel7SendMsgEPNS0_10RpcMessageE+0x1f)[0x7fde6c4574af]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola3rpc10RpcChannel15RequestCompleteEPNS0_18OutstandingRequestE+0xe1)[0x7fde6c458f51]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(+0xab8f1)[0x7fde6c3d18f1]
/home/dxld/var/build/ola/olad/.libs/libolaserver.so.0(_ZN3ola8Universe17DiscoveryCompleteEPNS_13BaseCallback1IvRKNS_3rdm6UIDSetEEE+0xc8)[0x7fde6c75cdf8]
/home/dxld/var/build/ola/olad/.libs/libolaserver.so.0(+0x55c12)[0x7fde6c74dc12]
/home/dxld/var/build/ola/olad/.libs/libolaserver.so.0(+0x6b0ae)[0x7fde6c7630ae]
/home/dxld/var/build/ola/olad/.libs/libolaserver.so.0(+0x5b702)[0x7fde6c753702]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola3rdm33DiscoverableQueueingRDMController17DiscoveryCompleteERKNS0_6UIDSetE+0xa2)[0x7fde6c42b1e2]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(+0x106a41)[0x7fde6c42ca41]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(_ZN3ola6plugin6usbpro14EnttecPortImpl17DiscoveryCompleteEPNS_13BaseCallback1IvRKNS_3rdm6UIDSetEEEbS7_+0x47)[0x7fde68373a77]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(+0x40ca1)[0x7fde68375ca1]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola3rdm14DiscoveryAgent13SendDiscoveryEv+0x147d)[0x7fde6c3fd5cd]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola3rdm14DiscoveryAgent13SendDiscoveryEv+0xd9)[0x7fde6c3fc229]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola3rdm14DiscoveryAgent14BranchCompleteEPKhj+0xc0)[0x7fde6c401fc0]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(_ZN3ola6plugin6usbpro14EnttecPortImpl16HandleRDMTimeoutEj+0x14a)[0x7fde6837163a]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(_ZN3ola6plugin6usbpro22EnttecUsbProWidgetImpl11HandleLabelEPNS1_14EnttecPortImplERKNS1_15OperationLabelsEhPKhj+0xe0)[0x7fde68372e40]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(_ZN3ola6plugin6usbpro22EnttecUsbProWidgetImpl13HandleMessageEhPKhj+0xab)[0x7fde68372f2b]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(_ZN3ola6plugin6usbpro16BaseUsbProWidget14ReceiveMessageEv+0x24b)[0x7fde68367d8b]
/home/dxld/var/build/ola/plugins/usbpro/.libs/libolausbpro.so.0(_ZN3ola6plugin6usbpro16BaseUsbProWidget15DescriptorReadyEv+0x18)[0x7fde68367da8]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola2io27BidirectionalFileDescriptor11PerformReadEv+0xc2)[0x7fde6c3ca692]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola2io7EPoller15CheckDescriptorEP11epoll_eventPNS0_9EPollDataE+0x5b)[0x7fde6c3d67cb]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola2io7EPoller4PollEPNS0_14TimeoutManagerERKNS_12TimeIntervalE+0x223)[0x7fde6c3d6c03]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola2io12SelectServer14CheckForEventsERKNS_12TimeIntervalE+0xd8)[0x7fde6c3d0b98]
/home/dxld/var/build/ola/common/.libs/libolacommon.so.0(_ZN3ola2io12SelectServer3RunEv+0x83)[0x7fde6c3d0c43]
/home/dxld/var/build/ola/olad/.libs/lt-olad(main+0x6b5)[0x403565]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf5)[0x7fde6ad15b45]
/home/dxld/var/build/ola/olad/.libs/lt-olad[0x403741]
peternewman commented 9 years ago

Sorry @DanielG I meant if you've filtered to enttec_* then if it's still opening something else somehow, it may well be doing it in error and ignoring the UUCP too.

Can we see the whole olad log please (up to the start of a discovery, even if it doesn't crash will be sufficient)?

I'm guessing the callback is failing due to the corrupt tree: https://github.com/OpenLightingProject/ola/blob/master/plugins/usbpro/EnttecUsbProWidget.cpp#L585

DanielG commented 9 years ago

Sure, http://pastebin.com/F87qR83U. I got the responded to behave now so the discovery flood I was getting before isn't there anymore.

nomis52 commented 9 years ago

Should should be resolved with the latest changes in git. Please let us know if it isn't.