baronbrew / TILTpi

Baron Brew Equipment Official Tilt app for Raspberry Pi
62 stars 18 forks source link

Bad BLE Packets Cause Crash #23

Open lbussy opened 4 years ago

lbussy commented 4 years ago

I have forked the Aioblescan library and I noticed that on occasion, a "bad packet" will happen, decode() will fail, and any client which has instantiated the class will raise an exception. This may or may not be related to the baud rate issues.

I added a small try:/except: to my fork and just wanted to share it with you here in case you choose to incorporate it here. If you like I can do a PR.

There's no issues board on the Aioblescan repo so dropping it here.

lbussy commented 4 years ago

Well, apparently that did not fix all of it - trying to figure it out. I don't see any errors in the syslog around that time but the stacktrace was:

  File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 966, in decode
    data=ev.decode(data)
  File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1007, in decode
    data=ev.decode(data)
  File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1033, in decode
    data=x.decode(data)
  File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 298, in decode
    self.val= unpack(">B",data[:1])[0]
struct.error: unpack requires a buffer of 1 bytes
noahbaron commented 4 years ago

Thanks for looking into it. Looks like it’s losing bytes on some of the scans.

On Mon, Apr 13, 2020 at 2:13 PM Lee Bussy notifications@github.com wrote:

Well, apparently that did not fix all of it - trying to figure it out. I don't see any errors in the syslog around that time but the stacktrace was:

File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 966, in decode data=ev.decode(data) File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1007, in decode data=ev.decode(data) File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1033, in decode data=x.decode(data) File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 298, in decode self.val= unpack(">B",data[:1])[0] struct.error: unpack requires a buffer of 1 bytes

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/baronbrew/TILTpi/issues/23#issuecomment-613100775, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3FBP2RPN3K7HYQB3TGKLRMN56VANCNFSM4MHCMRNA .

lbussy commented 4 years ago

Yes, quite irritating. It's infrequent (12-24 hour frequency), so challenging to catch in the act.

noahbaron commented 4 years ago

It seems the more Tilts or bluetooth devices in range, the more often they occur. In my test environment it actually occurred quite often.

On Mon, Apr 13, 2020 at 3:31 PM Lee Bussy notifications@github.com wrote:

Yes, quite irritating. It's infrequent (12-24 hour frequency), so challenging to catch in the act.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/baronbrew/TILTpi/issues/23#issuecomment-613129005, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3FBNZVWSHZYLIS6AFZL3RMOHC7ANCNFSM4MHCMRNA .

lbussy commented 4 years ago

Hi Noah. I have been doing some more testing. As discussed in #14, the baud rate seems to be tied to this. It's still anecdotal, but in the Fermentrack project folks have noticed the same thing.

I did a little online/curl script to drop the baud rate to 115200, and at least initially that seems to have helped a couple of folks. Should anyone passing by here want to run it (this is definitely not a Tilt sponsored/controlled/approved item) you can do so with this command:

curl -L uartspeed.brewpiremix.com | sudo bash

It will run as root as typed. If you leave off the sudo part it will automatically restart with sudo. If you don't like that, don't run it.

The real issue for those of us who have incorporated it into our projects is that aioblescan will pretty much quietly tank and that generally means the downstream client will stop reporting until restarted. I suppose we can always check for it and restart the connection, but that seems unglamorous. :)

On one of my test rigs which happens to be a Pi 2B, it happens a bit more often. I pinged Marcus and he's going to help out as well. My theory is that a 2B with multiple Tilts and a higher baud rate will fail often allowing me to learn a little more.

noahbaron commented 4 years ago

Lee,

Thanks for the update and persistence in getting this solved. One thing I can say is Aioblescan does emit an error with a detailed description. This is how I originally worked around the problem. Essentially I would kill the process when an error occurred and restart the process. It worked well, but the baud rate fix was much more reliable, I do both now and seems to work around the error fairly well. When I get the error again, I’ll send you a copy of the output.

Also, we can send you a few of our test Tilts you can use to test on your end. I’ll talk to Marcus on getting that setup.

Regards,

On Fri, Apr 17, 2020 at 6:10 AM Lee Bussy notifications@github.com wrote:

Hi Noah. I have been doing some more testing. As discussed in #14 https://github.com/baronbrew/TILTpi/issues/14, the baud rate seems to be tied to this. It's still anecdotal, but in the Fermentrack project https://github.com/thorrak/fermentrack/issues/414 folks have noticed the same thing.

I did a little online/curl script https://github.com/brewpi-remix/brewpi-tools-rmx/blob/master/uartspeed.sh to drop the baud rate to 115200, and at least initially that seems to have helped a couple of folks. Should anyone passing by here want to run it (this is definitely not a Tilt sponsored/controlled/approved item) you can do so with this command:

curl -L uartspeed.brewpiremix.com | sudo bash

It will run as root as typed. If you leave off the sudo part it will automatically restart with sudo. If you don't like that, don't run it.

The real issue for those of us who have incorporated it into our projects is that aioblescan will pretty much quietly tank and that generally means the downstream client will stop reporting until restarted. I suppose we can always check for it and restart the connection, but that seems unglamorous. :)

On one of my test rigs which happens to be a Pi 2B, it happens a bit more often. I pinged Marcus and he's going to help out as well. My theory is that a 2B with multiple Tilts and a higher baud rate will fail often allowing me to learn a little more.

— You are receiving this because you commented.

Reply to this email directly, view it on GitHub https://github.com/baronbrew/TILTpi/issues/23#issuecomment-615235229, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3FBM3MDJCJYLRL5LVUETRNBILTANCNFSM4MHCMRNA .

lbussy commented 4 years ago

I did exchange email with Marcus - he was going to send one. I appreciate you folks supporting Open Source!

lbussy commented 4 years ago

One thing I can say is Aioblescan does emit an error with a detailed description.

Noah, I was poking through the aioblescan lib and I'm not quite sure how you managed the error/detailed description. Would you give me a hint?

noahbaron commented 4 years ago

The aioblescan python module will output an error to stderr. For example it will output the following when Bluetooth stops scanning:

"Fatal error: protocol.data_received() call failed.↵protocol: <aioblescan.aioblescan.BLEScanRequester object at 0x760b3070>↵transport: <_SelectorSocketTransport fd=6 read=polling write=<idle, bufsize=0>>↵Traceback (most recent call last):↵ File "/usr/lib/python3.7/asyncio/selector_events.py", line 813, in _read_readydata_received↵ self._protocol.data_received(data)↵ File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1272, in data_received↵ self.process(packet)↵ File "/usr/local/lib/python3.7/dist-packages/aioblescan/main__.py", line 74, in my_process↵ xx=ev.decode(data)↵ File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 963, in decode↵ data=ev.decode(data)↵ File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1004, in decode↵ data=ev.decode(data)↵ File "/usr/local/lib/python3.7/dist-packages/aioblescan/aioblescan.py", line 1046, in decode↵ xx=myinfo.decode(data[:length.val-len(code)]..."

After seeing this error I kill the aioblescan process and restart it. I'm using node-red to manage the command line outputs and inputs. There's probably a better way to handle the error within aioblescan so it doesn't need to be killed and restarted, but it's not obvious where that would be done.

On Fri, Aug 7, 2020 at 6:43 PM Lee Bussy notifications@github.com wrote:

One thing I can say is Aioblescan does emit an error with a detailed description.

Noah, I was poking through the aioblescan lib and I'm not quite sure how you managed the error/detailed description. Would you give me a hint?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/baronbrew/TILTpi/issues/23#issuecomment-670805576, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3FBJNNHTWUHHAWEPPCMTR7SUS3ANCNFSM4MHCMRNA .

lbussy commented 4 years ago

That makes sense. I completely over-thought that one. :)

Thank you for the reply.

noahbaron commented 4 years ago

Sure! If you discover any solutions let me know. Not sure if I fully understand where the fatal error actually occurred. Maybe aioblescan can restart the scanning process on its own.

On Sat, Aug 8, 2020 at 3:06 PM Lee Bussy notifications@github.com wrote:

That makes sense. I completely over-thought that one. :)

Thank you for the reply.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/baronbrew/TILTpi/issues/23#issuecomment-670978823, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3FBJBGJ7U3543XUSSEXTR7XD45ANCNFSM4MHCMRNA .

lbussy commented 3 years ago

Hey Noah, closing this up a bit in case it helps you:

First, I moved BPR back to the original aioblescan lib. I figured in case someone else had some time to fix it, that way I would benefit from the updates.

Next, I created a shim between BPR and aioblescan, allowing me to use the "stock" lib with my work.

Within that shim is a very simple try/except on the callback to (hopefully) catch most of those more gracefully. If I get crazy, I can use that to restart the scanning internally.

Finally, within BPR, if I don't get a value (which is how it ends up manifesting itself) and I expect one, I restart the Tilt scanner.

So far so good on this. I hope to move it to my master branch soon.

noahbaron commented 3 years ago

Lee,

Sounds like a good plan and workaround. Good to know it's working well. I'll let you know if I discover anything regarding this bug.

Cheers,

Noah

On Thu, Oct 15, 2020 at 7:17 AM Lee Bussy notifications@github.com wrote:

Hey Noah, closing this up a bit in case it helps you:

First, I moved BPR back to the original aioblescan lib. I figured in case someone else had some time to fix it, that way I would benefit from the updates.

Next, I created a shim https://github.com/brewpi-remix/brewpi-script-rmx/blob/tilt/Tilt.py between BPR and aioblescan, allowing me to use the "stock" lib with my work.

Within that shim is a very simple try/except https://github.com/brewpi-remix/brewpi-script-rmx/blob/118e471ca8a0cbe3ee8c02beec82b4944830fc56/Tilt.py#L188-L201 on the callback to (hopefully) catch most of those more gracefully. If I get crazy, I can use that to restart the scanning internally.

Finally, within BPR, if I don't get a value (which is how it ends up manifesting itself) and I expect one, I restart the Tilt scanner https://github.com/brewpi-remix/brewpi-script-rmx/blob/118e471ca8a0cbe3ee8c02beec82b4944830fc56/brewpi.py#L1514-L1544 .

So far so good on this. I hope to move it to my master branch soon.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/baronbrew/TILTpi/issues/23#issuecomment-709356328, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAQ3FBOD3NORQDTNTJJRQB3SK376LANCNFSM4MHCMRNA .