JsBergbau / MiTemperature2

Read the values of the Xiaomi Mi Bluetooth Temperature sensor 2 including custom encrypted format.
706 stars 162 forks source link

Script execution stucks after some succesfull connections to device #21

Closed BlueMorph closed 4 years ago

BlueMorph commented 4 years ago

Hi @JsBergbau ,

your script is perfect and I used it to show the climate-values of some rooms in my house in my home assistant. I work on a Raspberry PI 3B and the system works very well with only one sensor and with a permanent BLE-connection to the device.

I simply call your script like this LYWSD03MMC.py -d A4:C1:38:46:B7:2D -r -b.

First idea to to reduce the connection frequency was to use a simple cron job. With this job I want to call your script every minute (or every 5 minutes). Unfortunately, after some connections something get stucks. I try to analyse your script and I think I found some hints documented from you in your source code. You commented some thoughts about the bluepy-helper functions and that a stucking bluepy-helper can be killed with a system call before reconnecting to devices. I activate this parts of sourcecode and try again. But same error appears. After some connections something freezes. I checked my journal. There are following messages:

Mär 22 08:25:11 BennyPi blemqtt.sh[31954]: Trying to connect to A4:C1:38:82:2D:E3 Mär 22 08:25:11 BennyPi blemqtt.sh[31954]: Connection lost Mär 22 08:25:12 BennyPi blemqtt.sh[31954]: Waiting... Mär 22 08:25:12 BennyPi blemqtt.sh[31954]: bluepy-helper couldn't be determined, killing not allowed Mär 22 08:25:12 BennyPi blemqtt.sh[31954]: Trying to connect to A4:C1:38:82:2D:E3 Mär 22 08:25:12 BennyPi blemqtt.sh[31954]: Connection lost Mär 22 08:25:13 BennyPi blemqtt.sh[31954]: Waiting... Mär 22 08:25:13 BennyPi blemqtt.sh[31954]: bluepy-helper couldn't be determined, killing not allowed Mär 22 08:25:13 BennyPi blemqtt.sh[31954]: Trying to connect to A4:C1:38:82:2D:E3 Mär 22 08:25:13 BennyPi blemqtt.sh[31954]: Connection lost

I google some time and found two clues I would like to follow.

1.) Restart the bluetooth service 2.) Activating the watchdog and the restart on-failure capability in the bluetooth service
3.) Set the --compatargument to the bluetooth service

If I reset the bluetooth-service with sudo systemctl restart bluetooth.service the connection to my device could be established.

Mär 22 08:30:23 BennyPi blemqtt.sh[31954]: Trying to connect to A4:C1:38:82:2D:E3 Mär 22 08:30:29 BennyPi blemqtt.sh[31954]: Temperature: 17.2 Mär 22 08:30:29 BennyPi blemqtt.sh[31954]: Humidity: 51 Mär 22 08:30:29 BennyPi blemqtt.sh[31954]: Battery voltage: 2.921 Mär 22 08:30:29 BennyPi blemqtt.sh[31954]: Battery level: 82 Mär 22 08:30:29 BennyPi blemqtt.sh[31954]: 1 measurements collected. Exiting in a moment.

After some time the connection get stucks again.

I edited the bluetooth.service /etc/systemd/system/bluetooth.target.wants/bluetooth.service and try the points mentioned under 2.) and 3.).

Summerized:

I always have to restart the bluetooth service if the connection can't be established. The editing of bluetooth.service to start the bluetooth service in compatibillity mode don't helps. Also setting the watchdog time to 10s and restarting the bluetooth service on-failure don't leads to success.

Do you have any ideas to solve this problem?

Best regards and I hope you can help me ...

JsBergbau commented 4 years ago

Hi @BlueMorph, the watchdog in the script kills only its own bluepy-helper to work better with mulitple devices.

You wrote you call this script by cronjob like every 5 minutes. How do you quit the script? Do you use the -c / --count options?

I think sometimes bluepy-helper remains. I've rewritten the script to kill any own bluepy-helper when using the --count options. Please update to latest version (it is not in the release version so far) and report back.

BlueMorph commented 4 years ago

Hi @BlueMorph, the watchdog in the script kills only its own bluepy-helper to work better with mulitple devices.

You wrote you call this script by cronjob like every 5 minutes. How do you quit the script? Do you use the -c / --count options?

Yes I use the -c 1 option.

I think sometimes bluepy-helper remains. I've rewritten the script to kill any own bluepy-helper when using the --count options. Please update to latest version (it is not in the release version so far) and report back.

OK, I will try and give a report!

BlueMorph commented 4 years ago

Hi @JsBergbau,

so I do further tests with your new version of script. Without restarting the bluetooth-service from time to time the problem still remains.

Any ideas?

JsBergbau commented 4 years ago

Can you please execute apt list --installed | grep bluez and post the result here?

BlueMorph commented 4 years ago

Sure, thats my output.

WARNING: apt does not have a stable CLI interface. Use with caution in scripts.

bluez-firmware/testing,now 1.2-4+rpt2 all [Installiert,automatisch] bluez/testing,now 5.50-1+rpt1 armhf [Installiert,automatisch]

JsBergbau commented 4 years ago

That are the same versions I currently use and which look stable. Sorry don't have any explanation why you have these issues.

BlueMorph commented 4 years ago

I try using a workaround by restarting the bluetooth service every 10 minutes (I do this in a cronjob). But sometimes the bluetooth part crashes again. I think the problem is that I hardly restart the service without checking if there is an ongoing connection/transmission.

My dmesg shows me following information

[ 5558.350034] Bluetooth: hci0: Frame reassembly failed (-84) [ 5560.408735] Bluetooth: hci0: command 0x200c tx timeout [ 6627.928768] Bluetooth: hci0: command 0x200c tx timeout [ 6630.088726] Bluetooth: hci0: command 0x200c tx timeout [ 6630.089374] Bluetooth: hci0: Frame reassembly failed (-84) [ 6632.168730] Bluetooth: hci0: command 0x200b tx timeout [ 8547.789991] Bluetooth: hci0: Frame reassembly failed (-84) [ 8549.848745] Bluetooth: hci0: command 0x200c tx timeout [18336.597555] Bluetooth: hci0: Frame reassembly failed (-84) [18338.675079] Bluetooth: hci0: command 0x200c tx timeout [19088.616911] Bluetooth: hci0: Frame reassembly failed (-84) [19090.675093] Bluetooth: hci0: command 0x200c tx timeout [21816.995112] Bluetooth: hci0: command 0x200e tx timeout [21819.075118] Bluetooth: hci0: command 0x200b tx timeout [21819.075770] Bluetooth: hci0: Frame reassembly failed (-84) [21821.155113] Bluetooth: hci0: command 0x200c tx timeout

Now the interresting part:

[21821.195149] Unable to handle kernel NULL pointer dereference at virtual address 00000012 [21821.195172] pgd = b3aece74 [21821.195185] [00000012] *pgd=00000000 [21821.195212] Internal error: Oops: 17 [#1] SMP ARM [21821.195513] Modules linked in: cmac aes_arm_bs crypto_simd cryptd algif_skcipher af_alg bnep hci_uart btbcm serdev bluetooth ecdh_generic brcmfmac brcmutil sha256_generic cfg80211 rfkill raspberrypi_hwmon hwmon vc_sm_cma(C) uio_pdrv_genirq uio fixed ip_tables x_tables ipv6 [21821.196911] CPU: 3 PID: 575 Comm: kworker/u9:1 Tainted: G C 4.19.97-v7+ #1294 [21821.204965] Hardware name: BCM2835 [21821.213911] Workqueue: hci0 hci_rx_work [bluetooth] [21821.222426] PC is at hci_connect_le_scan_cleanup+0x14/0x124 [bluetooth] [21821.230834] LR is at create_le_conn_complete+0xd8/0xdc [bluetooth] [21821.239245] pc : [<7f42173c>] lr : [<7f423d4c>] psr: 60000013 [21821.247145] sp : b20b7de8 ip : b20b7e08 fp : b20b7e04 [21821.254884] r10: 00000006 r9 : b2030704 r8 : 0000000e [21821.262443] r7 : b2030008 r6 : 00000000 r5 : b2030000 r4 : b20308a4 [21821.270023] r3 : 7f423c74 r2 : 00000000 r1 : 00000000 r0 : 00000000 [21821.277514] Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment user [21821.285076] Control: 10c5383d Table: 376b006a DAC: 00000055 [21821.292626] Process kworker/u9:1 (pid: 575, stack limit = 0xf4874874) [21821.300238] Stack: (0xb20b7de8 to 0xb20b8000) [21821.307755] 7de0: b20308a4 b2030000 00000000 b2030008 b20b7e24 b20b7e08 [21821.323067] 7e00: 7f423d4c 7f421734 b2030000 ae7b00c0 80d04d48 aa502cc0 b20b7ec4 b20b7e28 [21821.338830] 7e20: 7f42b040 7f423c80 b20b7e5c b20b7e60 baf6b400 8085d710 b9eee900 00000000 [21821.355116] 7e40: b20b7e6c 80c92ad8 808635c4 00000000 b2030000 000307d8 b203200d 7f423c74 [21821.371962] 7e60: 00000000 b20b7e70 8020336c 80204570 40000013 b20307d8 b20307e4 b2030000 [21821.389580] 7e80: b20b7ea4 b20b7e90 808635c4 80203328 b20307d8 d6ab3eb8 b20b7ec4 b58ef600 [21821.407425] 7ea0: b2030700 ae7b00c0 b2030000 b20307d8 b2030704 00000006 b20b7efc b20b7ec8 [21821.425244] 7ec0: 7f41ce8c 7f42acf4 b2030704 00000006 b20b7efc b58ef600 b2030700 bab31600 [21821.443260] 7ee0: b5ad0e00 00000000 b2030704 00000006 b20b7f34 b20b7f00 8013c44c 7f41cd2c [21821.461922] 7f00: 40000093 bab31600 bab31600 bab31600 b58ef614 bab31600 bab31618 80d03d00 [21821.481065] 7f20: 00000088 b58ef600 b20b7f7c b20b7f38 8013c790 8013c2e8 b20b7f5c 00000000 [21821.500681] 7f40: 80d03d00 80d03d00 80d8ef7a b20b6038 b20b7f7c b76847c0 b5899500 00000000 [21821.521017] 7f60: b58ef600 8013c734 b76847dc bab73e74 b20b7fac b20b7f80 80142ac4 8013c740 [21821.542056] 7f80: 801043d8 b5899500 8014298c 00000000 00000000 00000000 00000000 00000000 [21821.563425] 7fa0: 00000000 b20b7fb0 801010ac 80142998 00000000 00000000 00000000 00000000 [21821.585139] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [21821.607040] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 00000000 00000000 [21821.629472] [<7f42173c>] (hci_connect_le_scan_cleanup [bluetooth]) from [<7f423d4c>] (create_le_conn_complete+0xd8/0xdc [bluetooth]) [21821.652079] [<7f423d4c>] (create_le_conn_complete [bluetooth]) from [<7f42b040>] (hci_event_packet+0x358/0x2eb4 [bluetooth]) [21821.674676] [<7f42b040>] (hci_event_packet [bluetooth]) from [<7f41ce8c>] (hci_rx_work+0x16c/0x234 [bluetooth]) [21821.697002] [<7f41ce8c>] (hci_rx_work [bluetooth]) from [<8013c44c>] (process_one_work+0x170/0x458) [21821.719066] [<8013c44c>] (process_one_work) from [<8013c790>] (worker_thread+0x5c/0x5a4) [21821.741089] [<8013c790>] (worker_thread) from [<80142ac4>] (kthread+0x138/0x168) [21821.752278] [<80142ac4>] (kthread) from [<801010ac>] (ret_from_fork+0x14/0x28) [21821.763337] Exception stack(0xb20b7fb0 to 0xb20b7ff8) [21821.774226] 7fa0: 00000000 00000000 00000000 00000000 [21821.795665] 7fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000 [21821.816964] 7fe0: 00000000 00000000 00000000 00000000 00000013 00000000 [21821.827701] Code: e92dd8f0 e24cb004 e52de004 e8bd4000 (e5d05012) [21821.838490] ---[ end trace f9a6c3169b8824f0 ]---

After this stacktrace the bluetooth log in the journal looks like this:

Apr 04 12:30:01 BennyPi bluetoothd[8801]: Terminating Apr 04 12:30:01 BennyPi systemd[1]: Stopping Bluetooth service... Apr 04 12:31:32 BennyPi systemd[1]: bluetooth.service: State 'stop-sigterm' timed out. Killing. Apr 04 12:31:32 BennyPi systemd[1]: bluetooth.service: Killing process 8801 (bluetoothd) with signal SIGKILL. Apr 04 12:33:02 BennyPi systemd[1]: bluetooth.service: Processes still around after SIGKILL. Ignoring. Apr 04 12:34:32 BennyPi systemd[1]: bluetooth.service: State 'stop-final-sigterm' timed out. Killing. Apr 04 12:34:32 BennyPi systemd[1]: bluetooth.service: Killing process 8801 (bluetoothd) with signal SIGKILL. Apr 04 12:36:02 BennyPi systemd[1]: bluetooth.service: Processes still around after final SIGKILL. Entering failed mode. Apr 04 12:36:02 BennyPi systemd[1]: bluetooth.service: Failed with result 'timeout'. Apr 04 12:36:02 BennyPi systemd[1]: Stopped Bluetooth service. Apr 04 12:36:02 BennyPi systemd[1]: bluetooth.service: Found left-over process 8801 (bluetoothd) in control group while starting unit. Ignoring. Apr 04 12:36:02 BennyPi systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Apr 04 12:36:02 BennyPi systemd[1]: Starting Bluetooth service... Apr 04 12:36:02 BennyPi bluetoothd[9219]: Bluetooth daemon 5.50 Apr 04 12:36:02 BennyPi bluetoothd[9219]: D-Bus setup failed: Name already in use Apr 04 12:36:02 BennyPi bluetoothd[9219]: Unable to get on D-Bus Apr 04 12:36:03 BennyPi systemd[1]: bluetooth.service: Main process exited, code=exited, status=1/FAILURE Apr 04 12:37:33 BennyPi systemd[1]: bluetooth.service: State 'stop-sigterm' timed out. Killing. Apr 04 12:37:33 BennyPi systemd[1]: bluetooth.service: Killing process 8801 (bluetoothd) with signal SIGKILL. Apr 04 12:39:03 BennyPi systemd[1]: bluetooth.service: Processes still around after SIGKILL. Ignoring. Apr 04 12:40:33 BennyPi systemd[1]: bluetooth.service: State 'stop-final-sigterm' timed out. Killing. Apr 04 12:40:33 BennyPi systemd[1]: bluetooth.service: Killing process 8801 (bluetoothd) with signal SIGKILL. Apr 04 12:42:04 BennyPi systemd[1]: bluetooth.service: Processes still around after final SIGKILL. Entering failed mode. Apr 04 12:42:04 BennyPi systemd[1]: bluetooth.service: Failed with result 'exit-code'. Apr 04 12:42:04 BennyPi systemd[1]: Stopped Bluetooth service. Apr 04 12:42:04 BennyPi systemd[1]: bluetooth.service: Found left-over process 8801 (bluetoothd) in control group while starting unit. Ignoring. Apr 04 12:42:04 BennyPi systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Apr 04 12:42:04 BennyPi systemd[1]: Starting Bluetooth service... Apr 04 12:42:04 BennyPi bluetoothd[9422]: Bluetooth daemon 5.50 Apr 04 12:42:04 BennyPi bluetoothd[9422]: D-Bus setup failed: Name already in use Apr 04 12:42:04 BennyPi bluetoothd[9422]: Unable to get on D-Bus Apr 04 12:42:04 BennyPi systemd[1]: bluetooth.service: Main process exited, code=exited, status=1/FAILURE Apr 04 12:43:34 BennyPi systemd[1]: bluetooth.service: State 'stop-sigterm' timed out. Killing. Apr 04 12:43:34 BennyPi systemd[1]: bluetooth.service: Killing process 8801 (bluetoothd) with signal SIGKILL. Apr 04 12:45:04 BennyPi systemd[1]: bluetooth.service: Processes still around after SIGKILL. Ignoring. Apr 04 12:46:34 BennyPi systemd[1]: bluetooth.service: State 'stop-final-sigterm' timed out. Killing. Apr 04 12:46:34 BennyPi systemd[1]: bluetooth.service: Killing process 8801 (bluetoothd) with signal SIGKILL. Apr 04 12:48:04 BennyPi systemd[1]: bluetooth.service: Processes still around after final SIGKILL. Entering failed mode. Apr 04 12:48:04 BennyPi systemd[1]: bluetooth.service: Failed with result 'exit-code'. Apr 04 12:48:04 BennyPi systemd[1]: Failed to start Bluetooth service. Apr 04 12:48:05 BennyPi systemd[1]: bluetooth.service: Service RestartSec=100ms expired, scheduling restart. Apr 04 12:48:05 BennyPi systemd[1]: bluetooth.service: Scheduled restart job, restart counter is at 1. Apr 04 12:48:05 BennyPi systemd[1]: Stopped Bluetooth service. Apr 04 12:48:05 BennyPi systemd[1]: bluetooth.service: Found left-over process 8801 (bluetoothd) in control group while starting unit. Ignoring. Apr 04 12:48:05 BennyPi systemd[1]: This usually indicates unclean termination of a previous run, or service implementation deficiencies. Apr 04 12:48:05 BennyPi systemd[1]: Starting Bluetooth service... Apr 04 12:48:05 BennyPi bluetoothd[9573]: Bluetooth daemon 5.50 Apr 04 12:48:05 BennyPi bluetoothd[9573]: D-Bus setup failed: Name already in use Apr 04 12:48:05 BennyPi bluetoothd[9573]: Unable to get on D-Bus

The journal of the script (its integrated in a blemqtt-service) looks following:

Mär 29 04:05:37 BennyPi sh[8768]: Trying to connect to A4:C1:38:82:2D:E3 Mär 29 04:05:46 BennyPi sh[8768]: Temperature: 17.2 Mär 29 04:05:46 BennyPi sh[8768]: Humidity: 47 Mär 29 04:05:46 BennyPi sh[8768]: Battery voltage: 2.985 Mär 29 04:05:46 BennyPi sh[8768]: Battery level: 88 Mär 29 04:05:46 BennyPi sh[8768]: 1 measurements collected. Exiting in a moment. Mär 29 04:05:51 BennyPi sh[8768]: Topic blemqtt/A4C138822DE3 Mär 29 04:05:51 BennyPi sh[8768]: measurements deque([Measurement(temperature=17.2, humidity=47, voltage=2.985, calibratedHumidity=0, battery=88, timestamp=1585447546)]) Mär 29 04:05:51 BennyPi sh[8768]: 17.2 Mär 29 04:05:51 BennyPi sh[8768]: Publishing {"temperature": "17.2", "humidity": "47", "batt": "88", "voltage": "2.985"} Mär 29 04:07:51 BennyPi sh[8768]: Connection lost Mär 29 04:07:52 BennyPi sh[8768]: Waiting... Mär 29 04:07:52 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:32 BennyPi sh[8768]: Connection lost Mär 29 04:08:33 BennyPi sh[8768]: Waiting... Mär 29 04:08:33 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:34 BennyPi sh[8768]: Connection lost Mär 29 04:08:35 BennyPi sh[8768]: Waiting... Mär 29 04:08:35 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:35 BennyPi sh[8768]: Connection lost Mär 29 04:08:36 BennyPi sh[8768]: Waiting... Mär 29 04:08:36 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:36 BennyPi sh[8768]: Connection lost Mär 29 04:08:37 BennyPi sh[8768]: Waiting... Mär 29 04:08:37 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:37 BennyPi sh[8768]: Connection lost Mär 29 04:08:38 BennyPi sh[8768]: Waiting... Mär 29 04:08:38 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:38 BennyPi sh[8768]: Connection lost Mär 29 04:08:39 BennyPi sh[8768]: Waiting... Mär 29 04:08:39 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:39 BennyPi sh[8768]: Connection lost Mär 29 04:08:40 BennyPi sh[8768]: Waiting... Mär 29 04:08:40 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:40 BennyPi sh[8768]: Connection lost Mär 29 04:08:41 BennyPi sh[8768]: Waiting... Mär 29 04:08:41 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:42 BennyPi sh[8768]: Connection lost Mär 29 04:08:43 BennyPi sh[8768]: Waiting... Mär 29 04:08:43 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:43 BennyPi sh[8768]: Connection lost Mär 29 04:08:44 BennyPi sh[8768]: Waiting... Mär 29 04:08:44 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:44 BennyPi sh[8768]: Connection lost Mär 29 04:08:45 BennyPi sh[8768]: Waiting... Mär 29 04:08:45 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:45 BennyPi sh[8768]: Connection lost Mär 29 04:08:46 BennyPi sh[8768]: Waiting... Mär 29 04:08:46 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:46 BennyPi sh[8768]: Connection lost Mär 29 04:08:47 BennyPi sh[8768]: Waiting... Mär 29 04:08:47 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:47 BennyPi sh[8768]: Connection lost Mär 29 04:08:48 BennyPi sh[8768]: Waiting... Mär 29 04:08:48 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:48 BennyPi sh[8768]: Connection lost Mär 29 04:08:49 BennyPi sh[8768]: Waiting... Mär 29 04:08:49 BennyPi sh[8768]: Trying to connect to A4:C1:38:46:B7:2D Mär 29 04:08:49 BennyPi sh[8768]: Connection lost

... and this never ends ...

JsBergbau commented 4 years ago

Very strange.

dmesg | grep -i bluetooth
[   29.886252] Bluetooth: Core ver 2.22
[   29.886449] Bluetooth: HCI device and connection manager initialized
[   29.897497] Bluetooth: HCI socket layer initialized
[   29.897521] Bluetooth: L2CAP socket layer initialized
[   29.897599] Bluetooth: SCO socket layer initialized
[   30.007591] Bluetooth: HCI UART driver ver 2.3
[   30.007609] Bluetooth: HCI UART protocol H4 registered
[   30.038723] Bluetooth: HCI UART protocol Three-wire (H5) registered
[   30.039036] Bluetooth: HCI UART protocol Broadcom registered
[   32.493797] Bluetooth: BNEP (Ethernet Emulation) ver 1.3
[   32.493807] Bluetooth: BNEP filters: protocol multicast
[   32.493835] Bluetooth: BNEP socket layer initialized

From a Pi Zero " running now for 9 days.

Perhaps it is some kind of hardware problem with your PI? I suggest posting your previous message in the official raspberry PI forum https://www.raspberrypi.org/forums/viewforum.php?f=28

Please inform us about the results.

JsBergbau commented 4 years ago

Seems to be resolved --> Closed