flyte / mqtt-io

Expose GPIO modules (Raspberry Pi, Beaglebone, PCF8754, PiFace2 etc.) and digital sensors (LM75 etc.) to an MQTT server for remote control and monitoring.
MIT License
468 stars 158 forks source link

Hard lockup on Raspberry Pi 4 #242

Open enp6s0 opened 2 years ago

enp6s0 commented 2 years ago

Describe the bug mqtt-io non-deterministically causing a hard lockup on the Raspberry Pi 4B

When mqtt-io is started (either directly through the console or through systemd), there is a chance that the entire Pi will just lock up (it won't even respond to network pings, nor local console inputs). On the occasion that it works, there is a syslog message:

Disabling IRQ #50
Disabling IRQ #51

dmesg seems to say:

RaspberryPi kernel: [  652.059493] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.10.63-v8+ #1459
RaspberryPi kernel: [  652.059496] Hardware name: Raspberry Pi 4 Model B Rev 1.2 (DT)
RaspberryPi kernel: [  652.059499] Call trace:
RaspberryPi kernel: [  652.059514]  dump_backtrace+0x0/0x1b8
RaspberryPi kernel: [  652.059519]  show_stack+0x20/0x30
RaspberryPi kernel: [  652.059525]  dump_stack+0xf0/0x158
RaspberryPi kernel: [  652.059528]  __report_bad_irq+0x54/0xe4
RaspberryPi kernel: [  652.059534]  note_interrupt+0x290/0x390
RaspberryPi kernel: [  652.059538]  handle_irq_event_percpu+0x90/0x98
RaspberryPi kernel: [  652.059543]  handle_irq_event+0x50/0xf8
RaspberryPi kernel: [  652.059547]  handle_edge_irq+0xe0/0x240
RaspberryPi kernel: [  652.059550]  generic_handle_irq+0x38/0x50
RaspberryPi kernel: [  652.059556]  bcm2835_gpio_irq_handle_bank+0xb4/0xf8
RaspberryPi kernel: [  652.059560]  bcm2835_gpio_irq_handler+0x94/0x150
RaspberryPi kernel: [  652.059563]  generic_handle_irq+0x38/0x50
RaspberryPi kernel: [  652.059566]  __handle_domain_irq+0x9c/0x110
RaspberryPi kernel: [  652.059570]  gic_handle_irq+0xb0/0xf0
RaspberryPi kernel: [  652.059573]  el1_irq+0xcc/0x180
RaspberryPi kernel: [  652.059576]  __do_softirq+0x10c/0x510
RaspberryPi kernel: [  652.059581]  irq_exit+0xe8/0x108
RaspberryPi kernel: [  652.059585]  __handle_domain_irq+0xa0/0x110
RaspberryPi kernel: [  652.059588]  gic_handle_irq+0xb0/0xf0
RaspberryPi kernel: [  652.059591]  el1_irq+0xcc/0x180
RaspberryPi kernel: [  652.059595]  arch_cpu_idle+0x18/0x28
RaspberryPi kernel: [  652.059599]  default_idle_call+0x58/0x1d4
RaspberryPi kernel: [  652.059605]  do_idle+0x25c/0x270
RaspberryPi kernel: [  652.059609]  cpu_startup_entry+0x2c/0x70
RaspberryPi kernel: [  652.059612]  rest_init+0xd8/0xe8
RaspberryPi kernel: [  652.059618]  arch_call_rest_init+0x18/0x24
RaspberryPi kernel: [  652.059621]  start_kernel+0x53c/0x570
RaspberryPi kernel: [  652.491726] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G         C        5.10.63-v8+ #1459
RaspberryPi kernel: [  652.491729] Hardware name: Raspberry Pi 4 Model B Rev 1.2 (DT)
RaspberryPi kernel: [  652.491732] Call trace:
RaspberryPi kernel: [  652.491749]  dump_backtrace+0x0/0x1b8
RaspberryPi kernel: [  652.491754]  show_stack+0x20/0x30
RaspberryPi kernel: [  652.491760]  dump_stack+0xf0/0x158
RaspberryPi kernel: [  652.491763]  __report_bad_irq+0x54/0xe4
RaspberryPi kernel: [  652.491769]  note_interrupt+0x290/0x390
RaspberryPi kernel: [  652.491773]  handle_irq_event_percpu+0x90/0x98
RaspberryPi kernel: [  652.491777]  handle_irq_event+0x50/0xf8
RaspberryPi kernel: [  652.491781]  handle_edge_irq+0xe0/0x240
RaspberryPi kernel: [  652.491784]  generic_handle_irq+0x38/0x50
RaspberryPi kernel: [  652.491791]  bcm2835_gpio_irq_handle_bank+0xb4/0xf8
RaspberryPi kernel: [  652.491794]  bcm2835_gpio_irq_handler+0x94/0x150
RaspberryPi kernel: [  652.491798]  generic_handle_irq+0x38/0x50
RaspberryPi kernel: [  652.491801]  __handle_domain_irq+0x9c/0x110
RaspberryPi kernel: [  652.491804]  gic_handle_irq+0xb0/0xf0
RaspberryPi kernel: [  652.491807]  el1_irq+0xcc/0x180
RaspberryPi kernel: [  652.491811]  __do_softirq+0x10c/0x510
RaspberryPi kernel: [  652.491815]  irq_exit+0xe8/0x108
RaspberryPi kernel: [  652.491819]  __handle_domain_irq+0xa0/0x110
RaspberryPi kernel: [  652.491822]  gic_handle_irq+0xb0/0xf0
RaspberryPi kernel: [  652.491825]  el1_irq+0xcc/0x180
RaspberryPi kernel: [  652.491832]  arch_cpu_idle+0x18/0x28
RaspberryPi kernel: [  652.491837]  default_idle_call+0x58/0x1d4
RaspberryPi kernel: [  652.491843]  do_idle+0x25c/0x270
RaspberryPi kernel: [  652.491847]  cpu_startup_entry+0x2c/0x70
RaspberryPi kernel: [  652.491850]  rest_init+0xd8/0xe8
RaspberryPi kernel: [  652.491856]  arch_call_rest_init+0x18/0x24
RaspberryPi kernel: [  652.491859]  start_kernel+0x53c/0x570

But then mqtt-io would start normally and everything works as expected. However, more often than not, nothing happens, and then the Pi would just completely lock up. No error message, no nothing from mqtt-io or Python. The local console seem to complain about the system locking up so hard the filesystem is breaking. At this stage, nothing can be done except for a cold reset:

  397.194090] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1536: inode #36268: comm systemd-journal: reading directory lblock 0
[  397.207052] EXT4-fs error (device mmcblk0p2): __ext4_find_entry:1536: inode #36268: comm systemd-journal: reading directory lblock 0

Observing htop results before lockup (by starting the service then immediately running htop), the last thing I observed before the crash was that the Python process that's running mqtt_io uses 100% CPU on one core. Not sure if this is important info or not, but I'll just include it just in case. Observing the syslog before lockup does not seem to reveal anything weird.

I'm running this on a Raspberry Pi 4B (2GB), but would like to use the gpiod driver so that the configuration will (hopefully?) be more portable to another OS that is not Raspbian.

Expected behavior mqtt-io should be able to run without locking the Pi up

Config

# MQTT
mqtt:
  host: localhost
  topic_prefix: rpitest/gpio

# GPIO
gpio_modules:
  - name: gpio
    module: gpiod

# Inputs
digital_inputs:
  - name: gpio5
    module: gpio
    pin: 5
    interrupt: both
  - name: gpio6
    module: gpio
    pin: 6
    interrupt: both
  - name: gpio19
    module: gpio
    pin: 19
    interrupt: both
  - name: gpio26
    module: gpio
    pin: 26
    interrupt: both
  - name: gpio23
    module: gpio
    pin: 23
    interrupt: both
  - name: gpio24
    module: gpio
    pin: 24
    interrupt: both

# Outputs
digital_outputs:
  - name: gpio17
    module: gpio
    pin: 17
    initial: high
    publish_initial: true
    inverted: true
  - name: gpio27
    module: gpio
    pin: 27
    initial: high
    publish_initial: true
    inverted: true
  - name: gpio22
    module: gpio
    pin: 22
    initial: high
    publish_initial: true
    inverted: true
  - name: gpio10
    module: gpio
    pin: 10
    initial: high
    publish_initial: true
    inverted: true
  - name: gpio9
    module: gpio
    pin: 9
    initial: high
    publish_initial: true
    inverted: true
  - name: gpio11
    module: gpio
    pin: 11
    initial: high
    publish_initial: true
    inverted: true
  - name: gpio4
    module: gpio
    pin: 4
    initial: low
    publish_initial: true
  - name: gpio18
    module: gpio
    pin: 18
    initial: low
    publish_initial: true
  - name: gpio21
    module: gpio
    pin: 21
    initial: low
    publish_initial: true
  - name: gpio25
    module: gpio
    pin: 25
    initial: low
    publish_initial: true
  - name: gpio8
    module: gpio
    pin: 8
    initial: low
    publish_initial: true
  - name: wiegand_mcu_reset
    module: gpio7
    pin: 7
    initial: high
    publish_initial: true
  - name: gpio20
    module: gpio
    pin: 20
    initial: low
    publish_initial: true

Hardware

System:

SystemD config

[Unit]
Description=GPIO MQTT bridge service
After=multi-user.target

[Service]
Type=simple
Restart=always
RestartSec=3
User=root
Group=gpio
ExecStart=/usr/bin/python3 -m mqtt_io /etc/gpio.yml

[Install]
WantedBy=multi-user.target

Raspberry Pi config.txt

arm_64bit=1
enable_uart=1
gpu_mem=32

dtparam=spi=off
dtoverlay=disable-bt
dtoverlay=disable-wifi

dtoverlay=uart2
dtoverlay=uart5

dtparam=i2c_arm=on
dtparam=i2c_vc=on

dtoverlay=i2c-rtc,ds1307

dtoverlay=dwc2,dr_mode=host

Would appreciate some insights into this problem. Thanks!

Taxelas commented 2 years ago

Hello, I have the same issue but it appeared that only wired connection hangs. If wi-fi was connected the wireless connection stays alive. Also I used uart0 for troubleshooting. In the logs I see

Oct 08 13:51:51 rpi dhcpcd[606]: eth0: carrier lost
Oct 08 13:51:51 rpi kernel: ------------[ cut here ]------------
Oct 08 13:51:51 rpi kernel: WARNING: CPU: 0 PID: 33 at drivers/net/phy/phy.c:943 phy_error+0x34/0x6c
Oct 08 13:51:51 rpi kernel: Modules linked in: xt_nat xt_tcpudp veth rfcomm xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_counter xt_addrtype nft_compat nf_tables nfnetlink br_netfilter bridge cmac algif_hash aes_arm_bs crypto_simd cryptd algif_skcipher af_alg bnep hci_uart btbcm bluetooth ecdh_generic ecc overlay 8021q garp stp llc vc4 spidev brcmfmac snd_soc_hdmi_codec brcmutil cec binfmt_misc drm_kms_helper cfg80211 cdc_acm bcm2835_isp(C) ftdi_sio bcm2835_codec(C) bcm2835_v4l2(C) rpivid_hevc(C) usbserial bcm2835_mmal_vchiq(C) v4l2_mem2mem videobuf2_dma_contig videobuf2_vmalloc videobuf2_memops snd_soc_core videobuf2_v4l2 rfkill videobuf2_common snd_bcm2835(C) raspberrypi_hwmon videodev snd_compress snd_pcm_dmaengine snd_pcm snd_timer v3d i2c_bcm2835 gpu_sched i2c_brcmstb snd vc_sm_cma(C) spi_bcm2835 syscopyarea sysfillrect sysimgblt fb_sys_fops mc uio_pdrv_genirq nvmem_rmem uio drm i2c_dev fuse drm_panel_orientation_quirks
Oct 08 13:51:51 rpi kernel:  backlight ip_tables x_tables ipv6
Oct 08 13:51:51 rpi kernel: CPU: 0 PID: 33 Comm: kworker/0:1 Tainted: G         C        5.15.68-v7l+ #1586
Oct 08 13:51:51 rpi kernel: Hardware name: BCM2711
Oct 08 13:51:51 rpi kernel: Workqueue: events_power_efficient phy_state_machine
Oct 08 13:51:51 rpi kernel: Backtrace: 
Oct 08 13:51:51 rpi kernel: [<c0bd5d9c>] (dump_backtrace) from [<c0bd5fe8>] (show_stack+0x20/0x24)
Oct 08 13:51:51 rpi kernel:  r7:000003af r6:c0e3f418 r5:00000000 r4:60000013
Oct 08 13:51:51 rpi kernel: [<c0bd5fc8>] (show_stack) from [<c0bda6f8>] (dump_stack_lvl+0x70/0x94)
Oct 08 13:51:51 rpi kernel: [<c0bda688>] (dump_stack_lvl) from [<c0bda734>] (dump_stack+0x18/0x1c)
Oct 08 13:51:51 rpi kernel:  r7:000003af r6:00000009 r5:c08eb2e4 r4:c0e9a62c
Oct 08 13:51:51 rpi kernel: [<c0bda71c>] (dump_stack) from [<c02226c0>] (__warn+0xfc/0x114)
Oct 08 13:51:51 rpi kernel: [<c02225c4>] (__warn) from [<c0bd66a8>] (warn_slowpath_fmt+0x70/0xd8)
Oct 08 13:51:51 rpi kernel:  r7:000003af r6:c0e9a62c r5:c1205048 r4:00000000
Oct 08 13:51:51 rpi kernel: [<c0bd663c>] (warn_slowpath_fmt) from [<c08eb2e4>] (phy_error+0x34/0x6c)
Oct 08 13:51:51 rpi kernel:  r9:fffffffb r8:c228d400 r7:00000004 r6:c228d738 r5:c228d738 r4:c228d400
Oct 08 13:51:51 rpi kernel: [<c08eb2b0>] (phy_error) from [<c08ec8cc>] (phy_state_machine+0xdc/0x210)
Oct 08 13:51:51 rpi kernel:  r5:c1205048 r4:c228d70c
Oct 08 13:51:51 rpi kernel: [<c08ec7f0>] (phy_state_machine) from [<c0240108>] (process_one_work+0x250/0x57c)
Oct 08 13:51:51 rpi kernel:  r9:00000000 r8:efefd400 r7:00000000 r6:efef9bc0 r5:c1662480 r4:c228d70c
Oct 08 13:51:51 rpi kernel: [<c023feb8>] (process_one_work) from [<c0240494>] (worker_thread+0x60/0x5c4)
Oct 08 13:51:51 rpi kernel:  r10:efef9bc0 r9:c1203d00 r8:efef9bd8 r7:00000008 r6:efef9bc0 r5:c1662498
Oct 08 13:51:51 rpi kernel:  r4:c1662480
Oct 08 13:51:51 rpi kernel: [<c0240434>] (worker_thread) from [<c02487b4>] (kthread+0x178/0x194)
Oct 08 13:51:51 rpi kernel:  r10:c167e000 r9:c1547e74 r8:00000000 r7:c1662480 r6:c0240434 r5:c166c400
Oct 08 13:51:51 rpi kernel:  r4:c166e080
Oct 08 13:51:51 rpi kernel: [<c024863c>] (kthread) from [<c02000d4>] (ret_from_fork+0x14/0x20)
Oct 08 13:51:51 rpi kernel: Exception stack(0xc167ffb0 to 0xc167fff8)
Oct 08 13:51:51 rpi kernel: ffa0:                                     00000000 00000000 00000000 00000000
Oct 08 13:51:51 rpi kernel: ffc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
Oct 08 13:51:51 rpi kernel: ffe0: 00000000 00000000 00000000 00000000 00000013 00000000
Oct 08 13:51:51 rpi kernel:  r10:00000000 r9:00000000 r8:00000000 r7:00000000 r6:00000000 r5:c024863c
Oct 08 13:51:51 rpi kernel:  r4:c166c400
Oct 08 13:51:51 rpi kernel: ---[ end trace 89c6ec1ab50f7e96 ]---
Oct 08 13:51:51 rpi kernel: bcmgenet fd580000.ethernet eth0: Link is Down
Oct 08 13:51:51 rpi dhcpcd[606]: eth0: deleting address fe80::b1ed:6aa3:bfa9:a171
Oct 08 13:51:51 rpi avahi-daemon[428]: Withdrawing address record for fe80::b1ed:6aa3:bfa9:a171 on eth0.
Oct 08 13:51:51 rpi avahi-daemon[428]: Leaving mDNS multicast group on interface eth0.IPv6 with address fe80::b1ed:6aa3:bfa9:a171.
Oct 08 13:51:51 rpi avahi-daemon[428]: Interface eth0.IPv6 no longer relevant for mDNS.
Oct 08 13:51:51 rpi avahi-daemon[428]: Withdrawing address record for 192.168.11.67 on eth0.
Oct 08 13:51:51 rpi avahi-daemon[428]: Leaving mDNS multicast group on interface eth0.IPv4 with address 192.168.11.67.
Oct 08 13:51:51 rpi avahi-daemon[428]: Interface eth0.IPv4 no longer relevant for mDNS.

eth0 looks like

2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group0
    link/ether dc:a6:32:aa:d1:ab brd ff:ff:ff:ff:ff:ff

If I shutdown the script I can bring eth0 up with sudo ifup eth0 But it doesn't restore connectivity, only reboot helps.

I have few docker containers containers running + python scripts that read mcp9808 data from i2c bus And issue repeats on the same platform without sensors connected.

Update: tried on the fresh install of raspbian and I get the same issue. I have 2xRPI4 B 4GB On RPI3 it works without problems

Taxelas commented 2 years ago

The cause of my issue was wrong mqtt password, after fixing typo in the config the module worked. Still it is totally strange that it caused system to crash, but it works now. Update: After trying to add more pins to config I bumped into the same issue, when eth0 goes hard down and doesn't come up until reboot. It happens if the pin is not correct, I also have feeling if this happens when other process tries to access the pin that is already in use. BTW the "pin" in the config is actually referring to GPIO number and is different from physical pin number.

It is really hard to troubleshoot in headless mode, because it cuts down communication via eth0. The good thing is if wi-fi was connected it doesn't go down. In my case I used another rpi to connect it's UART to serial console of RPI with mqtt-io and get direct console access.