Closed nicolas-juteau closed 7 months ago
Adjacent: Do you have the option to use mPower firmware on that model? I have not run into anything similar on my MTCAP models.
https://www.multitech.net/developer/downloads/
It looks like all the tested models are using mPower (which is also the recommended install when I've spoken with MultiTech in the past):
https://www.chirpstack.io/docs/chirpstack-mqtt-forwarder/install/multitech.html
As far as I know, mPower is just the Multitech Application Enablement Platform (AEP) that sits on top of mLinux. I'd expect the kernel alignment trap issue to occur as well on mPower (with same mLinux version). I might give it a try to confirm this.
Also, we are using a custom built mlinux image (that's why we don't use mPower). Might try with a fresh standalone mLinux as well to confirm this.
I have not encountered this issue on MTCAP/MTCAP3 as well, though we've not exprimented that much with those models so we can't state this will never occur in the future. It is encouraging to learn you have never witnessed this issue on MTCAPs
I confirm this happens with latest version of mPower as well (6.3.1)
Handling message error, error: Operation cannot be accomplished in current state
This is a ZeroMQ error, so it seems that the Concentratord <> MQTT Forwarder communication somehow breaks.
I'm looking into this right now. I have a Multitech Conduit which I'm now upgrading to 6.3.1 after which I will try to reproduce the issue.
I can reproduce the error:
kernel: Alignment trap: chirpstack-mqtt (5772) PC=0x001f1ee8 Instr=0xe79a7009 Address=0xb6a42303 FSR 0x001
I have tried with the latest master
version, and I no longer see the error.
The latest master
version replaces paho-mqtt
by rumqttc
. The first links against the Paho C library, which also depends on OpenSSL. rumqttc
is a pure-Rust MQTT client, for TLS it depends on rustls. I suspect that the alignment trap issue was caused by the cross-compiling and linking to C code.
I will create a MQTT Forwarder v4.2.0-test.1 build for you today for testing.
Could you try again with this build:
It is the latest master
version built for the Conduit (you need to unzip first, as GitHub doesn't allow .ipk
uploads in comments). I have also added this change: https://github.com/chirpstack/chirpstack-mqtt-forwarder/commit/95439f1edb7c07fdfac1afc6012a1e37bfdd2d54. In case there is a ZeroMQ error on the REQ socket, it will automatically reconnect as after an error the socket might be in a 'dirty' state.
The build you provided seems to resolve the kernel panic issue. I'll let the gateway run for a while and see if the issue ever occurs again. If so, i'll open a new ticket and refer to this one.
I'm able to reproduce after upgrading my Conduit AP to 6.3.1, would be happy to see the 4.2.0-test.1 build for that device.
Hi @bconway , please find attached the test build for the Conduit AP.
I was hoping to create an "official" test build, but during the refactor from Paho (with OpenSSL dependency) to Rumqttc (with Rustls dependency, based on ring) I missed that ring
does not yet support mips-
targets. There was a MIPS related pull-request which got merged, but this was only for mipsel-
targets (little endian). There is a new PR for mips-
targets which hopefully will get merged soon.
Great, thanks! With the change in MQTT client, are there any configuration changes under [mqtt]
, or was everything existing still supported? Thanks.
The MQTT config options remained the same :)
As there hasn't been any activity on this issue, I assume the proposed fix that is part of the latest version(s) did solve the issue. Please let me know if this was not the case.
What happened?
Using Multitech Conduit (MTCDT) gateway:
At some point during normal operation, the mqtt-forwarder will stop processing downlink correctly to end-devices (i.e: end-devices will no longer receive any downlink). However, uplinks from end-devices that already joined the network will continue to work as expected until they issue another join request, in which case they will never be able the join back since downlink stopped working.
From there, for each downlink received by the mqtt-forwarder, the following can be seen in logs (log truncated to keep only relevant messages, see Log Output section for full log):
The MQTT forwarder will remain in this state, until the process is manually restarted.
NOTE: i've noticed this behavior only with the MTCDT (Multitech Conduit).
What did you expect
I'm expecting this error not to occur and downlinks continue to be processed correctly as it should.
Steps to reproduce this issue
Unfortunately, I haven't been able to systematically reproduce this issue all the time. It happens from time to time, randomly. Since uplinks continue to work as it should, it becomes only noticeable when we manually issue a downlink to a end-device and realizes that the downlink is not received by the end-device itself.
Log output
In this log transcript, two events can be seen and chances are they might something have to do with the current issue:
Environment
EDIT: added custom image detail to Gateway OS