Closed LouneCode closed 9 months ago
Another hint
Also sx1302_hal/tree/master/packet_forwarder/src/jitqueue.c has a qsort_r() functionality, similar to loragw_hal.c. If there was a special reason to optimize the old sorting method with qsort() . . .
Thank you so much @LouneCode for debugging this issue in so much detail! The reason this was changed is that qsort_r is not available in musl. I took these changes from another patch, but it seems I took only part of the changes.
I will fix this with the proposed changes. If you would like, I'm also open to a pull request.
Good that things are progressing....;)
Please open the pull request.
FYI, after the fix above, the chirpstack-concentratord-sx1302 process runs like a charm. No more segmentation fault errors.
You have done a good job. Thanks.
Let's do IoT better
@LouneCode this commit contains the proposed changes: https://github.com/brocaar/sx1302_hal/commit/c3d99009556fdfe273c3a53306082ef181333c7a. Could you do a final check to make sure this is what you were proposing?
Thanks for the quick response. Sure, I'll check it out...
OK, I have walked through the commits on both repositories. The changes are correct.
I also compiled the latest version of the code in this repository (make build) and did a test run of the chirpstack-concentratord-sx1302 process: everything seems fine.
Let's do IoT better
Thanks again for your help @LouneCode :+1:
chirpstack-concentratord-sx1302 v4.3.0 crashed with a segmentation fault when using fine timestamping.
Test environment
The chirpstack-concentratord-sx1302 module is compiled according to the instructions in this repository.
Concentratord crashed randomly ...
Concentratord crashed randomly with a SIGSEGV signal (segmentation fault). It seems that the segmentation fault occurs (only?) if fine timestamping is enabled.
In the test case, fine timestamping is enabled in the concentratord configuration file as follows:
Concentratord start command:
./chirpstack-concentratord-sx1302 -c sx1303.toml
Concentratord runs for a random amount of time and then crashes with a segmentation fault error. The log does not show the reason or any information about the crash. The only hint is that a segmentation fault has been raised. Not good.
I've investigated and debugged the chirpstack-concentratord-sx1302 code and found that the problem lies behind the Rust FFI, in the customized C code of the sx1302_hal layer.
After a long run I got a full backtrace log with the gdb debugger.
Here is a simplified call stack and now we can see where the problem lies.
The Concentratord segmentation fault is caused by the compare_pkt_tmst() C function in the customized sx1302_hal module. This module was last changed in commit 9aef4ac, which updates the qsort() functionality in the merge_packets() function (libloragw/src/loragw_hal.c).
I think that after commit 9aef4ac the qsort comparator function compare_pkt_tmst() no longer works as it should and causes the segmentation fault (at least in this case). The error may be due to a wrong argument list in the comparator function compare_pkt_tmst().
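To illustrate the kind of mismatch suspected here, below is a minimal sketch of a comparator written for plain qsort(). It is not the code from loragw_hal.c: the struct rx_pkt type, the count_us field, the file-scope counter and the function names are all hypothetical and only stand in for the real HAL definitions. The point is that qsort() calls its comparator with exactly two arguments, so a comparator that still declares a third context argument (the qsort_r() shape) would read an undefined value when qsort() calls it.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for the HAL's received-packet struct. */
struct rx_pkt {
    uint32_t count_us; /* receive timestamp in microseconds */
};

/* Context that qsort_r() would pass as a third argument. Plain qsort()
 * has no context argument, so this sketch keeps it at file scope
 * (an assumption for illustration; not thread-safe). */
static int out_of_order_seen;

/* Comparator with the two-argument signature that qsort() expects. */
static int compare_rx_pkt(const void *a, const void *b)
{
    const struct rx_pkt *p = a;
    const struct rx_pkt *q = b;

    if (p->count_us > q->count_us) {
        out_of_order_seen++; /* the comparator saw a pair with a > b */
        return 1;
    }
    return (p->count_us < q->count_us) ? -1 : 0;
}

/* Sort packets by timestamp; returns how often the comparator saw a > b. */
static int sort_rx_pkts(struct rx_pkt *pkts, size_t n)
{
    assert(pkts != NULL);
    out_of_order_seen = 0;
    qsort(pkts, n, sizeof pkts[0], compare_rx_pkt);
    return out_of_order_seen;
}
```

Calling sort_rx_pkts() on an array of struct rx_pkt leaves it ordered by count_us. Returning explicit -1/0/1 values instead of a subtraction also avoids overflow when the two timestamps are far apart.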
The segmentation fault happens when the LoRa radio receives many packets at the same time. In the code, the information about the received packets is sorted and duplicate packets are removed.
The issue can be fixed by changing the code of loragw_hal.c:
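As a rough illustration of the sort-then-deduplicate step, here is a small self-contained sketch. It is not the merge_packets() implementation: the struct demo_pkt type, its fields, the function names and the duplicate rule (same timestamp, same size, same payload bytes) are assumptions chosen for the example, and the real HAL may compare other fields.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, reduced packet record; the real HAL struct has more fields. */
struct demo_pkt {
    uint32_t count_us;   /* receive timestamp in microseconds */
    uint16_t size;       /* number of valid payload bytes (<= 8 here) */
    uint8_t  payload[8];
};

/* Total order: timestamp, then size, then payload bytes, so that
 * identical packets always end up next to each other after sorting. */
static int cmp_demo_pkt(const void *a, const void *b)
{
    const struct demo_pkt *p = a;
    const struct demo_pkt *q = b;

    if (p->count_us != q->count_us)
        return (p->count_us > q->count_us) ? 1 : -1;
    if (p->size != q->size)
        return (p->size > q->size) ? 1 : -1;
    return memcmp(p->payload, q->payload, p->size);
}

/* Sorts in place, drops duplicates, and returns the new packet count. */
static size_t sort_and_dedupe(struct demo_pkt *pkts, size_t n)
{
    size_t kept = 0;

    assert(pkts != NULL);
    if (n == 0)
        return 0;

    qsort(pkts, n, sizeof pkts[0], cmp_demo_pkt);

    for (size_t i = 1; i < n; i++) {
        /* Keep a packet only if it differs from the last one kept. */
        if (cmp_demo_pkt(&pkts[kept], &pkts[i]) != 0)
            pkts[++kept] = pkts[i];
    }
    return kept + 1;
}
```

With many packets arriving at once, the comparator is called many times per sort, which may be why a comparator with the wrong signature would show up as a crash mainly under load.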
Before fix:
After fix:
Let's do IoT better