Rodemfr / MicronetToNMEA

A NMEA 0183 converter for Raymarine's wireless instruments ... and much more !
GNU General Public License v3.0

SW Crashes after 1-2 hours of conversion #12

Closed Rodemfr closed 2 years ago

Rodemfr commented 2 years ago

Several times I have seen MicronetToNMEA get stuck after 1-2 hours of conversion, apparently crashed. This needs to be verified and investigated.

dwarning commented 2 years ago

Can confirm, but in my case the communication breaks much earlier.

dwarning commented 2 years ago

I made a 5 sec recording with the stick. It looks to me as if the first and third messages from the converter are nearly the same, even though, according to the code in the .ino file, their content should be different.

5599010ac0128103703f0109001a28288103703f1c03123459120312345614031234580e03123457120000170 [Pause: 8556 samples]
5599010ac01203123459020101830c0c0 [Pause: 12103 samples]
5599010ac01203123456020101802020040c050f0c30060805000200002a080e0a01b5d4 [Pause: 6585 samples]
5599010ac01203123458020101820c0c0 [Pause: 11371 samples]
5599010ac01203123457020101811e1e050d050b07163f090905346f210d8ebd03360 [Pause: 2894481 samples]
5599010ac0128103703f0109001a28288103703f1c03123459120312345614031234580e031234571200002e0 [Pause: 8552 samples]
5599010ac01203123459020101830c0c0 [Pause: 12105 samples]
5599010ac01203123456020101802020040c050f0c30060805000100001404070500dbeb8 [Pause: 6584 samples]
5599010ac01203123458020101820c0c0 [Pause: 11372 samples]
5599010ac01203123457020101811e1e050d050b07163f090905346f210d8ebc03358 [Pause: 2894492 samples]
5599010ac0128103703f0109001a28288103703f1c03123459120312345614031234580e031234571200002e0 [Pause: 8552 samples]
5599010ac01203123459020101830c0c0 [Pause: 12106 samples]
5599010ac01203123456020101802020040c050f0c30060805000300001604070500daea0 [Pause: 6581 samples]
5599010ac01203123458020101820c0c0 [Pause: 11372 samples]
5599010ac01203123457020101811e1e050d050b07163f090905346f210d8ebc03358 [Pause: 2894482 samples]
5599010ac0128103703f0109001a28288103703f1c03123459120312345614031234580e03123457120000170 [Pause: 8553 samples]
5599010ac01203123459020101830c0c0 [Pause: 12102 samples]
5599010ac01203123456020101802020040c050f0c30060805000600001904070500dbeb8 [Pause: 6586 samples]
5599010ac01203123458020101820c0c0 [Pause: 11373 samples]
5599010ac01203123457020101811e1e050d050b07163f090905346f210d8ebc03358 [Pause: 2894492 samples]
5599010ac0128103703f0109001a28288103703f1c03123459120312345614031234580e031234571200002e0 [Pause: 8552 samples]
5599010ac01203123459020101830c0c0 [Pause: 12106 samples]
5599010ac01203123456020101802020040c050f0c30060805000600001904070500daea0 [Pause: 6584 samples]
5599010ac01203123458020101820c0c0 [Pause: 11373 samples]
5599010ac01203123457020101811e1e050d050b07163f090905346f210d8ebc03358 [Pause: 1207846 samples]

After roughly 1 minute, operation stops and only the display is transmitting.

Rodemfr commented 2 years ago

I suspect the crash occurs when the reception ISR (GDO0 callback) collides with the transmit ISR (TeensyTimer callback), but I'm not sure. I made a fix to avoid this collision, but since the issue can take several hours to appear on my setup, I can't yet tell whether it really solves it. I will let the system run today.

I made another change: I added the ARM WFI instruction to put the CPU in sleep mode when there is nothing to do. This saves some power. My system (M8N + LSM303 + HC-06 + DCDC) drops from ~980mW to ~810mW at 120MHz. The most interesting configuration is at 24MHz, where consumption drops to ~650mW for the entire system.

Rodemfr commented 2 years ago

Damn, does not solve the issue...

dwarning commented 2 years ago

One hint regarding breaks: I found that if serial_nmea (in my case connected to an ESP8266 serial-WiFi board) does not have the same baudrate (in my case 115200) as the other serial port, I get breaks where the display values disappear. That was with the old software state, the smartrc lib and without message split. I repeated the checks with the new software and temporarily saw some curious traffic with the SDR stick, but no breaks in the display values. I must check this further.

Rodemfr commented 2 years ago

You might have found another issue. I just found that the crash occurs at around 70 minutes, when micros() returns a value near its maximum (4294924048). It looks like a calculation issue on time slots when the time counter wraps.
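For readers hitting the same symptom: a 32-bit microsecond counter wraps after 2^32 us, which is about 71.6 minutes, so the timing matches. The actual fix is not shown in this thread. What follows is only a minimal sketch of the usual wrap-safe pattern, with hypothetical helper names (elapsedMicros, deadlineReached) that are not taken from the MicronetToNMEA sources.

```cpp
#include <cstdint>

// Hypothetical helper (not from the project): microseconds elapsed between
// two readings of a free-running 32-bit counter such as micros().
// Unsigned subtraction is modulo 2^32, so one counter wrap is handled.
constexpr uint32_t elapsedMicros(uint32_t now, uint32_t start) {
    return static_cast<uint32_t>(now - start);
}

// Hypothetical helper (not from the project): true once `now` has reached
// `deadline`, treating the counter as circular. Only valid while the two
// values are less than 2^31 us (about 35 minutes) apart.
constexpr bool deadlineReached(uint32_t now, uint32_t deadline) {
    return static_cast<uint32_t>(now - deadline) < 0x80000000u;
}

// Start value taken from the comment above, reading taken 100 us after the wrap.
static_assert(elapsedMicros(100u, 4294924048u) == 43348u, "wrap is handled");
```

The point of the pattern is to compare differences (now - start) and never absolute values (now > deadline), because the absolute comparison gives the wrong answer for any time slot that straddles the wrap.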

Rodemfr commented 2 years ago

Problem seems to be solved now. I started a long run test to verify that it is ok. The fix has been pushed to master branch.

dwarning commented 2 years ago

I checked today's latest commit on a 4.1 board. It has been working very well for more than 4h. Great work!

I have a question regarding a few definitions and specs in your Micronet.txt: you say we have 16 bytes of preamble + 1 sync byte, 17 * ~104us = 1771us. This is also defined as PREAMBLE_LENGTH_IN_US. But you also define MICRONET_RF_PREAMBLE_LENGTH as 14, so the message is in fact 208us shorter. Is that a kind of safety margin, or just carried over from previous work? By the way: my display and my hull transmitter have at least 15 bytes of preamble. They use a kind of power-up cycle of the PA, so we can't clearly see whether they use 15 or 16 bytes. Whether it is 15 or 16 is not important, but the definitions above should be correct and the code in line with the doc.

Rodemfr commented 2 years ago

You found an error. Preamble length is 16; I corrected the definition. As you point out, having 14, 15 or 16 bytes is not very important for TX or RX. It will work. However, the time constant of 1771 is crucial to calculate the size of the allocated TX window on the network and to know when you must transmit. This is where things become trickier, because the TX triggers of slave devices have quite some jitter (up to 400us from what I have seen) and because a nasty 244 microsecond rounding has been introduced in the window size calculation, which makes the estimation of time windows very imprecise. This is what I worked on and updated this week.
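To illustrate why the byte count and the time constant should come from one definition, here is a small sketch that derives the duration from the preamble length. It assumes an on-air bit rate of 76800 baud, which is not stated in this thread but gives the ~104us per byte and the 1771us quoted above. The names kAssumedBitRate, kSyncBytes and preambleDurationUs are hypothetical, not the project's.

```cpp
#include <cstdint>

// Assumption: 76800 baud on air (about 104.17 us per byte). Not stated in the thread.
constexpr uint64_t kAssumedBitRate = 76800;
// One sync byte follows the preamble, as described above.
constexpr uint64_t kSyncBytes = 1;

// Hypothetical helper: duration in microseconds of preamble + sync byte,
// rounded to the nearest microsecond.
constexpr uint32_t preambleDurationUs(uint32_t preambleBytes) {
    return static_cast<uint32_t>(
        (8u * (preambleBytes + kSyncBytes) * 1000000u + kAssumedBitRate / 2) / kAssumedBitRate);
}

// 16 preamble bytes + sync give the 1771 us constant discussed above.
static_assert(preambleDurationUs(16) == 1771, "16 bytes + sync");
// With 14 preamble bytes the frame is 208 us shorter.
static_assert(preambleDurationUs(16) - preambleDurationUs(14) == 208, "2 bytes shorter");
```

With this kind of derivation, changing the preamble length in one place cannot leave the time constant out of sync with it.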

By the way, I added support for the "Node Info" message in slave devices. You can now check the reception quality and the SW version in real time in the option->Health menu. MicronetToNMEA is seen as a NMEA converter by the display. That's useful for finding the best place to install MicronetToNMEA in the boat.

dwarning commented 2 years ago

Thank you for taking care.

RF capabilities: I would really appreciate a higher IF for our baudrate, deviation and bandwidth conditions. You are using 150kHz; the preferred value should be at least 200kHz. This is also recommended by SmartRF Studio. You can change it by setting FSCTRL1 in CC1101Driver.cpp line 506 to 0x08.
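For reference, the CC1101 derives the IF from the lower five bits of FSCTRL1 (FREQ_IF) as f_IF = f_XOSC / 2^10 * FREQ_IF. The sketch below assumes a 26MHz crystal, which is common on CC1101 modules but not confirmed for this hardware. The names kAssumedXoscHz and ifFrequencyHz are hypothetical.

```cpp
#include <cstdint>

// Assumption: 26 MHz crystal on the CC1101 module (not confirmed in this thread).
constexpr double kAssumedXoscHz = 26000000.0;

// Hypothetical helper: IF in Hz for a given FSCTRL1 value.
// Only bits 4:0 (FREQ_IF) of the register contribute.
constexpr double ifFrequencyHz(uint8_t fsctrl1) {
    return kAssumedXoscHz / 1024.0 * (fsctrl1 & 0x1F);
}

// 0x06 is the "150kHz" setting, 0x08 the "200kHz" setting.
static_assert(ifFrequencyHz(0x06) == 152343.75, "about 152 kHz");
static_assert(ifFrequencyHz(0x08) == 203125.0, "about 203 kHz");
```

Under that assumption the register steps are about 25.4kHz wide, so 0x08 is the nearest setting to 200kHz.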

A static analysis with cppcheck found the following weaknesses (line numbers are from a concatenated test file):

test.cpp:3544:22: warning: Member variable 'MicronetMessageFifo::store' is not initialized in the constructor. [uninitMemberVar]
MicronetMessageFifo::MicronetMessageFifo()
                     ^
test.cpp:4023:17: warning: Member variable 'NavigationData::xte_nm' is not initialized in the constructor. [uninitMemberVar]
NavigationData::NavigationData()
                ^
test.cpp:4023:17: warning: Member variable 'NavigationData::dtw_nm' is not initialized in the constructor. [uninitMemberVar]
NavigationData::NavigationData()
                ^
test.cpp:4023:17: warning: Member variable 'NavigationData::btw_deg' is not initialized in the constructor. [uninitMemberVar]
NavigationData::NavigationData()
                ^
test.cpp:4023:17: warning: Member variable 'NavigationData::vmc_kt' is not initialized in the constructor. [uninitMemberVar]
NavigationData::NavigationData()
                ^
test.cpp:4023:17: warning: Member variable 'NavigationData::hdg_deg' is not initialized in the constructor. [uninitMemberVar]
NavigationData::NavigationData()
                ^
test.cpp:4164:14: warning: Member variable 'NmeaDecoder::sentenceBuffer' is not initialized in the constructor. [uninitMemberVar]
NmeaDecoder::NmeaDecoder()
             ^
test.cpp:3738:22: warning: Member variable 'MicronetSlaveDevice::networkMap' is not initialized in the constructor. [uninitMemberVar]
MicronetSlaveDevice::MicronetSlaveDevice() :
                     ^
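One common way to silence uninitMemberVar is to give each member a default member initializer, so every constructor starts from a defined value. The sketch below uses a hypothetical struct, NavigationDataSketch, that borrows only the member names from the warnings. The float types and zero defaults are assumptions, not the project's actual declarations.

```cpp
// Hypothetical stand-in for NavigationData: only the member names come from
// the cppcheck output above. Types and default values are assumed.
struct NavigationDataSketch {
    float xte_nm = 0.0f;
    float dtw_nm = 0.0f;
    float btw_deg = 0.0f;
    float vmc_kt = 0.0f;
    float hdg_deg = 0.0f;

    // The constructor body can stay empty: the members above are already set.
    NavigationDataSketch() {}
};
```

For array members such as the ones flagged in the other classes, value-initializing with empty braces in the declaration would be the equivalent approach.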

Rodemfr commented 2 years ago

I switched IF to 200kHz and pushed in the master. Concerning the warnings, I will handle them in a dedicated issue.

dwarning commented 2 years ago

"I switched IF to 200kHz and pushed in the master."

Are you sure that you pushed?