AdamVStephen / gem-water-level-gauge

Building a water level gauge using MPX5010DP sensor and arduino.
GNU General Public License v2.0
0 stars 1 forks source link

Release 1.3.0 reboots spontaneously. #6

Open AdamVStephen opened 6 years ago

AdamVStephen commented 6 years ago

Observed several times by Nick. Note that the display of the loop counter indicates unexpected behaviour.

AdamVStephen commented 6 years ago

Code review of the main program is a follows

  1. After boot, the setup() function is invoked. This ends by initialising loops = 4695 -30; Thus the 4695 Easter egg should be triggered at the 30th loop, or 60 seconds into the boot with the default delay.
  2. Loop() structure is to increment loops; read the ADC, convert to cm, evaluate warning level then call out to the lcd_interface(v_adc, loops), hen finally parseSerial() to check on the serial interface, and delay(delay_time).
  3. delay_time is stored in the EEprom and clipped low at 1000ms.
  4. The code in lcd_interface : uses serbuf, so problems with serbuf could rebound here. The new function render_alarm takes over display of the alarm status. The msgPrefix strings are of different lengths but the sprintf() call into lcdbuf1 has the format " %s: %02d " which has a string length of 2 asterisk + 1 space + msgPrefix + colon + space + 2 digits + space + 2 asterisk = 10 + msgPrefx. Max msgPrefix size is 6 for NORMAL and this is not rendered. Otherwise max is 4 for LOLO or HIHI.
  5. The code in ParseSerial has scope for errors between serialBuf (the inbound buffer) and serbuf (the temporary string construction buffer).. More importantly, the serbuf[] buffer is arbitrarily sized to 50 characters. What happens when the sprintf(serbuf, s_disabled) happens ? s_disabled is a const char * array. Does sprintf add a trailing null? Is it valid to call with zero arguments? Why not simply avoid the sprintf call and send the fixed string directly to the Serial.println() call ? This code is definitely dodgy. However, it is only triggered by sending the "g [LlhH]" command.
  6. The counter is 16 bit signed, so will overflow at 32767 which given 4695-30 start and 0.5Hz, is 15.6 hours. I haven't tested the overflow behaviour. The uptime calculation is also screwed by the overflow - not of the counter, but it is derived from a function that gives uptime in milliseconds, and I'm putting this into a 16 bit integer, should be using 32 bits. The uptime also should have HH:MM:SS and not just HH:SS which is nonsense.

Recommendation :

  1. Fix the loop and uptime code, using unsigned long and dealing with overflow properly.
  2. check how sprintf(buf, fmt) operates when fmt is a bare string.
AdamVStephen commented 6 years ago

Note : putting the 32 bit millis() result into a 16 bit time value will overflow at 32.7s.