MajicDesigns / MD_Parola

Library for modular scrolling LED matrix text displays
GNU Lesser General Public License v2.1
433 stars 135 forks source link

MD_Parola initialization fails after reboot #107

Closed TobiasPSP closed 1 year ago

TobiasPSP commented 1 year ago

I randomly get issues with any example code that uses zones: most of the time, sketches work fine after uploading (every now and then even a fresh upload fails, though, and keeps failing until I upload a different example or manually change the number of zones or modules to a smaller number).

When rebooting Arduino by powering it on, issues start to occur on a regular basis. Sometimes things work, yet most of the time a previously running sketch will now just distort the display. When that happens, the only way to recover is to upload a different sketch that uses less modules.

I already tried all suggestions given: added a strong external power supply, added direct power cables to the display, started the display blank in an effort to reduce power requirements initially. No improvement.

I then googled and found other users experiencing the very same issue. So apparently, when using more than just 1 or two modules or when starting to use zones, there seems to be some initialization issue, and initialization seems to be highly sensitive, I just can't seem to figure out what exactly is interfering.

I next wanted to try to init the library after a short delay, i.e. adding time for the power supply and SPI to stabilize. Adding a delay() to setup() doesn't change anything, though, which is not surprising since the library is implicitly initialized at startup:

MD_Parola P = MD_Parola(HARDWARE_TYPE, CS_PIN, MAX_DEVICES);

Is there a way to postpone the library initialization somehow? I am no c++ guru so my experiments to move the above call to other places in my code, starting with an empty global var and filling it later, failed miserably.

I really love the library. If the hardware won't start reliably from a power-off situation, though, my project would be at a sudden end. Do you have any other hint or suggestion for me?

Many thanks.

MajicDesigns commented 1 year ago

Please indicate what hardware and connections you have.

TobiasPSP commented 1 year ago

Many thanks for your quick response. Do you have a link to the place where you want me to post this issue?

I am using an Arduino Mega and a 2x10 8x8 matrix. I can now reliably repro the case:

  1. Sketch (code below) runs fine. I can press the Arduino reset button, and it still runs fine.
  2. Once I turn off power and power on again, the display stays blank or is distorted most of the tries. Once the LEDs do become unresponsive, there is no way for me to recover (by pressing reset or uploading the sketch again).
  3. To recover and get the LEDs responding again, in the sketch I need to change ZONE_SIZE to a different number (lower or higher makes no difference as long as it differs from the currently used value). When I then upload the temporarily adjusted sketch, the LEDs respond properly again. After I get them working again, I can then revert the ZONE_SIZE back to its original value, and my panel is working perfectly again - until the next time I power off.

Here is my code which is a trivial adaption of Parola_Double_Height_V2:

#include <MD_Parola.h>
#include <MD_MAX72xx.h>
#include <SPI.h>
#include "Font_Data.h"

// Define the number of devices we have in the chain and the hardware interface
// NOTE: These pin numbers may not work with your hardware and may need changing
#define HARDWARE_TYPE MD_MAX72XX::FC16_HW
#define NUM_ZONES 2
#define ZONE_SIZE 10
#define MAX_DEVICES (NUM_ZONES * ZONE_SIZE)

#define ZONE_UPPER  1
#define ZONE_LOWER  0

#define PAUSE_TIME 2000

#define SCROLL_SPEED 50

#define CLK_PIN   50
#define DATA_PIN  51
#define CS_PIN    49

textEffect_t scrollUpper = PA_SCROLL_LEFT;
textEffect_t scrollLower = PA_SCROLL_LEFT;

// HARDWARE SPI
MD_Parola P = MD_Parola(HARDWARE_TYPE, CS_PIN, MAX_DEVICES);
bool isReverse = false;
bool scroll = false;

// SOFTWARE SPI
//MD_Parola P = MD_Parola(HARDWARE_TYPE, DATA_PIN, CLK_PIN, CS_PIN, MAX_DEVICES);

const char *msgL[] =
{
  " ",
  "Emergency",
  "EMERGENCY",
};
char *msgH; // allocated memory in setup()

void setup(void)
{
  uint8_t max = 0;

  // work out the size of buffer required
  for (uint8_t i = 0; i<ARRAY_SIZE(msgL); i++)
    if (strlen(msgL[i]) > max) max = strlen(msgL[i]);

  msgH = (char *)malloc(sizeof(char)*(max + 2));

  // initialise the LED display
  P.begin(NUM_ZONES);

  // Set up zones for 2 halves of the display
  P.setZone(ZONE_LOWER, 0, ZONE_SIZE - 1);
  P.setZone(ZONE_UPPER, ZONE_SIZE, MAX_DEVICES-1);
  P.setFont(BigFont);
  P.setCharSpacing(P.getCharSpacing() * 2.0); // double height --> double spacing
  P.setIntensity(15);

  isReverse = true;
}

void createHString(char *pH, const char *pL)
{
  for (; *pL != '\0'; pL++)
    *pH++ = *pL | 0x80;   // offset character

  *pH = '\0'; // terminate the string
}

void loop(void)
{
  static uint8_t cycle = 0;

  P.displayAnimate();

  if (P.getZoneStatus(ZONE_LOWER) && P.getZoneStatus(ZONE_UPPER))
  {
    // set up the string
    createHString(msgH, msgL[cycle]);

    P.displayClear();

    if (scroll)
    {
      P.displayZoneText(ZONE_LOWER, msgL[cycle], PA_LEFT, SCROLL_SPEED, 0, scrollLower, scrollLower);
      P.displayZoneText(ZONE_UPPER, msgH, PA_LEFT, SCROLL_SPEED, 0, scrollUpper, scrollUpper);
    }
    else
    {
      P.displayZoneText(ZONE_LOWER, msgL[cycle], PA_CENTER, 0, PAUSE_TIME, PA_PRINT, PA_NO_EFFECT);
      P.displayZoneText(ZONE_UPPER, msgH, PA_CENTER, 0, PAUSE_TIME, PA_PRINT, PA_NO_EFFECT);
    }

// synchronize the start and run the display
    P.synchZoneStart();

    // prepare for next pass
    if (isReverse)
    {
      cycle = (cycle + 1) % ARRAY_SIZE(msgL);
      if (cycle == 0) { cycle = 1; }
      isReverse = false;
      P.setZoneEffect(ZONE_UPPER, true, PA_FLIP_LR);
      P.setZoneEffect(ZONE_LOWER, true, PA_FLIP_LR);
    }
    else
    {
      isReverse = true;
      P.setZoneEffect(ZONE_UPPER, false, PA_FLIP_LR);
      P.setZoneEffect(ZONE_LOWER, false, PA_FLIP_LR);
    }   
  }
}

In a nutshell:

So I really would love to try and initialize the hardware after a small delay, as this would verify the findings above and also provide a workaround.

Do you have an idea how I could achieve this delay in my code?

MajicDesigns commented 1 year ago

Please describe what hardware you are using (edit: I see a MEGA) and what version of the library (Parola/MD_MAX72xx).

Also, how are the power and digital signals distributed to the matrices? With 20 matrices this is critical and will cause all sorts of issues like you are seeing.

Can you also provide a reference to the "other users experiencing the very same issue" as I would be interested to see this - no issues like this have been reported in the last few years.

MajicDesigns commented 1 year ago

Please see this also: StackExchange and this: pjrc.com

TobiasPSP commented 1 year ago

Thank you, I believe I was able to identify the issue: Sketch "Parola_Double_Height_V2" (and possibly other zone-related examples) expose the issue above when the message texts are shorter than the number of LED modules.

Steps to repro:

  1. Upload "parola_double_height_V2". Works as expected.
  2. Shorten the output texts and make sure you specify less text than fits on the connected displays, i.e.:
    const char *msgL[] =
    {
    "A",
    "B",
    };
  3. Upload the sketch with the shortened texts. IT WILL CONTINUE TO RUN correctly.
  4. Remove power, wait a few seconds, then reconnect power. The sketch NO LONGER RUNS.
  5. Upload the sketch again. It no longer runs.
  6. Restore the original (longer) output texts, and upload this sketch. It runs again.

Apparently, when the display texts are shorter than fits on the number of LED units, the lib tries to access memory locations outside the array. When you initially ran the original sketch, the memory seems to be already allocated so shortening the texts will not break the sketch immediately. Only when you remove power and completely restart will Arduino allocate random memory and then access memory locations that hang the CPU. This is somewhat random (depends on the adjacent memory locations) so removing power may even work with smaller texts but more likely than not will hang the CPU.

These are just my 2ct, and I could be totally off. However it explains what I see:

Whether my theory turns out to be valid or not: I was able to fix the issue for my sketch simply by adding spaces to mý texts, effectively making them large enough to cover the entire 16x80 display I am using.

However, due to the filler spaces I am using now, the text no longer appears centered. While I could live with this, it would be nice to pinpoint the actual code that misses the checks for array size, and correct it. From a user perspective, it should be ok to specify texts shorter than the physical display. After all, that's the point of scrolling and centering text ;-)

I hope my findings can help pinpointing the issue. In the past, by looking at google results, regularly users burned their fingers once they tried to run larger displays.

TobiasPSP commented 1 year ago

I did not note the urls I found during my research unfortunately, but I did manage to find one of them in my browser history:

https://forum.arduino.cc/t/arduino-mega-doesnt-reload-program-after-power-off/1032100

I assume this user experienced the exact same issue I was chewing on. Even though I was originally googling for symptoms only (and not mentioning "Parola" in my search), the user here is using the exact same libs and is basically doing exactly what I did (using a large display, and experiencing the same power-off-on issues):

#include <MD_Parola.h>
#include <MD_MAX72xx.h>

The reason why this issue may not have been brought to your attention yet most likely is because it seems so random and hard to repro, and it seems to be related to hardware/SPI rather than sending text to the display. Most people use smaller displays and/or longer texts so for them it is a no-brainer, and when someone reposts issues, naturally people focus on the individual hardware situation of that user.

That's why these questions surface in generic forums, and most of the time the community response is to recommend adding external power supply.

TobiasPSP commented 1 year ago

After extensive testing, I identified two issues that cause the LEDs not to work after a power loss.

1 Issue in Parola / example scripts like "parola_double_height_v2":

The string handling using the array below seems to have a flaw with the pointer:

const char *msgL[] =
{
  "Double height with custom font & 2 zones",
  "ABCDEFGHIJKLMNOPQRSTUVWXYZ",
  "abcdefghijklmnopqrstuvwxyz",
  "0123456789",
  "`!@#$%^&*()_+-={};:'<>\"?,./|\\{}",
};

After re-powering (not after pressing the reset button or uploading a new sketch), the array pointer seems to point to random data. After a reset (or when Arduino was already running) the array works as expected. Something is wrong here but I don't know what it is. I just see the effects and now know how to get rid of them:

I changed the array to one static string (code below), and the scrambled display issue after powering on is history: the display works as expected and shows the correct text right away after powering on, no more distorted display until I press the Reset button.

2 Issue in MD_MAX72XX

displayZoneText() hangs the CPU when the display text array is shorter or longer than the physical display (the issue may not be present when you use scrolling text but it is a problem when you display static text and no scrolling).

It seems the library is using an index into the char array based on the physical units and not based on the maximum array size. When the array does not contain the correct number of characters, the library seems to start reading out of bounds in adjacent memory. Doesn't necessarily explain why the CPU hangs also when the text is too long.

(again, I am using a static text display, no scrolling. Scrolling text seems to be a lot more forgiving).

Workaround

I am as happy as can be now because by simplifying your example script, all is working perfectly well for me now:

This is my final sketch (connected to a 16x80 display):


#include <MD_Parola.h>
#include <MD_MAX72xx.h>
#include <SPI.h>
#include "Font_Data.h"

// Define the number of devices we have in the chain and the hardware interface
// NOTE: These pin numbers may not work with your hardware and may need changing
#define HARDWARE_TYPE MD_MAX72XX::FC16_HW
#define NUM_ZONES 2
#define ZONE_SIZE 10
#define MAX_DEVICES (NUM_ZONES * ZONE_SIZE)

#define ZONE_UPPER  1
#define ZONE_LOWER  0

#define PAUSE_TIME 3000
#define SCROLL_SPEED 50

#define CLK_PIN   50
#define DATA_PIN  51
#define CS_PIN    53

// HARDWARE SPI
MD_Parola P = MD_Parola(HARDWARE_TYPE, CS_PIN, MAX_DEVICES);
bool isReverse = true;

// length of text MUST FIT EXACTLY the length of the physical LED matrix
// this may be different when you use scroll modes but seems the case for static text display
// if you specify shorter or longer texts, strange things may happen after you reconnect power
// this text fits my 16x80 display:
const char *text = "NOTARZT";
char *textHigh; 

void setup(void)
{
  textHigh = (char *)malloc(sizeof(char)*(strlen(text) + 2));

  P.begin(NUM_ZONES);
  P.setZone(ZONE_LOWER, 0, ZONE_SIZE - 1);
  P.setZone(ZONE_UPPER, ZONE_SIZE, MAX_DEVICES-1);
  P.setFont(BigFont);
  P.setCharSpacing(P.getCharSpacing() * 2.0); 
}

void createHString(char *pH, const char *pL)
{
  for (; *pL != '\0'; pL++)
    *pH++ = *pL | 0x80;   

  *pH = '\0';
}

void loop(void)
{
  P.displayAnimate();

  if (P.getZoneStatus(ZONE_LOWER) && P.getZoneStatus(ZONE_UPPER))
  {
    P.displayClear();
    createHString(textHigh, text);
    P.displayZoneText(ZONE_LOWER, text, PA_CENTER, 0, PAUSE_TIME, PA_PRINT, PA_NO_EFFECT);
    P.displayZoneText(ZONE_UPPER, textHigh, PA_CENTER, 0, PAUSE_TIME, PA_PRINT, PA_NO_EFFECT);
    P.synchZoneStart();

    isReverse = !isReverse;
    P.setZoneEffect(ZONE_UPPER, isReverse, PA_FLIP_LR);
    P.setZoneEffect(ZONE_LOWER, isReverse, PA_FLIP_LR);
  }
}

Many thx for your awesome libs. Maybe my journey helps you polish it a bit further. It would be really nice if it was more robust in respect to text input.

MajicDesigns commented 1 year ago

Ok. I'll investigate and see if I can reproduce the issue and what a fix might be.

MajicDesigns commented 1 year ago

Can you please let me klnow what OS you are developing with (Windows, Linux, MacOs, etc) and version of the IDE (or other compile/download toolchain) you are running.

TobiasPSP commented 1 year ago

Yes: I am using Win10 plus latest Arduino IDE and latest VSCode with all typical extensions uptodate. Are you having trouble to repro?

JerrySlavec commented 1 year ago

I am having a very similar problem associated with powering off. In my case, it seems that after power off/on my program never gets control again after the begin(4) function. Once the problem is encountered, resets or uploading the sketch does not resolve the problem. It can only be resolved by changing the number of zones from 4 to 3 (or 2 or 1). Changing the number of zones somehow corrects the problem and I can immediately change it back to 4 and everything runs fine (including doing resets) until I power off again. Then the problem returns, where I never get control back after the begin(4). I am using an Arduino Mega 2560, with 8 devices, not using SPI, and running on Windows 10 with IDE 2.1.0 Any assistance/direction would be most appreciated. Thank you very much!

TobiasPSP commented 1 year ago

That sounds very familiar. Initially that’s what I did, too: change number of Zones temporarily. Only later I identified the two issues mentioned above that were reproducable and made the problem go away for me.

TobiasPSP commented 1 year ago

Problem with zone change is: it fixes the issue once it happened but it does not prevent you from encountering it again at any time.

JerrySlavec commented 1 year ago

Thank you, Tobias, I appreciate your input, getting me to re-focus on an ultimate fix rather than just a temporary workaround. I'm not using any arrays, just static text, and almost all of my text displays are long and scrolling. The way my script seems to just stop when it's executing the begin(4) got me thinking... Maybe whatever the flood of instructions the begin(4) triggers must be causing my Arduino to somehow stop further execution. No idea how that could happen, but it sure acts that way, as after turning power off then back on, nothing in my code executes after that begin(4). I keep thinking - what's the difference between a reset (which never causes the problem) and a power off/on that always causes the problem? Could be a power problem at startup? I'll keep digging!

TobiasPSP commented 1 year ago

What made me wonder is that adjusting the buffers in the way I explained back then would always fix the issue for me. So my assumption is that it is a memory/pointer issue and that somehow in certain circumstances memory gets corrupted, possibly vital parts that are needed only during power on. That said, after I fixed the buffer sizes in my project, all was fine for me in all scenarios so I successfully completed my project and moved on to other projects. That’s why I am probably not remembering all details but I believe I explained my findings in great detail when I fixed my issue. It puzzles me too though why exactly a reboot and power off behave differently. If you found the time to revisit and fix this awesome lib and make it even more robust that would be awesome. It has become the standard lib for this kind of display for many people. Thank you for your time and effort. I‘ll curiously checking back regularly to see whether a good explanation of this weird behavior is eventually found. :-)

TobiasPSP commented 1 year ago

Ah and regarding buffers, it‘s not necessarily your code but rather somehow the lib in the background that stores the text. My assumption is that when certain parameters do not fit, the lib tries to access illegal memory addresses.

MajicDesigns commented 1 year ago

I have been following this thread and have some comments.

Parola has been around for about 9 years now and the number of reported 'bugs' is virtually 0 at this stage, so I have a high degree of confidence that the code is solid, especially in the commonly used parts (like the class constructor, for example). Alomost all bugs are now power supply, wiring or other libraries used in the application. However, changes to the compiler (gcc) or the Arduino boards specifications does also affect Parola - this has happened in the past.

Parola allocates all the memory it needs for Zones in the begin() method. You can force static allocation of zones memory by setting STATIC_ZONES to 1 in MD_Parola.h. This will define MAX_ZONES zones (also defined in MD_Parola.h, default set to 4). You can try this to see if memory allocation is the problem.

"It puzzles me too though why exactly a reboot and power off behave differently" Yes, me too, as for the processor this should be the same EXCEPT that the RAM is in a different state for the two situations.

TobiasPSP commented 1 year ago

Many thx for your feedback. Maybe compilers make the difference however I always used all defaults in Arduino IDE back then. For the majority of cases the lib is rock solid however the mentioned issue comes up every now and then and is well reproducable so maybe it is a bug in a use case scenario that is just not touched by most default use cases. In any respect, for me the issue went away by crucially defining some memory buffers (do not unfortunately remember the derails but documented it all earlier in the thread). So with this (at least in my cases) whatever the root cause is could be circumvented.

Martin12350 commented 1 year ago

Hi, similar or maybe even the same issue here. After cold start displays glitches, meaning some of them do not light at all, or only some of the rows working (in most cases only 1 row) and segments acting weird. I have the latest Arduino IDE (2.1.0), latest Parola library (3.6.2) and two different microcontrollers (ESP-01 and Lolin D1 mini - both have ESP8266). I have only 4 and 8 segments, but even with that, tried to use different power supplies, boost up power wiring and also adding capacitors. Unfortunately, no luck there. I am not using any zones (for what with so small display, right?). Adding delay into setup had no effect at all.

But unlike for others even with those glitches, the rest of the display (non-corrupted segments) worked and program is normally running - WiFi will establish, MQTT connects etc. I found out, that simple reset with reset button fix those glitched segments. Obviously it is not the best solution, so I was digging further. Made command for MQTT, which would trigger ESP.reset() or ESP.restart() both actually works. But it still is not the best solution, since it takes some time to connect to WiFi, then MQTT and then some other system has to send a command to reset it.

So my next thought was, why not to reset it right after first cold boot. Unfortunately, this did not help. Until I tried to put right below the display begin function!

Display.begin();
rst_info* resetInfo = ESP.getResetInfoPtr();
if (resetInfo->reason == REASON_EXT_SYS_RST) ESP.restart();

This fix was good enough, fast and reliable, but I was trying different things and found out that calling begin twice in a row will also fix that!

Display.begin();     //one begin is not enough
Display.begin();     //this will fix glitches on cold start 

Digging further and with try and error found out that even calling MD begin function will works the same. But that looks pretty ugly and unnecessary, since we do not use MD anywhere.

MD_MAX72XX md = MD_MAX72XX(DISPLAY_TYPE, PinData, PinClock, PinCS, Segments);  
MD_Parola Display = MD_Parola(DISPLAY_TYPE, PinData, PinClock, PinCS, Segments);
void setup() {
  md.begin();
  Display.begin();
  ...
}

My suspicions are with SPI initialization. Hopefully this will help to anyone either as fix or to debug furthermore.

MajicDesigns commented 1 year ago

Thanks @Martin12350 for the insight.

TobiasPSP commented 1 year ago

Awesome, thx for sharing your findings!

Martin12350 commented 1 year ago

Thank You for your positive feedbacks. This library is really awesome and I like it. I would like to dig deeper into it and when will have some time and especially some news about this, will add here. I am curious if this "fix" is helpful in your scenarios.

MajicDesigns commented 1 year ago

I think the focus should be looking at MD_MAX72xx. There was an issue (38 maybe?) that fixed SPI definition for the mbed platform. I would start with stripping out the ifdef’ed changes related to mbed in your local versions (look for MBED in the md_max72xx .cpp and .h files) and to see if this makes a difference. You can always reload current version from GitHub.

MajicDesigns commented 1 year ago

I have looked at MD_MAX72xx and modified areas where there could have been issues with the SPI initialization introduced in v3.2.5 and released v3.4.1 with potential fixes. Can you please check if this makes a difference to your situations?

MajicDesigns commented 1 year ago

There is a new version of MD_Parola (v3.7.0) that fixes an issue with setting the intensity of the display BEFORE the zone boundaries are set on a multi-zone display (begin() in the library does this to set the default intensiy at startup). This seems the most likely issue with the problems reported here. Please test with your hardware setup.

JerrySlavec commented 1 year ago

While efforts to break apart my project (to do concise unit testing) has not revealed anything of substance, I do have good news to report. I finally implemented the following recommendation made by MajicDesigns about 3 weeks ago: _"Parola allocates all the memory it needs for Zones in the begin() method. You can force static allocation of zones memory by setting STATIC_ZONES to 1 in MD_Parola.h. This will define MAX_ZONES zones (also defined in MDParola.h, default set to 4). You can try this to see if memory allocation is the problem." Since making the STATIC_ZONES 1 change in MD_Parola.h everything now works fine, I cannot re-create the problem (where after a power off/on my code would never get control after the begin(4) was performed). I don't have any clue why forcing this memory allocation to static would resolve the problem, but it definitely did. I am very interested in what led you to this recommendation, as you were right on target! Any further information you can provide would be greatly appreciated.
I do apologize for not trying this right after your recommendation. Thank you very much for your expertise and getting me back on the road!

MajicDesigns commented 1 year ago

There was once a problem with the compiler memory allocation that was solved by the fix mentioned. In this case I think that the static allocation must be initializing the memory as well, which would mitigate the problem I discovered with setting the intensity.

Have you tried the new versions of MD_Parola and MD_MAX72xx released yesterday?

JerrySlavec commented 1 year ago

I have been extensively testing MD_Parola (v3.7.0) along with MD_MAX72xx (3.4.1) and everything is now running fine. I have not been able to re-create my earlier problem (hanging up during execution of the begin(4) after powering off/on).
Also, with v3.7.0 the STATIC_ZONES specification is back to 0 (and has stayed that way) with all my testing. So, from everything I can throw at it, your new versions of MD_Parola and MD_MAX72xx are working great. Your expertise is truly impressive - and your libraries are beyond compare. Thank you very much!

MajicDesigns commented 1 year ago

Thanks for testing.