PlummersSoftwareLLC / NightDriverStrip

NightDriver client for ESP32
https://plummerssoftwarellc.github.io/NightDriverStrip/
GNU General Public License v3.0

Panic when driving larger matrices (one long strip) on 32MB flash / 8MB PSRAM ESP32-S3-DevKitC-1 #580

Open GameTec-live opened 6 months ago

GameTec-live commented 6 months ago

Bug report

32MB flash, 8MB PSRAM ESP32-S3-DevKitC-1 (ESP32-S3-DevKitC-1-N32R8V)

Problem

Steps

  1. Modify env:demo to compile properly for the chip (#579)
    [env:demo]
    extends         = dev_esp32-s3
    build_flags     = -DDEMO=1
                    ${dev_esp32-s3.build_flags}
                    ${psram_flags.build_flags}
    board_build.partitions = config/partitions_custom_8M.csv
    board_upload.flash_size = 32MB
    board_build.flash_mode = qio
    board_build.arduino.memory_type = opi_opi
  2. In globals.h, define a matrix width and height larger than 36 (37 or more triggers the panic)
  3. Observe a panic on core 1, followed by a reboot

Example

Notes

In this case a "matrix" is just a bunch of daisy-chained LED strips going back and forth.
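For reference, a matrix wired this way is usually addressed with a serpentine (zig-zag) mapping from an (x, y) coordinate to a position along the single strip. The sketch below is a generic illustration, not NightDriverStrip's own mapping code; the helper names are hypothetical, and it assumes the strip starts at the top-left corner with row 0 running left to right. For the 50x30 layout discussed in this issue, that puts 1500 LEDs on one data line.

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstddef>

// Hypothetical helper: converts a matrix coordinate into the position of that
// pixel along one serpentine strip. Assumes row 0 runs left to right, row 1 runs
// right to left, and so on; flip the parity test if your first row runs the other way.
constexpr std::size_t serpentineIndex(std::size_t x, std::size_t y, std::size_t width)
{
    return (y % 2 == 0)
        ? y * width + x                 // even rows: left to right
        : y * width + (width - 1 - x);  // odd rows: right to left
}

// Total number of LEDs on the strip for a width x height matrix.
constexpr std::size_t ledCount(std::size_t width, std::size_t height)
{
    return width * height;
}
```

With width 50, the last pixel of the bottom row, serpentineIndex(0, 29, 50), is LED 1499, i.e. the final LED on the strip.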

My globals.h: globals.h.txt

The same config file works fine on a generic, less powerful ESP32. Reportedly, running 1500 LEDs on one controller isn't optimal, but it works fine and I get a decent frame rate on my underpowered ESP32. It shouldn't crash anyway ;)

Monitor Log: https://hastebin.skyra.pw/wikevijele.yaml

rbergen commented 6 months ago

@GameTec-live The hastebin link at the end of your description leads to an empty page with a blinking cursor in my browser.

GameTec-live commented 6 months ago

Oh, weird... I'll re-upload later (when I'm home).

GameTec-live commented 6 months ago

Sorry for the late reply, but here you go: crashlog.log

robertlipe commented 6 months ago

Please build debug, as instructed, and be sure you're running the trace decode filter. I think most of the flags are already present because I get them by default.

https://docs.platformio.org/en/latest/core/userguide/device/cmd_monitor.html#filters

https://github.com/platformio/platform-espressif32/issues/105#issuecomment-857808214

I suspect you're just building opt and not debug.

GameTec-live commented 6 months ago

Changed build_type under base from release to debug and set monitor_filters = esp32_exception_decoder under [env:demo].

Here's the new log: debug-log.log

robertlipe commented 6 months ago

That's indicating a problem in a library we use, somewhere around: https://github.com/FastLED/FastLED/blob/09c5fb8f74c43191974c09e1f31edda8281eab7e/src/platforms/esp/32/clockless_rmt_esp32.cpp#L503

You're going to have to get a debugger (or other stone-banging techniques) in there to see if mCur is walking past the end of mPixelData[] (questioning mSize corruption?) or some other zany behaviour.

I don't recognize it and don't see smoking guns on this topic in the fastled buganizer. https://github.com/FastLED/FastLED/issues?q=+ESP32RMTController%3A%3AfillNext+

GameTec-live commented 6 months ago

Probably the stupidest thing you've heard in a while, but... I do have a Pico debug probe (SWD) and can't figure out where I'm supposed to hook it up...

robertlipe commented 6 months ago

Dude, you can't imagine the stupid stuff I hear. That's not even on the list. :-)

There is an embarrassment of debugging options on S3. Picking one may actually be the hardest part. Espressif has this locked down. Even if you don't follow all the steps, reading this chapter is worthwhile:

https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-guides/jtag-debugging/#

You could hook up wires for TDO, TDI, TMS, and TCK like a caveman and be proud of your pico purchase. If you work with lots of micros and have a good workflow around that, go for it. https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-guides/jtag-debugging/configure-other-jtag.html

If you don't, don't do that. "The quickest and most convenient way to start with JTAG debugging is through a USB cable connected to the D+/D- USB pins of ESP32-S3. No need for an external JTAG adapter and extra wiring/cable to connect JTAG to ESP32-S3."

So configure it for the built-in JTAG interface. This may require some driver fiddling depending on your OS. The one downside of this approach compared to a "real" JTAG pod, IMO, is that it resets the connection when the ESP32 resets, such as when you load new code. This has been fixed in newer ESP parts like the H2 and C6.

Now here's where I'll get hand-wavy because you surely don't want to do it the way I do it. (See also: working with dozens of micros.)

I try to use as little of platformio and visual studio as I can because I want to use the SAME debuggers on as many of those micros as I can. I understand, however, that Platformio has some pointy-clicky stuff that automates some of the above like building the openocd config. (So why did I tell you to read that? Because IMO, a developer should know these things about our tools.)

This is very Windows-centric, but it gives you the needed debug_FOO and build_type stuff and a nickel tour of GDB. https://community.platformio.org/t/how-to-use-jtag-built-in-debugger-of-the-esp32-s3-in-platformio/36042/3 This talks about the platformio doc being wrong/misleading. https://community.platformio.org/t/esp32-s3-jtag-debugging-over-usb/28182 Looks like the official doc is fixed: https://docs.platformio.org/en/latest/boards/espressif32/esp32-s3-devkitc-1.html#debugging

The concept you have to embrace is that OpenOCD is the middle-man. It uses libusb (or Windows driver stuff) to actually open the debug interface on the board. It creates a couple of network sockets. IIRC, port 3333 is the one GDB connects to with "target extended-remote localhost:3333" (look all this up... I may be typing crazy talk, but I have the big picture right), and there's an additional port that you can telnet to that is, I think, 4444. This lets you talk to the board directly, but you can also confuse GDB if you, say, change a memory location that happens to hold a variable without GDB knowing about it. Be careful.

Once you "get it", I think you really will find that the hardest part is wading through the redundant documentation explaining how to manually set up OpenOCD (for both the SoC and the board), which may be different for CLI use and PlatformIO use, plus the differences between the 'real' JTAG interface and the chip-resident JTAG interface and all that. This is why it's worth reading the whole thing: jumping around and copy-pasting bits will surely get you in trouble.

There are a lot of moving parts, but the ability to single step, display variables and stack traces, and set breakpoints is worth its weight in gold and worth the finicky setup.

Good night, good luck and... May the source be with you.

RJL

GameTec-live commented 6 months ago

Okay, thanks for the help, I managed to get a debugger running (it was a driver issue...). I don't know what the value of mCur is supposed to be, etc... I'll poke around a bit more and report back...

GameTec-live commented 6 months ago

Can't really see anything unusual? (I mean, I also don't know what those values are supposed to be, lol.) mPixelData seems to be large or even "infinite", as I can request a lot more than 1500 array entries with the debugger... mCur also goes past 1500...
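Two details may help when reading those debugger values. A raw pointer such as mPixelData carries no length information, so a debugger will happily display entries well past the end of the allocation; the array looking "infinite" does not by itself indicate corruption. And if the driver walks its buffer byte by byte (an assumption here, e.g. 3 bytes per RGB LED), a cursor like mCur would legitimately exceed the LED count, up to 4500 for 1500 LEDs. The hypothetical, bounds-checked cursor below borrows the mCur/mSize/mPixelData names from the discussion but is not FastLED's actual class; it only shows the kind of check that could be temporarily added to catch an overrun at the moment it happens.

```cpp
#include <cassert>  // assert() for the usage checks
#include <cstddef>
#include <cstdint>
#include <cstdio>

// Bytes a buffer should hold for a given LED count (3 for RGB, 4 for RGBW).
constexpr std::size_t expectedPixelBytes(std::size_t ledCount, std::size_t bytesPerLed)
{
    return ledCount * bytesPerLed;
}

// Hypothetical stand-in for a driver that streams bytes out of a pixel buffer.
// The member names echo the ones discussed above, but this is NOT FastLED's class.
class CheckedPixelCursor
{
public:
    CheckedPixelCursor(const std::uint8_t* data, std::size_t size)
        : mPixelData(data), mSize(size), mCur(0) {}

    // Copies the next byte into out and advances; refuses (and logs) instead of
    // reading past the end of the buffer.
    bool next(std::uint8_t& out)
    {
        if (mCur >= mSize) {
            std::fprintf(stderr, "pixel cursor overrun: mCur=%zu mSize=%zu\n", mCur, mSize);
            return false;
        }
        out = mPixelData[mCur++];
        return true;
    }

    std::size_t position() const { return mCur; }

private:
    const std::uint8_t* mPixelData;
    std::size_t mSize;
    std::size_t mCur;
};
```

Setting a breakpoint on the overrun branch would stop execution at the first bad read rather than at the later crash.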

rbergen commented 6 months ago

@GameTec-live Glad to see you got the debugger working, and I appreciate the earlier upload of the debug log. As @robertlipe already indicated, it does show that the actual problem (which is a form of illegal memory access) takes place several levels below "our" code. In fact, the backtrace doesn't even include references to any code that's part of NightDriverStrip.

Concerning your last comment: we're obviously not looking over your shoulder, so we can't see what you are looking at. Also, even if we could, figuring out the cause of the invalid behavior would effectively require us to debug the dependency libraries involved. That isn't entirely impossible, but it is very difficult if we can't debug-trace through the code ourselves just before the problem occurs.

Without having the hardware and software setup that triggers these crashes available, I therefore think we won't be able to solve this. You could (still) consider raising a bug report in the dependent libraries (Espressif ESP-IDF and/or FastLED) and see if they are able to provide pointers to what's actually behind this.

I'll leave this issue open in case someone else runs into the same problem, and may be able to provide additional information that can help get to the root of this.

GameTec-live commented 6 months ago

Yeah, kinda hard to replicate an issue without the hardware... I'll open an issue over at FastLED... Thanks for the help, though...

prschguy1 commented 5 months ago

Hey GameTec-live, while I am not one of the bigger brains on this repository, perhaps I can help with the problems you are experiencing here. My understanding is that demo is intended for strip effects only. I was a bit surprised that you tried putting spectrum into a demo build. I have never tried that, but assumed it wouldn't work, as newer spectrum builds use DMA. Instead of using demo as your build, I might suggest using spectrum-elecrow; I have that working properly on the chip you specify. It looks good with the PDM mic, remote, display, and all the effects. When I run it with strip effects I run out of memory, but it runs well on spectrum. Might give it a try. Memory usage looks pretty good here. Just my 2 cents.

https://github.com/PlummersSoftwareLLC/NightDriverStrip/assets/69419979/5ded102a-77ff-4674-a675-4e96acdfb413

GameTec-live commented 5 months ago

Thanks for the info, I guess I'll try that... My matrix is just one long strip, though...

GameTec-live commented 5 months ago

Although using the spectrum-elecrow project seems to work, it doesn't help me much, as spectrum drives HUB-whatever-it's-called matrices and I just have one long strip snaking back and forth...

prschguy1 commented 5 months ago

All of the spectrum builds use a series of strips that zig-zag back and forth as you describe. They can be identified by the fact that only one pin, LED_PIN0, is used for output. HUB75 is currently only used for the Mesmerizer builds, which require around 14 output pins depending on what you are doing. Dave does a great job of describing all of this here:

https://www.youtube.com/watch?v=COJnlehBcKw&t=224s

The video I posted is using a standard zig-zag strip as you describe at 16 pixels high by 48 pixels wide on spectrum build, using the chip you specified.

GameTec-live commented 5 months ago

Ah, OK... so I guess I'll fumble a bit more with spectrum-elecrow and try to use that... thanks...

GameTec-live commented 5 months ago

nvm, setting the matrix to the required size crashes elecrow too...

prschguy1 commented 5 months ago

Perhaps if you could articulate exactly what you are trying to accomplish, we might be able to help. As far as I can see, this chip is working without any crashes for the spectrum build. While there aren't a lot of spectrum effects here, I intend to test it more thoroughly.

GameTec-live commented 5 months ago

To drive my 50x30 matrix (one long strip) with the new, more powerful ESP32-S3. It works fine with my current, less powerful ESP32 (AFAIK it's even single-core?), though I had to disable nice-to-haves like the webserver for it to run a stream from the computer at a decent frame rate, which I'm hoping to fix with this much more powerful variant...

GameTec-live commented 4 months ago

Stupid question, but can one of the devs, or someone more competent than me, try to compile demo with a 50x30 matrix? I tried to compile elecrow at 50x30: similar error. I tried to compile demo for a Seeed Studio ESP32-C3 (I know, not officially supported) and it's still the same error (from what it looks like). I haven't tried a NodeMCU (clone) yet, as that's currently driving the matrix and I'd rather not break it until my replacement MCU works... :/

robertlipe commented 4 months ago

C3 is DOA for us. Not because it's RISC-V but because it's single core. S2 is LX7 like S3, but it's single core, so that's similarly dead to us.

I think there are configurations that probably could be made to work with some engineering investment (like cleaning up the multiple threads that are spin-looping), but right now, unless you're willing to drive that effort, we require the two-core models. So no C2, C3, C6, H6, or such. P4 is a contender, but they're not sampling yet, and honestly, even when they do, I expect work will be needed in ESP-IDF, and probably all that Arduino code that was dragged off an 8-bit CPU is going to burst into flames when faced with a 64-bit system. My anxiousness to tackle that may be proportional to the dev board price when they're announced.

You want to stay with the dual-core LX6 parts like ESP32-nothing (those with no suffix), which appear in a ton of packages and modules, or the dual-core LX7 parts like the ESP32-S3 in just about any of its forms. I'd lean toward the N8R16, though the N2R2 and other combinations of flash and RAM are fine.

Actually, you just inspired my local copy to make dual cores a hard requirement at runtime and probably at compile time.
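A compile-time plus run-time dual-core requirement could look roughly like the sketch below. It is not code from the project; the function names are hypothetical, and it assumes an ESP-IDF-based build (which defines ESP_PLATFORM). It relies on ESP-IDF's esp_chip_info(), whose esp_chip_info_t result reports the number of CPU cores, and on the CONFIG_FREERTOS_UNICORE sdkconfig option. The decision logic is kept in a plain function so it can be exercised off-target.

```cpp
#include <cassert>  // assert() for the usage checks

// Minimum core count demanded by this hypothetical check.
constexpr int kRequiredCores = 2;

// Pure decision logic, kept free of ESP-IDF headers so it can run on a host machine.
inline bool hasRequiredCores(int cores)
{
    return cores >= kRequiredCores;
}

#if defined(ESP_PLATFORM)
  #include "sdkconfig.h"
  #if __has_include("esp_chip_info.h")
    #include "esp_chip_info.h"   // ESP-IDF 5.x location of esp_chip_info()
  #else
    #include "esp_system.h"      // older ESP-IDF releases declare it here
  #endif

  // Compile-time guard: CONFIG_FREERTOS_UNICORE is set when FreeRTOS is configured
  // to run on only one core.
  #if defined(CONFIG_FREERTOS_UNICORE) && CONFIG_FREERTOS_UNICORE
    #error "This build requires a dual-core ESP32 (for example ESP32 or ESP32-S3)."
  #endif

  // Runtime guard: ask the silicon how many cores it actually has.
  inline bool chipHasRequiredCores()
  {
      esp_chip_info_t info;
      esp_chip_info(&info);
      return hasRequiredCores(info.cores);
  }
#endif
```

On the device, chipHasRequiredCores() would be called once early in startup, halting with an error message if it returns false; the compile-time guard catches single-core FreeRTOS configurations before anything is flashed.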

GameTec-live commented 4 months ago

OK, that makes sense then; it was just the one I had lying around... I'd still be interested to know if someone else can replicate this issue, or if it's just me, or maybe even a defective MCU... Having a reproducible case / minimal reproducible example might help speed things up here (or over at FastLED).

prschguy1 commented 4 months ago

Hello GameTec-live, I have looked at your situation a good bit over the last couple of weeks, and I do get similar results. While I can approach your matrix size, I cannot quite get there. I have a similar problem when trying to use this chip with 4-channel strip effects. I have tried different board definition files as well as different memory tables, but still have not solved this problem. While an alternate LED program makes both of our builds work, I would like to get it working here. What I have learned is that programs can make use of 16 MB, and the 32 MB that these chips support is only useful for storage. I have tried both of our builds with the Unexpected Maker S3 Pro, the Wemos S3, and generic and official ESP32-S3 boards in various memory configurations. I have a Wemos D32 Pro and an M5Stack I'll try our builds on next. The M5Stack used to work on my build, but I suspect that no longer works. Will let you know what I find.

GameTec-live commented 4 months ago

Ah, so it isn't just me XD Thanks for trying to help though.

GameTec-live commented 1 month ago

@robertlipe While ordering other stuff, I threw in an N16R8, so the 16MB version you apparently use... I still get the same panic...

robertlipe commented 1 month ago

Seems we're getting no traction with FastLED looking into this.

I'm about to be on the road through the end of the month, so I can't commit to looking into this in the short term, though I know this has been simmering a long time. Looks like I may have to debug FastLED. 😐

Is there a repro case for this here? Does it take a while to bonk or is it close to immediate?

davepl commented 1 month ago

The S3 LEDSTRIP project with the onboard LED will fault out reliably. Only problem is the S3 reboots without showing a call stack or exception chain!

rbergen commented 1 month ago

Should we maybe include the fix for this in #626?

I believe it is as simple as not touching the on-board LED. We could achieve that by just commenting out the respective #ifdef ESP32FEATHERTFT block in globals.h.

davepl commented 1 month ago

I would… there might be an ONBOARD_LED define or something that we can turn off for that build, as it’s not on in most.

rbergen commented 1 month ago

I think the on-board LED code is only activated if ONBOARD_PIXEL_POWER is defined.

robertlipe commented 1 month ago

That may be why I've never seen this. I don't use that build, and the onboard 2812 present on some boards is connected to various pin numbers depending on the board. That number is exposed in the Espressif (PlatformIO? Arduino?) board definitions, but people tend to play pretty loose, running a definition that "works" but isn't exactly right. If so, we should be using that definition and not our own in config.h.

I can imagine that if you're running a board definition with an onboard LED on what was a high GPIO number on ESP32-nothing, but that pin happens to be one of the quad or octal PSRAM pins on S3, we might get that kind of thing. Yanking a pin used as one bit of RAM and treating it like an LED can't be good. From memory, I think that 35-38 and 45 are off limits on boards with octal PSRAM.

The stackless crash is probably what you'd recognize, Dave, as a double panic or double page fault - it's an exception within the exception handler. That would be consistent with the above guess, me never seeing this, and it being otherwise unreproducible in raw FastLED.

If this crash is predicated upon the onboard led, this would all fit.

GameTec-live commented 1 month ago

So, it seems like #626 is supposed to contain a fix for this? I pulled down that branch/fork and compiled it: nothing, it still panics. For clarity, I've now generated git diffs of the exact changes I made and once again uploaded the logs (for both of my controllers). The N16R8 log is a bit weird, though, as it doesn't spit out the core dump. It did before, with the same illegal cache access, but when capturing the log it just didn't... N16R8-crashlog.txt N32R8V-diff.patch N32R8V-crashlog.txt N16R8-diff.patch

Edit: of course secrets.h isn't included, but it's just a filled-out template with hostname, Wi-Fi credentials, etc...

rbergen commented 1 month ago

No, this is a different problem than the one the latest change in #626 is trying to address. That fixes random crashes without any panic logging on the S3 with the Feather project "as standard", after the device has been running for a while.

Your controllers seem to consistently crash as soon as WiFi tries to connect, with PSRAM enabled. I think I remember Dave configuring PSRAM to be off on all devices except Mesmerizer because he was seeing exactly this behavior.

GameTec-live commented 1 month ago

Ah, OK... Well, I tried leaving PSRAM off (not adding the build flag) too, but it didn't work either. I can try again later and send some logs for that too...

rbergen commented 1 month ago

Then it seems we have 3 different problems - which I wouldn't find surprising at all.

(I know that observation adds absolutely nothing towards a solution, but it's all I can conclude at this point in time...)

GameTec-live commented 1 month ago

Well, observation is usually the first step to figuring stuff out? XD

Anyways, enjoy the other 2 log files and diffs of the 2 PSRAM less builds:

N32R8V-noPSRAM-diff.patch N32R8V-noPSRAM-crashlog.txt N16R8-noPSRAM-diff.patch N16R8-noPSRAM-crashlog.txt

Edit: Whoops, that one log was captured from the wrong port, no wonder it's so short. Here's the longer one: N16R8-noPSRAM-crashlog.txt

robertlipe commented 1 month ago

Agreed. It looks like it IS a different problem.

So we have Dave's problem of treating PSRAM pins as LEDs. We have Gametec's issue of the crash inside FastLED when "lots of LEDs" are used. (Is network traffic a key?)

What's the third? Sorry. My on-board CPU is running slow today.

davepl commented 1 month ago

Dave's problem of treating PSRAM pins as LEDs.

Can you explain?

rbergen commented 1 month ago

Problem number 3, as I remember it(!), is S3 boards crashing at WiFi connect when PSRAM is enabled. Which I may have misremembered, in which case there is no problem number 3.

GameTec-live commented 1 month ago

@rbergen You seem to be misremembering: at least for me, on my 2 MCUs with PSRAM enabled, driving a smaller matrix (e.g. 10x10) boots up perfectly. For completeness, here are the logs and git diffs anyway: N16R8-smallWorking-diff.patch N32R8V-smallWorking-diff.patch N32R8V-smallWorking-log.txt N16R8-smallWorking-log.txt

rbergen commented 1 month ago

No, I'm not - or at least not in that way. The point is that there are now 3 problems with (certain) S3 boards we know of. Your problem - that being the one this issue concerns primarily - is the second problem in @robertlipe's summary. There are two others, though. One is the first Robert mentions, the other the one I mentioned in my previous comment.

robertlipe commented 1 month ago

Either we have a missing qualifier on #3 or it's behind us. I run S3's almost exclusively, in a variety of configurations (not all), except for Mesmerizer, and just don't see that. You just fixed #1 on my list. That means we have only Gametec's still live and in play, right?

If we track problems we used to have, I'm sure we'll all lose our minds even more quickly. Since we don't really do releases with version numbers, and walking backward a quarter or a month at a time with git syntax like git checkout $(git rev-list -n 1 --first-parent --before="2009-07-27 13:37" master) to find a repro case and a time when it was live is no fun, let's just take that out of play.

We have a moderate number of people using S3s in strip configurations. Gametec's is special because it only shows up with 'lots' of LEDs involved.

It was the posts from May 15 that muddied the water (at least my own mental mud) that #1 (Dave's observed problem) and #2 (Gametec's) were related.

Problem #1 was just using a configuration for a different board whose LED pin, at best, was an unused pin on other boards but happens to be used by the memory controller on S3. Maybe we could subclass/wrap the code calling gpio_pin_foo* and trap the cases that use the pins reserved for the PSRAM/PSROM controller if either the filesystem or PSRAM_ENABLED is set.
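A rough sketch of that trap follows. The function names are hypothetical, and the reserved ranges are an assumption taken from Espressif's ESP32-S3 datasheet notes (GPIO26-32 for the flash/PSRAM bus, plus GPIO33-37 when octal flash or PSRAM is fitted), which do not exactly match the from-memory list earlier in the thread; check the datasheet for the specific module. The N32R8V board in this issue is built with memory_type = opi_opi, so it would fall under the octal case.

```cpp
#include <cassert>  // assert() for the usage checks

// Hypothetical guard, not existing NightDriverStrip or ESP-IDF code.
// Pin ranges are assumptions based on Espressif's ESP32-S3 datasheet notes;
// confirm them against the datasheet for the exact module on your board.

// ESP32-S3 exposes GPIO0-21 and GPIO26-48; GPIO22-25 do not exist.
inline bool isValidS3Gpio(int gpio)
{
    return (gpio >= 0 && gpio <= 21) || (gpio >= 26 && gpio <= 48);
}

// GPIO26-32 normally carry the SPI flash/PSRAM bus. With octal flash or PSRAM,
// GPIO33-37 are taken as well (extra data lines plus the DQS strobe).
inline bool isPinReservedForMemory(int gpio, bool octalMemory)
{
    if (gpio >= 26 && gpio <= 32)
        return true;
    if (octalMemory && gpio >= 33 && gpio <= 37)
        return true;
    return false;
}

// Check to run before handing a pin to an LED driver.
inline bool canUsePinForLeds(int gpio, bool octalMemory)
{
    return isValidS3Gpio(gpio) && !isPinReservedForMemory(gpio, octalMemory);
}
```

A wrapper around the pin-setup calls could refuse (or log loudly) whenever canUsePinForLeds() returns false, instead of silently reconfiguring a memory-bus pin.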

I'm soon on the road again, this time through the end of the month, but will try to pick this one up sometime after I return. Prschguy has replicated the results. Apparently, building the Spectrum effects for a board with an S3 configuration, with only the matrix size raised to 50x30, is enough to crash it, and it crashes almost immediately. (Sidebar: I know the frame rate will be terrible; that's too many bulbs on one 800 kHz bus.) Right, guys?

rbergen commented 1 month ago

That means we have only Gametec's still live and in play, right?

Based on what you say in the rest of your comment, that's quite likely true.

If we track problems we used to have, I'm sure we'll all lose our minds even more quickly.

That may be true for some, many or most, but not me. I need to keep some record of problems we used to have to retain my sanity. If only because some "solved" problems have the unpleasant habit of rearing their heads again sometime later.

But, I can be quiet about it if that's generally preferred. :)

davepl commented 1 month ago

We should certainly have a bug/issue to track this. If it never manifests again, we can close or ignore it, but shouldn’t forget about it!

robertlipe commented 1 month ago

There has been one for months. It's https://github.com/PlummersSoftwareLLC/NightDriverStrip/issues/580 ...and it's the very one you're commenting on. (Like me, you're probably reading this all through the email gateway and it can be difficult to see the origins of long-running discussions sometimes.)

There's a sister issue open in FastLED because I'm willing to bet moderate money that this is actually an issue inside FastLED, but FastLED struggles with its issue queue and has a preference for working on 8-bit micros. :-/

Also, Dave, I'm not saying that we shouldn't have some institutional memory of such things. (OTOH, we three main devs fix things all the time without opening issues for the bugs we've fixed...) I'm just trying to focus this specific list on things that actually need developer attention, instead of a bucket of every problem that's ever been observed on an S3.

It was the mention of what we're now calling "#1/Dave's problem" on this open issue (#580, the one about "lots of leds" (LOL?)) that first made me think it was considered the same issue, and that took the whole discussion a little off track. No harm done now that we have it all refocused.

GameTec-live commented 4 days ago

Don't want to be annoying (I greatly appreciate the work you're doing here), but has there been any progress on this issue?

davepl commented 4 days ago

Sorry, I’m not up to speed on this one, can someone refresh my memory?

robertlipe commented 4 days ago

Earlier today, I meant to ping this thread. I can't find it right now, but somewhere in recent reading of release notes (esp-idf? espressif32-arduino?) I saw a commit that looked like it might have been related to this.

I don't think it was https://github.com/espressif/arduino-esp32/pull/9906 but maybe it was. (We're not neopixel, but the code path isn't totally DISsimilar.)

I remember thinking that if something like an SPI read to the filesystem happened while the RMT DMA was in progress (or vice versa) that the code would trample itself when under heavy load and crash. It was more likely to occur under heavy RMT/SPI use (check!) and it was more likely to happen on the S3 (check!)

I don't remember if the fix was in arduino-espressif (and whether it was 2.x, which we use, or 3.x, which we currently can't) or in esp-idf. The ecosystem is a bit messy right now as they work through updating the world to the slightly incompatible arduino-espressif 3.x.

I recognize this is a bit incoherent (sorry) but sometime in the last 12 hours, I remember seeing a fix in upstream/related code that looked like it MIGHT be related to this.

I'm about to crash, but there may be some recent progress in SOME upstream something that might have helped.

Unfortunately, though, FastLED seems to have fundamental issues under load. See, e.g., https://github.com/FastLED/FastLED/issues/1438

GameTec-live commented 2 days ago

So, progress, but also not? Fun.

Thanks for the update though