SeanMcKeen / IT-Lablights

1 stars 0 forks source link

Drive by comments #2

Open robertlipe opened 5 months ago

robertlipe commented 5 months ago

Hey, Sean!

Looks promising! For someone that claims to be just starting, you're getting a lot of things right, but I hope you're open to some help awesomeizing things here and there. It's late and I'm just freehanding, so anything hta tlooks like code but doesn't actually compile is just me being a dumb. It should be inspirational enough that one of us can go to cppreference.com or github search and make it make sense.

https://github.com/SeanMcKeen/IT-Lablights/blob/c183ae6297586ab482317f79a4c068e3aa83a3bd/src/snmp.cpp#L97

C style {array[], int array_size} pairs are a drag. Carrying around to things that describe a pile of stuff sucks. Put them in a container. There are a bunch, but I'll just recommend two: std::vector array; You can append things easily. You can insert in the middle, but that's slow. array.size() always contains the size. YOu called array.push_back(new_packet) ? array.size() just grew. Now you can walk the array C style, but now you have icky loop iterators. for (auto& port : hub) { port.SetLed(on); }

So std::vectors are cool, but you already have C-style arrays. They recently (C++20?) added std::array that makes plain old staticly sized arrays act like normal C++ containers. Once you see examples, I think the proverbial LED will go off for you: https://www.geeksforgeeks.org/stdarray-in-cpp/ https://www.geeksforgeeks.org/vector-in-cpp-stl (I don't like all their examples. I prefer "for (auto& thing : things){thing.DoStuff();}" to "for (int i = 0; i < things.size(); i++){things[i].DoStuff();}"

Rule of thumb: compile-time fixed size? (number of LEDs) std::array. If you need a container that can grow? (receive packet buffer) std::vector. You can earn hashes later.

snmp's createArrays should really fall out nicely with std::array

The (in|out){1,2,3,4}Total stuff should probably be: const int kThingyCount = 4; struct ThingyData { unsigned long onTotal; unsigned long outTotal; mumble lastInOctets; something responseInOctets; }; std::array Thingies[kThingyCount];

Now void handleAllOutputs(std::array& ThingiesInput) { for (int i = 0 ; kThingyCount; i++) { Thingies[i].lastInOctets += ThingiesInput[i]; (or something - I don't really grasp th relationships here. But you clearly have four of something that have a relationship to some number of something elses. The data structures should represent that. I think that setTotals() { for (int i = 0 ; kThingyCount; i++) { thing[i].totalsIn += Totals[i].in; } } }

// My examples may be making this worse because I'm not understanding the (lack of) data structures now used. I think you have all the right pieces identified, you just need to put them in structs and arrays so you can operate on multiples.

Math!

This could be very cool, but it's also low-value. You can design key->value pairs and place them in a map. Then use std::lower_bounds to find the lowest KV pair that matches the search value. Return the found key's value and you're done. The whole file reduces to three tables and a few lines of code. It's something approximately like:

map<int key, int value>table; const table = { {1000,1} {10000,2} {35000,3} {100000,5} };

// Now these are in a container you could iterate for debugging: for (auto x : table) { cout << x.first << "->" << x.second <<endl; }

// If these tables were large, we could call std::lower_bound, but it's probably clearer to just code it:

int rv; int find_key(value) { for (auto x : table) { if x.value >= value return rv; rv = x.value; // save the previous one } return 17; // It's the kind of thing to do with unit tests, for sure.

main.cpp make constants const; const int kPollInterval = POLL_DELAY; Pull all these unsigned long outPulseInterval3 = 0; into a struct if they're related, then, if there are always four of them, put them in a std::array of those[4]; (or kNumberThingies or whatever)

include use "" if it's local project and <> if system

lablights // figure out Strip[1-4] That comets[] loop gets easier/clearer for (auto& comet : comets) { if comet >= 0 && comet <= TotalLeds leds[comet] += forwardColors... oof. Maybe we need the index.

if (constexpr (PROJECT_NAME == "Lablights")) { That lets the code get omitted at compile-time if it's not true

Think about nerd math to make the reverse and forward cases the same.

fadeAll() - clever, but I think that fadeLightBy https://fastled.io/docs/group___color_fades.html does this.

forward,reverseEvent() - fid a way to eat that number of strips into 1 body. 8 bodies is too many.

I know it seems odd, but I'm excited to see this. I knew when we talked months ago that you had a strong vision of how you wanted this to work and the determination to brute force it into reality. You have the appetite to make this awesome and I have a bit of mentoring time on my hinds. (I can also be grating as hell, sticking my nose in where it wasn't asked for. :-) )

I'm sure you'll take some of this and run with it. Some if it just doesn't matter (like maybe reducing that math stuff to tables). Some of it won't make any sense at all because I'm sleep-deprived and talking out of turn.

I don't know anything about SNMP, but if you have some kind of ability to dummy up/mock things for testing and I can test individual strips/efffects, I'm willing to help with some of the bit-slinging just because I think this is a cool project.

TTYS!

SeanMcKeen commented 5 months ago

Hi, I appreciate your enthusiasm and mentoring (god knows I need it). Here's the thing, I understand none of this whole std::array and std::vector thing. Every time I saw that type of thing in code examples my brain just exploded. However, I figured out vectors (BARELY), just enough to replace one thing with them. And i do really love being able to just use arr.Size(); as well as now just looping with for (const int o : Array) {}

Here's my big issue, I can't figure out why, but the code is printing out port 1, 2, 3, and 4. and then where its supposed to be 5, 6, 7, and 8. It just keeps printing out 3, 4, 3, 4. NO CLUE why, I even tried removing variables and just putting snmpLoop({5, 6}, 3); and even that doesn't work.

I honestly have no clue how c++ works with some of these more complicated things, and thats why I stuck with the simplest of the simple even if it was CLUNKY as hell.

robertlipe commented 5 months ago

Hi, Sean. If you recall, we "met" on NightdriverLED.

Reading that again, it sounded. Like I came in with guns blazing. That really wasn't my intention. Sorry. I got a little enthusiastic... And rambly, as I do late at night.

I don't do SNMP and really don't have a lot of interest in the data collection side. Looks like you had that part locked down. Awesome. But I have some FastLED experience and more than a little in data structures in C++ and C.

This is esp32,. Right?

Is there some way I can build this and run it standalone, perhaps with a visualizer instead of wiring up a bunch of custom strips, so that I could help mentor a little more over each other's shoulder? Is there maybe some test mode for continuous integration/delivery?

I'd like to help,. But I'll admit that some of your code. Is so thick that I'm not totally comfortable reading it, blindly trying something, and saying "here's a PR that makes your code 80% simpler" without risking it being 80% wrong.

If course,. If you're happy enough with your direction, I'm fine being told to sit down and shut the hell up. 🤓

But I'd I can build and test this and recognize a. Pattern of blinkies, I'll join your internship programs I think you had interesting idea with this.

On Tue, Feb 13, 2024, 10:19 AM SeanMcKeen @.***> wrote:

Hi, I appreciate your enthusiasm and mentoring (god knows I need it). Here's the thing, I understand none of this whole std::array and std::vector thing. Every time I saw that type of thing in code examples my brain just exploded. However, I figured out vectors (BARELY), just enough to replace one thing with them. And i do really love being able to just use arr.Size(); as well as now just looping with for (const int o : Array) {}

Here's my big issue, I can't figure out why, but the code is printing out port 1, 2, 3, and 4. and then where its supposed to be 5, 6, 7, and 8. It just keeps printing out 3, 4, 3, 4. NO CLUE why, I even tried removing variables and just putting snmpLoop({5, 6}, 3); and even that doesn't work.

I honestly have no clue how c++ works with some of these more complicated things, and thats why I stuck with the simplest of the simple even if it was CLUNKY as hell.

— Reply to this email directly, view it on GitHub https://github.com/SeanMcKeen/IT-Lablights/issues/2#issuecomment-1941934844, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD35HA47ISFC3BNQ5N3TYTOHC5AVCNFSM6AAAAABDCUB6ROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNBRHEZTIOBUGQ . You are receiving this because you authored the thread.Message ID: @.***>

SeanMcKeen commented 4 months ago

After weeks of brain damage from attempting to rework and fix the code the best I could. I still can't seem to find the error. Even without SNMP even being used, the lights crash and don't reset the esp. No clue what this could be, I was assuming memory write errors but now I have no clue.

I uploaded it to the testing branch once again hoping someone else can see what I'm doing wrong. I'm at the point where I can't seem to find a solution no matter how much research and head banging I do.

About a virtual way to test this... I have no idea how to do any of that.

robertlipe commented 4 months ago

Hey, Sean.

If there's no hardware (beyond a compatible board and I think I hae spare board that are "compatible enough" with the Lilygo S3 that I see) and no network configuration needed, I may be able to help.

I've confirmed the most recent checkin on that branch is commit 1662c5704df3dfe84b65cb6f12bf5e9a30369329 (HEAD, origin/testing) Author: Sean Date: Wed Feb 28 11:19:09 2024 -0500

fixed one crash just to have more D:

And I've confirmed that a pio run -e lilygo-t-display-s3 will hock up an image: ... Successfully created esp32s3 image. ============================= [SUCCESS] Took 30.82 seconds

Environment Status Duration


lilygo-t-display-s3 SUCCESS 00:00:30.823

I'm not looking for hardware at this hour, but can you describe the steps needed (beyond the "obvious" mechanics of uploading, running, attaching a monitor, etc.) needed to confirm and observe a problem?

There's not an S3 on my nightstand tonight, but if I were to plug one up, what's the minimum that I can do to see whatever it is that's stumping you? Answers might include "just run it. Watch it crash". "I expected it to print X and it prints Y", etc.

On Wed, Feb 28, 2024 at 10:37 AM SeanMcKeen @.***> wrote:

After weeks of brain damage from attempting to rework and fix the code the best I could. I still can't seem to find the error. Even without SNMP even being used, the lights crash and don't reset the esp. No clue what this could be, I was assuming memory write errors but now I have no clue.

I uploaded it to the testing branch once again hoping someone else can see what I'm doing wrong. I'm at the point where I can't seem to find a solution no matter how much research and head banging I do.

— Reply to this email directly, view it on GitHub https://github.com/SeanMcKeen/IT-Lablights/issues/2#issuecomment-1969385829, or unsubscribe https://github.com/notifications/unsubscribe-auth/ACCSD32UDGMPPGV2JAVBX6LYV5MNVAVCNFSM6AAAAABDCUB6ROVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSNRZGM4DKOBSHE . You are receiving this because you authored the thread.Message ID: @.***>

SeanMcKeen commented 4 months ago

I believe that MOSTLY everything that needs to be configured should be in either globals.h or secrets.h. Other than that since you got it to work with the lilygo env, I THINK it should be fine to work just like that.

On to what it's not doing that I need it to do is just the crashing. I seem to be able to get it to poll the device and work fine for around 10 seconds, and then all of a sudden it just dies. It stops printing, the leds stop updating, dead.

USUALLY, when this happens I've been getting a guru meditation error on core 1, looking it up online said it was a memory issue. I checked the free heap size and it seems perfectly fine so I keep thinking there's a memory leak.

However, I tried completely removing the necessity to have snmp data polling in there and just give my program a fixed number to work with (500,000). This STILL causes the program to crash and I have no clue why or where as I have it using dynamic memory so the memory addresses tell me nothing.

robertlipe commented 4 months ago

Before I get a chance to look into this myself, perhaps a review of debugging techniques may be in order.

You said you're on an S3 board. If you're on a board with two USB ports and you're using the one that isn't labeleled UART - or you're using a board with only one USB port - you actually have access to a debugger https://docs.platformio.org/en/latest/plus/debugging.html#piodebug that requires no extra hardware. (The same basic techniques apply on the older boards, but at the added overhead of attaching a JTAG probe which involves extra cables, connections, boards, and generally, grief. It's highly likely to be worth the overhead (it'll take some time to get all the drivers that you may need as well as becoming familiar with yet another set of tools, but the ability to have it tell you where the program crashed, be able to print variables, set breakpoints on specific lines of code so the program can stop, be inspected, and perhaps automatically restarted will probably pay back that setup time many times over during the life of the project - and/or your own life, whichever comes first. :-)

There is another bit of telemetry that's much easier to set up, but measurably less helpful. It can tell you with certainty the function that's running when it crashes and with a little extra work, the line that's running at the crash. You mentioned that you're getting a guru meditation error (which is such a dumb name for it, but here here we are...). That's surrfounded by either a bunch of next numbers OR source lines, function arguments, of where the crash was, what line called that, what line called THAT, and so on. I'm guessing you're not seeing the latter.

If the lines around that message look approxiomately like the first message in https://github.com/platformio/platform-espressif32/issues/393, then it's highly lilkely the solution there will work, too.

Adding build_type = debug and monitor_filters = esp32_exception_decoder to your platformio.ini may bear fruit very quickly.

It can change (copy-pasting from different sources)

Backtrace: 0x4008e18d:0x3ffaf880 0x4008e539:0x3ffaf8a0 0x4009384b:0x3ffaf8c0 0x40086869:0x3ffaf8e0 0x400868a8:0x3ffaf900 0x40096bad:0x3ffaf920 0x40085fc1:0x3ffaf940 0x4014e149:0x3ffaf960 0x40157e1d:0x3ffaf980 0x40154f41:0x3ffaf9a0 0x400fe3f2:0x3ffaf9c0 0x400fe467:0x3ffaf9e0 0x4008fb8d:0x3ffafa00

Thosse are program counter/stack pointer pairs. Decoding them by hand is possible, but it's not very much fun. Letting the exception decoder do that for you is the way to go. It might convert that to (it won't, since this is from a different example) into

Backtrace: 0x40081656:0x3ffb4ac0 0x40085729:0x3ffb4ae0 0x4008a7ce:0x3ffb4b00 0x40007c69:0x3ffb4b70 0x40008148:0x3ffb4b90 0x400d51d7:0x3ffb4c20 0x400e31bc:0x3ffb4c50 0x40087bc5:0x3ffb4c80 0x40081656: panic_abort at /Users/espressif/esp-idf/components/esp_system/panic.c:452 0x40085729: esp_system_abort at /Users/espressif/esp-idf/components/esp_system/port/esp_system_chip.c:90 0x4008a7ce: abort at /Users/espressif/esp-idf/components/newlib/abort.c:38 0x40007c69: ets_write_char in ROM 0x40008148: ets_printf in ROM 0x400d51d7: app_main at /Users/espressif/esp-idf/examples/get-started/hello_world/main/hello_world_main.c:49 0x400e31bc: main_task at /Users/espressif/esp-idf/components/freertos/app_startup.c:208 (discriminator 13) 0x40087bc5: vPortTaskWrapper at /Users/espressif/esp-idf/components/freertos/FreeRTOS-Kernel/portable/xtensa/port.c:162 Once you know that stacks work upside down (like a stack of plates in the pantry...) here you can see that the actual crash was in panic(). By itself thats kind of self-evident, but that panic was called by abort. That abort was in a ROM function for write_char (a clue!) which was calledfrom the ROM printf(a better clue!) which was called from line 49 of hello_world_main.c. (smoking gun - with DNA and fingerprints and a signed confession before witnesses)!

That line might look, for example, likemain() { char *s = nullptr; printf("something %s something", s);...which crashed because dereferencing a nullptr is undefined behaviour. Sometimes it prints (null) and sometimes it just crashes and sometimes it just doesn't care and keeps running. But you KNOW it's on that line with the printf which is a MUCH smaller search domain than "my program acts funny sometimes". I think that the esp32_exception_decoder is a convenient wrapper for https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-guides/tools/idf-monitor.html. That's a slightly different set of tools that PlatformIO, so don't dwell on the details. If the keys dont' work, for example, it's because it's actually a different program (I don't remember for certain) but you can get a bit of a deeper insight into how those tools work at https://docs.espressif.com/projects/esp-idf/en/latest/esp32s3/api-guides/tools/idf-monitor.html. Don't get sucked into the horror of custom_reset_sequences and CONFIG_ESP_SYSTEM_YADDA all that; let PlatformIO hide all of that from you.

Don't get all wrapped up in filtering rules and such. Let Platformio launch the GDB stub for you and so on. Treat that page more of a hint on how the esp32_exception_monitor PROBABLY works (ultimately by calling esp32-blablah-addr2line for the PC (program counter) in that crash and then for the "PC" half of all the PC/SP (stack pointer) pairs written in that crash.

The two number from the raw crash that are most use in isolation are the PC (program counter) - that's the opcode that was running when it crashed and EXC_VADDR (exception virtual (value?) address) that's the value that was being read or written from when it crashed. So if the PC is 0x80010000 that was address of the opcode that was running and if EXCVADDR was 4 then you were probably doing something like a load or store from the second word of a structure that you were dereferencing through a null pointer ("0" for any computer you're likely to ever touch) for example.

A trick I do all the time when I KNOW a crash I have saved is from the currently running binary is to let GDB do the work of addr2line for me in a slightly more civilized way and I don't even have to hook the actual board up. (Contrived example.) Let's say I had a crash at 0x400e86dc.

$ xtensa-esp32s3-elf-gdb ./.pio/build/demo/firmware.elf

(gdb) info line *0x400e86dc Line 69 of "~/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp" starts at address 0x400e86d7 <app_main()+3> and ends at 0x400e86e0 <app_main()+12>. (gdb) l app_main 51 if (serialEventRun) serialEventRun(); 52 } 53 } 54 55 extern "C" void app_main() 56 { 57 #if ARDUINO_USB_CDC_ON_BOOT && !ARDUINO_USB_MODE 58 Serial.begin(); 59 #endif 60 #if ARDUINO_USB_MSC_ON_BOOT && !ARDUINO_USB_MODE (gdb) 61 MSC_Update.begin(); 62 #endif 63 #if ARDUINO_USB_DFU_ON_BOOT && !ARDUINO_USB_MODE 64 USB.enableDFU(); 65 #endif 66 #if ARDUINO_USB_ON_BOOT && !ARDUINO_USB_MODE 67 USB.begin(); 68 #endif 69 loopTaskWDTEnabled = false; 70 initArduino(); (gdb) disassemble app_main Dump of assembler code for function app_main(): 0x400e86d4 <+0>: entry a1, 48 0x400e86d7 <+3>: l32r a8, 0x400d0fd0 <_stext+4016> (0x3ffc3d6c

) 0x400e86da <+6>: movi a9, 0 0x400e86dd <+9>: s8i a9, a8, 0 So in my (contrived) example, I now know that exactly none of that #if jibberish was even compiled in and it crashed loading the address of loopTaskWDTEnabled into a register. (And this is how I know my example is bogus because that wouldn't actually crash...). it only then loads a9 with a zero and stores that zero (which is in register a9) into that pointee at +9. So my example is clearly bogus, but it shows you that you can use GDB directly on the .elf file even without the device attached and let it do the equivalent of the 'addr2line' for you. In fact, let's try that: $ xtensa-esp32s3-elf-addr2line -e ./.pio/build/demo/firmware.elf -a 0x400e86dc 0x400e86dc ~/.platformio/packages/framework-arduinoespressif32/cores/esp32/main.cpp:69 Hey, we got the same answer! It's almost like there's some science in computer science. (Sometimes...) This is a somewhat long "teach Sean to fish" answer before I get a chance to join you at the fishing pond later this week. (I hope.) Hopefully you don't have to resort to tools quite as crude as these and that just adding those two lines to platformio.ini will show you that you're making an access off the end of an array (whether an array or a malloc'ed item for size N when you accessed size > N...or < 0!) or through a pointer that's been freed or something. Sidebar: In PlatformIO, indeed in many embedded configurations, foo.elf is HUGE as it has all the debugging information in it. Another process (strip) comes along and nukes all that stuff that humans read and computer don't and scribbles a corresponding foo.bin that is what's actually shipped to the board. $ du -h ./.pio/build/demo/firmware* 964K ./.pio/build/demo/firmware.bin 32M ./.pio/build/demo/firmware.elf On a Real Computer where that information isn't even pulled into memory, it doesn't matter too much, but you can see that if you're shipping a firmware update over a cell modem, a wifi connection, or even a USB connection, a 98% size reduction is worth it. Debugging information for a C++ program is often way more than 100x larger than the program itself. Hope this gets you a little further down the road until I (or someone else) can help further.