SpenceKonde / ATTinyCore

Arduino core for ATtiny 1634, 828, x313, x4, x41, x5, x61, x7 and x8

Attiny167 freeze #675

Closed saddys closed 2 years ago

saddys commented 2 years ago

I have a sketch that I have used for years on an ATtiny85 without a problem (I use it to control 50 WS2812B LEDs, with reported usage of 99% flash and 99% RAM). I switched to the 167 because I need more GPIO and flash memory. With the same code, the program doesn't start (no LED effects at all). If I reduce the number of LEDs or shrink some button arrays (down to 85% RAM usage), the program starts; when RAM usage is above 85%, the sketch doesn't start. Is this a problem with the compiler or the core?

SpenceKonde commented 2 years ago

Please retest with 2.0.0-dev branch.

99% RAM usage should not be expected to work. It shouldn't work on the tiny85 either; honestly, I am shocked that it does. I think the only reason it does is that whatever memory is being corrupted by the stack-heap collision ends up not causing a visible failure.

The RAM usage displayed only includes global and statically allocated variables. It does not include dynamically allocated memory (even if malloc is called only once with a constant size, and free is not called until the destructor runs at the likely unreachable end of the program), and, more importantly when you're that close to the limit, it does not include the stack.
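To see what the compile-time figure leaves out, the gap between the end of the allocated data and the stack pointer can be estimated at run time. The following is a minimal sketch of that idea, not part of the core: `free_ram_between` and `free_ram_now` are names made up for this example. The AVR-only part assumes avr-libc's `__heap_start` linker symbol, its `__brkval` heap-end pointer, and the `SP` register macro from `<avr/io.h>`.

```c
#include <stdint.h>
#ifdef __AVR__
#include <avr/io.h>
#endif

/* Estimated bytes between the top of the allocated data and the stack.
 *   heap_start: first address after .data/.bss
 *   heap_break: current end of the malloc heap, or 0 if malloc was never used
 *   stack_ptr:  current stack pointer
 * A result of zero or less means the stack has already run into the data.
 * This is an estimate: it only describes the moment it is called, not the
 * deepest point the stack ever reaches. */
int32_t free_ram_between(uint16_t heap_start, uint16_t heap_break, uint16_t stack_ptr)
{
    uint16_t top_of_data = (heap_break == 0) ? heap_start : heap_break;
    return (int32_t)stack_ptr - (int32_t)top_of_data;
}

#ifdef __AVR__
extern char __heap_start;   /* provided by the avr-libc linker script */
extern char *__brkval;      /* maintained by avr-libc's malloc */

/* Hypothetical helper: call from the deepest function you can find. */
int free_ram_now(void)
{
    return (int)free_ram_between((uint16_t)(uintptr_t)&__heap_start,
                                 (uint16_t)(uintptr_t)__brkval,
                                 SP);
}
#endif
```

Calling `free_ram_now()` from inside a deeply nested function or an ISR gives a more honest number than calling it from `setup()`, since the stack is what the IDE's figure is missing.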

Every time a function is called (and isn't automatically inlined), at the very minimum a return address must be pushed onto the stack (2 bytes). If that function calls another function, that pushes another 2 bytes onto the stack. Any interrupt, at minimum, must not only push the return address but also execute push r0; push r1; in r0, SREG; push r0; eor r1, r1, for a total of 5 bytes of stack space. (The exception is an ISR defined naked, in which case you can really only use basic assembly with no operands, or the very few C constructions that deterministically compile to assembly that uses no working registers.) That is for a function which does nothing.

If a function needs more registers for local variables or intermediate values than the 12 "call-used" (or "call-clobbered") working registers that functions can use freely (r18-r27, r30 and r31), it must save the extra registers by pushing them onto the stack and then popping them off at the end. As an example of an intermediate value, DDRA = 2 compiles to ldi r__, 2; out DDRA, r__, where r__ represents something between r16 and r31, the registers that it is legal to use LDI with. An ISR has to do this if it uses any local variables or intermediate values at all. And if, god help you, you use more than 30 bytes of local variables in a function, the variables in excess of that also have to get pushed onto the stack and popped off when the displaced local variable is needed again...
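The per-call and per-interrupt costs described above can be added up with a few lines of arithmetic. This is a rough budgeting sketch with made-up function names. It assumes a 2-byte return address (true for AVRs with up to 128 KB of flash) and the three fixed ISR pushes from the text: r0, r1, and SREG saved via r0.

```c
#include <stdint.h>

enum {
    RETURN_ADDRESS_BYTES = 2,  /* 16-bit program counter */
    ISR_FIXED_PUSHES     = 3   /* r1, r0, and SREG (copied into r0, then pushed) */
};

/* Stack bytes used by `depth` nested, non-inlined calls, where each call
 * also saves `saved_regs_per_call` registers in its prologue. */
uint16_t call_chain_stack_bytes(uint16_t depth, uint16_t saved_regs_per_call)
{
    return (uint16_t)(depth * (RETURN_ADDRESS_BYTES + saved_regs_per_call));
}

/* Stack bytes used by one non-naked ISR that pushes `extra_regs_pushed`
 * working registers on top of the fixed prologue. */
uint16_t isr_stack_bytes(uint16_t extra_regs_pushed)
{
    return (uint16_t)(RETURN_ADDRESS_BYTES + ISR_FIXED_PUSHES + extra_regs_pushed);
}
```

The worst case is the deepest call chain plus the largest ISR that can fire while it is active, so `call_chain_stack_bytes(...) + isr_stack_bytes(...)` is the number to compare against the free RAM.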

I'm very surprised that it ends up needing fully 15% of RAM for stuff like that, but 1% free RAM is absolutely not enough regardless of the chip.

Look at the assembly for something like, say, the millis interrupt on a classic AVR. (Note: in this post I have used "heap" to refer to the dynamically allocated section of memory. Often there is no dynamically allocated memory, in which case the collision I'm talking about is between the stack and the .bss section, which holds global and statically defined variables that are not initialized to non-zero values. I attached an image, adapted from the avr-libc manual, that shows how the memory is organized; the original had a bunch of extraneous information on it that was irrelevant here.)

[Image: malloc-std]
 3f0:   1f 92           push    r1
 3f2:   0f 92           push    r0
 3f4:   0f b6           in      r0, 0x3f    ; 63
 3f6:   0f 92           push    r0
 3f8:   11 24           eor     r1, r1
 3fa:   2f 93           push    r18
 3fc:   3f 93           push    r19
 3fe:   8f 93           push    r24
 400:   9f 93           push    r25
 402:   af 93           push    r26
 404:   bf 93           push    r27
/* Omitted the middle section that actually does the work.
 * TL;DR: this is an example of how a common and simple thing in almost any Arduino sketch will blow up if you have no free
 * RAM. It pushes 9 bytes onto the stack, plus it needs 2 more for the return address so it knows where to
 * return to.
 *
 * This just shows the "prologue", where it pushes the registers it needs to make room to run the ISR, and the "epilogue",
 * where it restores them to their previous values. A large and inefficient function (anything with more local variables or
 * intermediate values than fit in the call-used registers) will have a similar prologue and epilogue, though it will only
 * have to save r2-r17 and r28, r29, which are the so-called "call-saved" registers. The other registers, apart from r0 and
 * r1, are "call-used", and a function is permitted to do as it wishes with them.
 *
 * The ISR needs to do this because an interrupt could happen at any time, and those registers could be in use. So after the
 * interrupt returns, those registers had damned well better hold the same values. And since it could be in the middle of
 * something that uses the SREG (for example, any mathematical operation, or most if() statements, which test the bits of
 * the SREG), that needs to be saved and restored too.
 *
 * avr-gcc always keeps 0 in r1, because a lot of operations require a register that is known to be 0. For example, if you're
 * comparing an 8-bit value to a 16-bit value, how do you do that? cp rA1, rB; cpc rA2, r1 (where rA1 and rA2 are the
 * working registers containing the 16-bit value and rB is the one with the 8-bit value). But when an interrupt fires, it
 * could be in the middle of one of the rare operations that changes r1. Multiplication operations put the result into r0
 * and r1; iirc somewhere there's a tiny snippet of code implemented in assembly that calls mul, copies the result out to
 * wherever it was desired, and clears r1 with `eor r1, r1` (eor is exclusive or).
 *
 * r0 is the "temp reg", which normally never gets saved. In an ISR, however, the interrupt could have happened at one of
 * those times when r0 was being used temporarily, so even though in inline assembly you can clobber r0 freely, you can't
 * do that in an ISR.
 *
 * If there are fewer than 11 bytes between the current top of the stack and the end of the heap, this will cause a
 * stack-heap collision. If you had only 5 bytes free, 6 bytes of data would be trashed by this. If you literally had 0 bytes
 * left on the stack, the return address itself would overwrite the heap. If the ISR then wrote to whatever variable was
 * stored there, at the end of the interrupt it would return to a totally different place in the program!
 * Overwriting the stack with garbage is almost always how bad code crashes an Arduino.
 */
 466:   bf 91           pop     r27
 468:   af 91           pop     r26
 46a:   9f 91           pop     r25
 46c:   8f 91           pop     r24
 46e:   3f 91           pop     r19
 470:   2f 91           pop     r18
 472:   0f 90           pop     r0
 474:   0f be           out     0x3f, r0    ; 63
 476:   0f 90           pop     r0
 478:   1f 90           pop     r1
 47a:   18 95           reti
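The "5 bytes free, 6 bytes trashed" arithmetic from the listing can be checked with a toy model. This is a simplified host-side simulation, not AVR code: the 32-byte RAM, the marker values, and the function name are all invented for illustration, and the real hardware's exact stack-pointer convention is ignored because only the byte count matters here.

```c
#include <stdint.h>
#include <string.h>

#define SIM_RAM_SIZE 32u

/* Simulate a stack growing downward from the end of a tiny RAM whose lowest
 * `data_bytes` hold variables. `stack_in_use` bytes are already on the stack.
 * Pushes `push_count` more bytes and returns how many variable bytes were
 * overwritten. Returns 0 if the starting layout does not fit at all. */
unsigned simulate_pushes(unsigned data_bytes, unsigned stack_in_use, unsigned push_count)
{
    uint8_t ram[SIM_RAM_SIZE];
    unsigned sp, i, clobbered = 0;

    if (data_bytes + stack_in_use > SIM_RAM_SIZE) {
        return 0;
    }

    memset(ram, 0x00, sizeof ram);
    memset(ram, 0xAA, data_bytes);            /* 0xAA marks intact variables */

    sp = SIM_RAM_SIZE - stack_in_use;         /* index just above the next free slot */
    for (i = 0; i < push_count && sp > 0; i++) {
        ram[--sp] = 0x55;                     /* 0x55 marks a pushed byte */
    }

    for (i = 0; i < data_bytes; i++) {
        if (ram[i] != 0xAA) {
            clobbered++;
        }
    }
    return clobbered;
}
```

With 20 bytes of variables and 7 bytes of stack already in use, 5 bytes are free; pushing the ISR's 11 bytes then overwrites 6 bytes of variables, matching the figure in the comment above.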

Once you exceed that limit, you experience a "stack-heap collision": the stack (which grows backwards from the end of RAM) starts overwriting the end of the heap (or more likely the .bss), corrupting values in RAM. The compiler is free to organize the heap as it chooses, and will usually organize it differently if the processor is different, especially if the change involves moving from something below the 8k barrier to something above it. 8k is 4096 words; the rjmp and rcall instructions (single-word instructions to jump to a different location in flash, or to call a function that can be returned from) use a relative address of -2048 to +2047 and wrap around at the end of flash. Hence, on 8k or smaller parts, there is no jmp or call (which are two-word instructions that can address the whole flash). Even when those are not used, the compiler organizes the code very differently.
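The reach of rjmp and rcall can be illustrated with modular arithmetic. This is a sketch with a made-up function name; it assumes the program counter wraps at the flash size, which is what makes every address reachable on a 4096-word part, and it works in word addresses rather than bytes.

```c
#include <stdbool.h>
#include <stdint.h>

/* rjmp/rcall encode a signed 12-bit word offset k (-2048..+2047); the CPU
 * computes PC + 1 + k. Returns true if an rjmp at word address `pc` can
 * reach word address `target`, assuming the program counter wraps around
 * at `flash_words`. */
bool rjmp_can_reach(uint32_t pc, uint32_t target, uint32_t flash_words)
{
    uint32_t next, forward;

    if (flash_words == 0 || pc >= flash_words || target >= flash_words) {
        return false;
    }

    next = (pc + 1u) % flash_words;                         /* address after the rjmp */
    forward = (target + flash_words - next) % flash_words;  /* distance going forward */

    /* Reachable either as a forward jump (k = forward) or as a backward
     * jump (k = forward - flash_words). */
    return forward <= 2047u || (flash_words - forward) <= 2048u;
}
```

On a 4096-word (8k) part, every target passes this check, which is why the two-word jmp and call are unnecessary there. On an 8192-word (16k) part such as the 167, a jump from address 0 to address 4000 fails it, so the toolchain has to fall back on longer instructions or arrange the code so that callers and callees sit close together.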

Now off to deal with about $100 of problems that just got dropped in my lap. International customs is a violation of the right to have personal property. Who is some bureaucrat to say that two willing people cannot exchange currency for legal items, and to charge them postage all over again because the customs description I've used for years is suddenly insufficient? What would a sufficient description that has to fit on one line be?

saddys commented 2 years ago

Thanks for the answer. The ATtiny85 runs perfectly without problems. I control an LED strip with 50 effects and various activation and programming buttons. Never a problem. I wanted to move to the ATtiny1634 (which has double the RAM and is supported by your library), but the FastLED library doesn't compile for it, unlike the ATtiny167, which (with your library) compiles and works (with reduced RAM usage by global variables). I always program with Arduino as ISP. Thanks for your work!

I want to try the 2.0.0-dev branch but I can't find the zip file of that version to install manually. Thanks

SpenceKonde commented 2 years ago

There is no archive someplace else; you just download the current state of the v2.0.0-dev branch: https://github.com/SpenceKonde/ATTinyCore/archive/refs/heads/v2.0.0-dev.zip

Since you're out of ram, and it tells you, I suppose you're already using the static version.

FastLED does a lousy job of keeping up with the most recent AVR releases (the 1634 isn't even recent!). IIRC one of the two people who maintain that example of kitchen-sinkism passed away a few years back (boating accident or something), which has, in addition to the obvious, had a major adverse impact on their ability to release new versions. The fact that the code is incomprehensible to the uninitiated makes it hard to find a replacement for him, too. When I first had a set of LEDs and realized my cores supported stuff that was compatible with no libraries, I started looking for something to adapt. In the time I had spent unsuccessfully trying to find what I would need to modify in FastLED, I had found the location of the change in Adafruit's library and made it for all clock speeds.

SpenceKonde commented 2 years ago

I am still mystified as to how the t85 could possibly be working with only 1% ram left.

saddys commented 2 years ago

You are very thorough, but it was not necessary to write an essay 😅 Unfortunately the ATtiny parts come with little memory; to get around that I will eventually change micro (ESP series). I will try 2.0.0. I had not found the link anywhere, thanks.

saddys commented 2 years ago

> I am still mystified as to how the t85 could possibly be working with only 1% ram left.

Maybe I confused it with flash memory. Tomorrow in the lab I will compile it and let you know.

SpenceKonde commented 2 years ago

What's the news on this?

Since you mentioned the ESP series micros, I feel compelled to say that I consider the two to be largely complementary in most use cases. I've usually had an ESP8266 and a modern tinyAVR working together. The tinyAVR is also able to act as a watchdog for the ESP, which was important for me because the code I was using, ahh... maybe wasn't the most robust on the ESP side (they're not my forte), certainly not like modern AVRs are. A product I hope to launch this spring is a Wemos D1 Mini shield with an ATtiny 2-series (20-pin, simply for the alt reset, so that they can both reset the other one if they want to).

The basic case I spent the most time on was a bridge between 433 MHz cheapo RF and an intranet-accessible web API. IIRC neither of them had much logic on them; they just acted as dumb translators, and received RF messages would be sent from the tiny to the ESP, which would in turn fire off a request at a Raspberry Pi where some actual logic resided. The problem was the watchdog mechanism, though, which is instantly solved by the 2-series' alt reset. Level shifting was a pain too, but since then I've realized I can just run the USART in open-drain mode and not have to worry about level shifting the TX side, and as I recall RX "just worked". So the only thing needed to run the tiny at a civilized 5 V is to put a tiny FET between its I/O pin and the ESP pin that could trigger the reset.

saddys commented 2 years ago

Hi, sorry for the delay. On the ATtiny85 I use 90% of the RAM; the 99% was flash. I was confused. You were right, of course. In the end I created a new PCB and switched to the ATmega168, which is still cheap, has 1 KB of RAM, and lets me compile the FastLED library and manage more LEDs. Later I can switch to the ATmega328 with the same pinout.

SpenceKonde commented 2 years ago

I suspect you would have had success with the 861 using tinyNeoPixel_Static. FastLED is one of those libraries, like RadioHead, that has succumbed to scope creep and become an unmaintainable mess. You would have been balls to the wall on flash usage, though, because the more pins a device has, the less flash is available for the rest of the application: the Arduino abstraction uses tables in progmem to hold the pins. I was able to halve the size of the tables in 2.0.0, but still... I'd get rid of the tables, except that there's code in the wild that accesses them directly. (If they were all accessed through wrapper functions, the pins could be laid out such that you could get the information they store into less flash.)
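To make the table-versus-wrapper trade-off concrete, here is a rough sketch. It is not how ATTinyCore lays out its tables. It assumes three one-byte-per-pin lookup tables (port, bit mask, timer), as the stock Arduino AVR core uses, and a purely hypothetical pin numbering in which pin n is bit n % 8 of port n / 8. All function names are invented for this example.

```c
#include <stdint.h>

enum { TABLES_PER_PIN = 3 };  /* assumption: port, bit mask, and timer tables */

/* Flash bytes spent on per-pin lookup tables under the assumption above. */
uint16_t pin_table_flash_bytes(uint8_t pin_count)
{
    return (uint16_t)(pin_count * TABLES_PER_PIN);
}

/* Hypothetical wrappers: if pins were numbered in port order, the port and
 * bit mask could be computed instead of being looked up in flash. */
uint8_t pin_to_port_index(uint8_t pin)
{
    return (uint8_t)(pin / 8u);
}

uint8_t pin_to_bit_mask(uint8_t pin)
{
    return (uint8_t)(1u << (pin % 8u));
}
```

Under these assumptions a 16-pin part spends 48 bytes on tables, while the computed version costs a few instructions per lookup and no table at all. That only works if nothing reads the tables directly, which is the obstacle described above.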

saddys commented 2 years ago

I have a DIY PCB and a sketch with over 100 different programmed effects, speeds, LED counts, brightness levels, directions, etc. I prefer to move to a new CPU that supports my sketch; converting all the code to NeoPixel is beyond my capabilities.

saddys commented 2 years ago

Anyway, I can use the 167, but it has only 512 bytes of RAM. The ATmega is perfect for my purpose. I have added a display too, which is impossible with 512 bytes of RAM.

SpenceKonde commented 2 years ago

If that's the more economical option, then go for it. I'd suggest the ATtiny1634, a classic AVR with 16k flash and 1k SRAM, but I don't know the status of FastLED support there.

saddys commented 2 years ago

As I said, the ATtiny1634 was perfect in terms of RAM and flash memory, but it is not supported by FastLED. The ATmega is supported. Thanks!

SpenceKonde commented 2 years ago

Yeah, it's a real shame the way they have tied their implementation to part-specific code, and then failed to update it for other parts.

The whole concept of what they're doing in this file is Bad and Wrong: https://github.com/FastLED/FastLED/blob/master/src/platforms/avr/fastpin_avr.h

Oh! And that's probably also why you were having issues with the 167.... the library support is specific not to parts, but to the pin mapping used for a specific part! And there are several pin mappings for the 167.

I'll bet if you used the same pinmapping they use in that file, it'd work.