SpenceKonde / megaTinyCore

Arduino core for the tinyAVR 0/1/2-series - Ones's digit 2,4,5,7 (pincount, 8,14,20,24), tens digit 0, 1, or 2 (featureset), preceded by flash in kb. Library maintainers: porting help available!

Confused about SRAM and Flash usage between ATtiny13a and ATtiny212 #415

Closed foxalabs closed 3 years ago

foxalabs commented 3 years ago

The following code

void setup() {
  Serial.begin(9600);
}
void loop() {
  Serial.print('A');
}

when built for the ATtiny13a uses 0 bytes of SRAM and 88 bytes of Flash

    RAM:   [          ]   0.0% (used 0 bytes from 64 bytes)
    Flash: [=         ]   8.6% (used 88 bytes from 1024 bytes)

When built for the ATtiny212, which has a hardware UART (so one would assume the code would be somewhat smaller), it uses 69 bytes of SRAM and 1754 bytes of Flash.

    RAM:   [=====     ]  53.9% (used 69 bytes from 128 bytes)
    Flash: [========= ]  85.6% (used 1754 bytes from 2048 bytes)

Simple things like digitalWrite are taking 90 to 100+ bytes each, and EEPROM.put is again 90+ bytes. A 20-line program to read EEPROM and ADC values is over 2.5k in size, when on the ATtiny13a it's 706 bytes.

Am I missing some compiler directive or build option?

maxgerhardt commented 3 years ago

Addendum: PlatformIO currently uses the 2.1.5 version of this core, the latest version 2.3.1 doesn't compile at all

avr/bin/ld.exe: region `text' overflowed by 124 bytes

Crosspost in PlatformIO forums here.

SpenceKonde commented 3 years ago

I cannot reproduce your finding of more flash usage on the current version of megaTinyCore - I see 1660 bytes of flash with the current development version, and 69 bytes of SRAM. I think 2.1.5 was from before I rewrote large portions of the serial code, in part to shave a few hundred bytes from the compiled size, so I don't know why it's coming out larger for you! I'm no expert on PlatformIO so I can't comment on that - it was @MCUdude who made this work with PlatformIO.

And the reason that Serial on this core takes so much more flash is that this is a full-featured serial implementation that is fully compatible with the one on official Arduino boards. A few versions ago I put a considerable amount of effort into reducing the flash usage of the Serial functionality itself, but it is still a huge flash hog, because the virtual functions always get compiled in, with no attempt made to determine whether the function is ever actually called. Even with LTO. It's stupid AF. The serial you were using on the tiny13 was likely a very tightly optimized one, designed only for space, which would have supported only the most basic functions. Even after I worked over some parts of Serial, I am fully aware there remains optimization and cleanup that could be done there. The virtual function shit is incredibly annoying and I have no idea how to "fix" this so the methods don't need to be declared virtual.

The reason it is using RAM when the t13's serial doesn't is that on the tiny13 that is a blocking, half-duplex software serial implementation. Hence it needs no transmit buffer (execution of other code just stops until the data is fully sent, instead of sending in the background). And if you don't set it up to receive, I don't think it pulls in anything related to that either.

As I said above, I am incredibly frustrated by this. I don't understand why those member functions have to be virtual and what, if anything, I can do about it. I've tried to look it up to understand it, but none of the discussions seem to get to the level I need of "Okay, so that's what a virtual method means. I have a class that has virtual methods because I maintain a library someone in the past wrote. It runs on microcontrollers with as little as 2k of program space, so it is imperative that optimizations like not including unused functions happen. What do I need to do so that these don't need to be virtual?" And despite hours of researching this intensely frustrating subject, I still do not have a crisp understanding of what the 'virtual' qualifier is needed for - I see examples of the bad behavior it prevents, but I can't understand why that behavior would happen without it... I like C, and I like inline assembly. I am not so good with C++.
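
For what it's worth, here is a minimal host-side sketch of what `virtual` is doing in a Print-style class. `PrintLike`, `FakeUart`, and `logTo` are hypothetical names for illustration, not the core's actual classes. The point is that `print()` is compiled once, in the base class, and has to reach whichever `write()` the concrete object provides; `virtual` is what makes that lookup happen at run time.

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a Print-style base class: print() is written
// once here and pushes every character through write().
struct PrintLike {
  virtual ~PrintLike() = default;
  virtual std::size_t write(char c) = 0;  // each transport supplies this

  std::size_t print(const char *s) {
    std::size_t n = 0;
    while (*s) n += write(*s++);  // dispatched through the vtable
    return n;
  }
};

// Hypothetical transport that just records what was "sent".
struct FakeUart : PrintLike {
  std::string sent;
  std::size_t write(char c) override {
    sent += c;
    return 1;
  }
};

// Library-style code that only knows about the base class.
std::size_t logTo(PrintLike &out) { return out.print("hi"); }
```

Calling `logTo(u)` on a `FakeUart u` leaves "hi" in `u.sent`. If `write()` were a plain non-virtual member, `print()` could only ever call the base class's own `write()`, because the base class is compiled without knowing that `FakeUart` exists. The flash cost follows from the same mechanism: the address of every virtual method is stored in the class's vtable, so the linker sees each one as referenced whether or not the sketch ever calls it.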

I do wonder if it is possible for me, at least on the 0/1-series parts with only one USART, to use #ifdefs to substitute in an alternate, more efficient implementation that didn't need to support multiple instances of the USART. The DRE interrupt in particular is agonizing to look at:

<snip>\megatinycore/UART0.cpp:48
#else
#error "Don't know what the Data Received interrupt vector is called for Serial"
#endif

#if defined(HWSERIAL0_DRE_VECTOR)
ISR(HWSERIAL0_DRE_VECTOR) {
 554: 1f 92         push  r1
 556: 0f 92         push  r0
 558: 0f b6         in  r0, 0x3f  ; 63
 55a: 0f 92         push  r0
 55c: 11 24         eor r1, r1
 55e: 2f 93         push  r18
 560: 3f 93         push  r19
 562: 4f 93         push  r20
 564: 5f 93         push  r21
 566: 6f 93         push  r22
 568: 7f 93         push  r23
 56a: 8f 93         push  r24
 56c: 9f 93         push  r25
 56e: af 93         push  r26
 570: bf 93         push  r27
 572: ef 93         push  r30
 574: ff 93         push  r31
<snip>\megatinycore/UART0.cpp:49
  Serial._tx_data_empty_irq();
 576: 8f e9         ldi r24, 0x9F ; 159
 578: 98 e3         ldi r25, 0x38 ; 56
 57a: 0e 94 b2 00   call  0x164 ; 0x164 <UartClass::_tx_data_empty_irq()>
<snip>\megatinycore/UART0.cpp:50
}
 57e: ff 91         pop r31
 580: ef 91         pop r30
 582: bf 91         pop r27
 584: af 91         pop r26
 586: 9f 91         pop r25
 588: 8f 91         pop r24
 58a: 7f 91         pop r23
 58c: 6f 91         pop r22
 58e: 5f 91         pop r21
 590: 4f 91         pop r20
 592: 3f 91         pop r19
 594: 2f 91         pop r18
 596: 0f 90         pop r0
 598: 0f be         out 0x3f, r0  ; 63
 59a: 0f 90         pop r0
 59c: 1f 90         pop r1
 59e: 18 95         reti

000005a0 <__vector_17>:
__vector_17():
<snip>\megatinycore/UART0.cpp:40
// first place.

#if defined(HAVE_HWSERIAL0)

#if defined(HWSERIAL0_RXC_VECTOR)
ISR(HWSERIAL0_RXC_VECTOR) {
 5a0: 1f 92         push  r1
 5a2: 0f 92         push  r0
 5a4: 0f b6         in  r0, 0x3f  ; 63
 5a6: 0f 92         push  r0
 5a8: 11 24         eor r1, r1
 5aa: 2f 93         push  r18
 5ac: 3f 93         push  r19
 5ae: 4f 93         push  r20
 5b0: 5f 93         push  r21
 5b2: 6f 93         push  r22
 5b4: 7f 93         push  r23
 5b6: 8f 93         push  r24
 5b8: 9f 93         push  r25
 5ba: af 93         push  r26
 5bc: bf 93         push  r27
 5be: ef 93         push  r30
 5c0: ff 93         push  r31
<snip>\megatinycore/UART0.cpp:41
  Serial._rx_complete_irq();
 5c2: 8f e9         ldi r24, 0x9F ; 159
 5c4: 98 e3         ldi r25, 0x38 ; 56
 5c6: 0e 94 67 01   call  0x2ce ; 0x2ce <UartClass::_rx_complete_irq()>
<snip>\megatinycore/UART0.cpp:42
}
 5ca: ff 91         pop r31
 5cc: ef 91         pop r30
 5ce: bf 91         pop r27
 5d0: af 91         pop r26
 5d2: 9f 91         pop r25
 5d4: 8f 91         pop r24
 5d6: 7f 91         pop r23
 5d8: 6f 91         pop r22
 5da: 5f 91         pop r21
 5dc: 4f 91         pop r20
 5de: 3f 91         pop r19
 5e0: 2f 91         pop r18
 5e2: 0f 90         pop r0
 5e4: 0f be         out 0x3f, r0  ; 63
 5e6: 0f 90         pop r0
 5e8: 1f 90         pop r1
 5ea: 18 95         reti

And the thing it calls (note, that's compiled for a different part, but


void UartClass::_tx_data_empty_irq(void) {
 164: cf 93         push  r28
 166: df 93         push  r29
 168: fc 01         movw  r30, r24
<snip>\megatinycore/UART.cpp:98
  // Check if tx buffer already empty.
  if (_tx_buffer_head == _tx_buffer_tail) {
 16a: 90 8d         ldd r25, Z+24 ; 0x18
 16c: 81 8d         ldd r24, Z+25 ; 0x19
 16e: c4 85         ldd r28, Z+12 ; 0x0c
 170: d5 85         ldd r29, Z+13 ; 0x0d
 172: 98 13         cpse  r25, r24
 174: 06 c0         rjmp  .+12      ; 0x182 <UartClass::_tx_data_empty_irq()+0x1e>
<snip>\megatinycore/UART.cpp:101
    // Buffer empty, so disable "data register empty" interrupt
    //VPORTA.IN |= 0x80;
    (*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
 176: 8d 81         ldd r24, Y+5  ; 0x05
 178: 8f 7d         andi  r24, 0xDF ; 223
 17a: 8d 83         std Y+5, r24  ; 0x05
<snip>\megatinycore/UART.cpp:123
  if (_tx_buffer_head == _tx_buffer_tail) {
    // Buffer empty, so disable "data register empty" interrupt
    (*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
    //VPORTA.IN |= 0x80;
  }
}
 17c: df 91         pop r29
 17e: cf 91         pop r28
 180: 08 95         ret
<snip>\megatinycore/UART.cpp:107
    return;
  }

  // There must be more data in the output
  // buffer. Send the next byte
  unsigned char c = _tx_buffer[_tx_buffer_tail];
 182: a1 8d         ldd r26, Z+25 ; 0x19
 184: ae 0f         add r26, r30
 186: bf 2f         mov r27, r31
 188: b1 1d         adc r27, r1
 18a: a5 5a         subi  r26, 0xA5 ; 165
 18c: bf 4f         sbci  r27, 0xFF ; 255
 18e: 9c 91         ld  r25, X
<snip>\megatinycore/UART.cpp:108
  _tx_buffer_tail = (_tx_buffer_tail + 1) & (SERIAL_TX_BUFFER_SIZE-1); //% SERIAL_TX_BUFFER_SIZE;
 190: 81 8d         ldd r24, Z+25 ; 0x19
 192: 8f 5f         subi  r24, 0xFF ; 255
 194: 8f 73         andi  r24, 0x3F ; 63
 196: 81 8f         std Z+25, r24 ; 0x19
<snip>\megatinycore/UART.cpp:113

  // clear the TXCIF flag -- "can be cleared by writing a one to its bit
  // location". This makes sure flush() won't return until the bytes
  // actually got written
  (*_hwserial_module).STATUS = USART_TXCIF_bm;
 198: 80 e4         ldi r24, 0x40 ; 64
 19a: 8c 83         std Y+4, r24  ; 0x04
<snip>\megatinycore/UART.cpp:116
    //VPORTA.IN |= 0x40;

  (*_hwserial_module).TXDATAL = c;
 19c: a4 85         ldd r26, Z+12 ; 0x0c
 19e: b5 85         ldd r27, Z+13 ; 0x0d
 1a0: 12 96         adiw  r26, 0x02 ; 2
 1a2: 9c 93         st  X, r25
<snip>\megatinycore/UART.cpp:118

  if (_tx_buffer_head == _tx_buffer_tail) {
 1a4: 90 8d         ldd r25, Z+24 ; 0x18
 1a6: 81 8d         ldd r24, Z+25 ; 0x19
 1a8: 98 13         cpse  r25, r24
 1aa: e8 cf         rjmp  .-48      ; 0x17c <UartClass::_tx_data_empty_irq()+0x18>
<snip>\megatinycore/UART.cpp:120
    // Buffer empty, so disable "data register empty" interrupt
    (*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
 1ac: 04 84         ldd r0, Z+12  ; 0x0c
 1ae: f5 85         ldd r31, Z+13 ; 0x0d
 1b0: e0 2d         mov r30, r0
 1b2: 85 81         ldd r24, Z+5  ; 0x05
 1b4: 8f 7d         andi  r24, 0xDF ; 223
 1b6: 85 83         std Z+5, r24  ; 0x05
 1b8: e1 cf         rjmp  .-62      ; 0x17c <UartClass::_tx_data_empty_irq()+0x18>
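
The tail update in the listing (`subi r24, 0xFF` followed by `andi r24, 0x3F`) is the usual power-of-two ring-buffer wrap. A minimal host-side sketch of it follows; the size of 64 is an assumption read off the `0x3F` mask in this particular build, and `nextIndex` / `nextIndexModulo` are hypothetical helper names.

```cpp
#include <cstdint>

// Assumed from the `andi r24, 0x3F` in the listing: a 64-byte TX buffer.
constexpr std::uint8_t kTxBufferSize = 64;
static_assert((kTxBufferSize & (kTxBufferSize - 1)) == 0,
              "the mask trick only works for power-of-two sizes");

// Advance a ring-buffer index by one slot, wrapping with a single AND.
constexpr std::uint8_t nextIndex(std::uint8_t i) {
  return static_cast<std::uint8_t>((i + 1) & (kTxBufferSize - 1));
}

// The same wrap written with %, as in the commented-out alternative.
constexpr std::uint8_t nextIndexModulo(std::uint8_t i) {
  return static_cast<std::uint8_t>((i + 1) % kTxBufferSize);
}
```

Both forms give the same index for every slot as long as the size is a power of two. Writing the mask explicitly makes that requirement visible, which matters on a CPU with no hardware divide.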

And... when there's more than one USART... EACH ONE GETS ITS OWN PUSH-POP for like half of all the working registers on the bloody chip, some of them saving and restoring registers that don't even appear to be used! In an ideal world, I wonder if those could be reimplemented as a naked ISR, which just saved the two registers that it needed to pass the address that the actual ISR is stuffing into the Z register, then jmp to the actual ISR, which would be declared with the signal attribute so the compiler would treat it like an ISR in terms of its prologue and epilogue. I wonder if that would be viable. There has got to be some way to make it so each ISR doesn't need to push and pop half the working registers, especially when the function it calls doesn't even seem to need to use them all x_x
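
The size of that overhead can be read straight off the addresses in the listing. The sketch below only does the subtraction; the constant names are made up for illustration, and the end address `0x5ec` is inferred from the 2-byte `reti` at `0x5ea`.

```cpp
#include <cstdint>

// Addresses copied from the disassembly above.
constexpr std::uint16_t kDreIsrStart = 0x554;  // first push of the DRE ISR
constexpr std::uint16_t kDreCall     = 0x576;  // ldi / ldi / call
constexpr std::uint16_t kDreEpilogue = 0x57e;  // first pop
constexpr std::uint16_t kRxcIsrStart = 0x5a0;  // __vector_17 begins here
constexpr std::uint16_t kRxcIsrEnd   = 0x5ec;  // byte after the reti at 0x5ea

constexpr unsigned kPrologueBytes = kDreCall - kDreIsrStart;      // save
constexpr unsigned kCallBytes     = kDreEpilogue - kDreCall;      // real work
constexpr unsigned kEpilogueBytes = kRxcIsrStart - kDreEpilogue;  // restore
constexpr unsigned kDreIsrBytes   = kRxcIsrStart - kDreIsrStart;
constexpr unsigned kRxcIsrBytes   = kRxcIsrEnd - kRxcIsrStart;

static_assert(kPrologueBytes == 34 && kEpilogueBytes == 34, "save/restore");
static_assert(kCallBytes == 8, "loading `this` and calling the handler");
static_assert(kDreIsrBytes == 76 && kRxcIsrBytes == 76, "per-ISR total");
```

So each wrapper is 76 bytes, of which only 8 load the object address and make the call; the two wrappers for one USART come to 152 bytes. The full save happens because the ISR calls a function the compiler does not inline, so it has to preserve every call-clobbered register (r18-r27, r30, r31, plus r0, r1 and SREG) regardless of which ones the callee touches. In the `_tx_data_empty_irq` listing, r18-r23 never appear at all.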

This is the official Arduino serial class, or it was, before I saved ~200 bytes of flash and eliminated a bug that could, under strange corner cases, leave the chip hung in a halfway state, and which violated the no-astonishment principle anyway (that is, "Don't do things in your library where a simple-looking functionality does something that people will be astonished by. For example, if you're writing a Serial UART class, it shouldn't be configuring the CPUINT peripheral to change which interrupts are prioritized how"). In any event, it was targeted at the ATmega4809 with 48k flash and 6k RAM; they weren't designing it with keeping the flash footprint small at the top of their list. But you release a core with the implementation of serial that you have, not the one you want.

Also, I did compile your sketch for the 212 and generated the hex, map, and lst files, if you want to see how it comes out when I build it (that was with nearly-ready-for-release 1.3.2-dev): https://github.com/SpenceKonde/UsefulArduinoPosts/tree/master/notes_from_core_issues/mtc415

The edited map got tidied up with regexes. I need to ask my python dude if he could give me a skeleton of a program that I could add regex substitutions to, and call as part of the sketch export process

Also, yes, I can see that the names of the hex files are broken. I swear I fixed that multiple times in the past; I don't know why it is busted again.

My closing thought - do keep in mind you are using the supported part with the very least flash. Every time I get a chance to, I tell people that the user experience of working with a 2k part in Arduino, when the core hasn't been very aggressively tweaked for that specific part, is pretty lousy. Most megaTinyCore users are using 16k and 32k parts, and wouldn't stand for my removing all normal serial functionality. While I try to keep a lid on flash usage, and optimize more than Arduino does, as a matter of policy I do not bend over backwards (or forwards) to accommodate the bottom-of-the-barrel parts if that comes at the expense of the top-end ones.

maxgerhardt commented 3 years ago

I cannot reproduce your finding of more flash usage on the current version of megaTinyCore - I see 1660b of flash with current development version and 69 bytes sram..

Are you sure? The 124 bytes overflow is the result I get when I use the Arduino IDE with the latest version, installed as per README.

(three screenshots attached; images not preserved)

SpenceKonde commented 3 years ago

Oh! With optiboot, yeah, that would do it lol. We do not recommend using optiboot on any parts with less than 8k of flash, and I should probably mark it as not recommended for 8k parts too... At 2k you're giving up a quarter of your precious flash, on a chip that already is too small to use comfortably with Arduino, just in order to program with a serial adapter on the serial pins instead of programming with... a serial adapter and a Schottky diode (more reliable than the 4.7k resistor method) on the UPDI pin? Don't use optiboot on a 2k part.
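
The numbers in this thread line up with that explanation. A small sketch of the arithmetic, assuming "a quarter" means exactly 512 bytes reserved for the bootloader and that the failing build produced the same 1660-byte image reported earlier (the constant names are illustrative only):

```cpp
// Flash budget on a 2k part with a bootloader section reserved.
constexpr unsigned kFlashBytes      = 2048;             // ATtiny212
constexpr unsigned kBootloaderBytes = kFlashBytes / 4;  // assumed: 512
constexpr unsigned kAvailableBytes  = kFlashBytes - kBootloaderBytes;
constexpr unsigned kSketchBytes     = 1660;  // size reported earlier

// How far the sketch overshoots the space left for the application.
constexpr unsigned kOverflowBytes = kSketchBytes - kAvailableBytes;

static_assert(kAvailableBytes == 1536, "space left for the sketch");
static_assert(kOverflowBytes == 124, "matches the linker error");
```

Under those assumptions the 124-byte overflow in the linker message is exactly the 1660-byte sketch minus the 1536 bytes left after the bootloader, so the same sketch fits with room to spare once optiboot is out of the picture.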

The only reason we support the bootloader on all 2k parts is because it doesn't require any extra binaries to be built, and when Bill Westfield did his initial port of Optiboot_x, that's what he decided to support. He didn't notice, either, that the bootloader binaries were identical except for the 8-pin ones with the default serial pins; he noticed they were the same for all sizes, but it wasn't until a few months ago that I realized they were identical for the different pincounts other than 8 too. With alt serial pins, all pincounts have the same binary; the 3217 uses the same bootloader hex file as the 212!

SpenceKonde commented 3 years ago

I'm going to move this to discussions, as there is no specific defect in the core here - yes, flash usage can always be improved. Maybe I should make a LiteSerial library, with a bare minimum of features and a smaller flash footprint? But that's a long-term thing...

foxalabs commented 3 years ago

Agreed.