Closed foxalabs closed 3 years ago
Addendum: PlatformIO currently uses the 2.1.5 version of this core; the latest version, 2.3.1, doesn't compile at all:
avr/bin/ld.exe: region `text' overflowed by 124 bytes
Crosspost in PlatformIO forums here.
I cannot reproduce your finding of more flash usage on the current version of megaTinyCore - I see 1660 bytes of flash and 69 bytes of SRAM with the current development version. I think 2.1.5 was from before I rewrote large portions of the serial code, in part to shave a few hundred bytes from the compiled size, so I don't know why it's coming out larger for you! I'm no expert on PlatformIO so I can't comment on that - it was @MCUdude who made this work with PlatformIO.
And the reason that Serial on this core takes so much more flash is that this is a full-featured serial implementation that is fully compatible with the one on official Arduino boards. A few versions ago I put a considerable amount of effort into reducing flash usage of the Serial functionality itself, but it is still a huge flash hog, because the virtual functions always get compiled in, with no attempt made to determine whether the function is ever actually called. Even with LTO. It's stupid AF. The serial you were using on the tiny13 was likely a very tightly optimized one designed only for space, which would have supported only the most basic functions. Even after I worked over some parts of Serial, I am fully aware there remains optimization and cleanup that could be done there. The virtual function shit is incredibly annoying and I have no idea how to "fix" this so the methods don't need to be declared virtual.
The reason it is using RAM when the t13's serial doesn't is that on the tiny13 that is a blocking half-duplex software serial implementation. Hence it doesn't need a transmit buffer (execution of other code just stops until it's fully sent, instead of sending in the background), and if you don't set it up to receive, I don't think it pulls in anything related to that either. As I said above, I am incredibly frustrated by this. I don't understand why those member functions have to be virtual and what, if anything, can be done about it. I've tried to look it up to understand it, but none of the discussions seem to get to the level I need of "Okay, so that's what a virtual method means. I have a class that has virtual methods because I maintain a library someone in the past wrote. It runs on microcontrollers with as little as 2k of program space, so it is imperative that optimizations like not including unused functions happen. What do I need to do so that these don't need to be virtual?" And despite hours of researching this intensely frustrating subject, I still do not have a crisp understanding of what the 'virtual' qualifier is needed for - I see examples of the bad behavior it prevents, but I can't understand why that behavior would happen without it... I like C, and I like inline assembly; I am not so good with C++.
I do wonder if it is possible for me, at least on the 0/1-series parts with only one USART, to use #ifdefs to substitute in an alternate, more efficient implementation that didn't need to support multiple instances of the USART. The DRE interrupt in particular is agonizing to look at:
<snip>\megatinycore/UART0.cpp:48
#else
#error "Don't know what the Data Received interrupt vector is called for Serial"
#endif
#if defined(HWSERIAL0_DRE_VECTOR)
ISR(HWSERIAL0_DRE_VECTOR) {
554: 1f 92 push r1
556: 0f 92 push r0
558: 0f b6 in r0, 0x3f ; 63
55a: 0f 92 push r0
55c: 11 24 eor r1, r1
55e: 2f 93 push r18
560: 3f 93 push r19
562: 4f 93 push r20
564: 5f 93 push r21
566: 6f 93 push r22
568: 7f 93 push r23
56a: 8f 93 push r24
56c: 9f 93 push r25
56e: af 93 push r26
570: bf 93 push r27
572: ef 93 push r30
574: ff 93 push r31
<snip>\megatinycore/UART0.cpp:49
Serial._tx_data_empty_irq();
576: 8f e9 ldi r24, 0x9F ; 159
578: 98 e3 ldi r25, 0x38 ; 56
57a: 0e 94 b2 00 call 0x164 ; 0x164 <UartClass::_tx_data_empty_irq()>
<snip>\megatinycore/UART0.cpp:50
}
57e: ff 91 pop r31
580: ef 91 pop r30
582: bf 91 pop r27
584: af 91 pop r26
586: 9f 91 pop r25
588: 8f 91 pop r24
58a: 7f 91 pop r23
58c: 6f 91 pop r22
58e: 5f 91 pop r21
590: 4f 91 pop r20
592: 3f 91 pop r19
594: 2f 91 pop r18
596: 0f 90 pop r0
598: 0f be out 0x3f, r0 ; 63
59a: 0f 90 pop r0
59c: 1f 90 pop r1
59e: 18 95 reti
000005a0 <__vector_17>:
__vector_17():
<snip>\megatinycore/UART0.cpp:40
// first place.
#if defined(HAVE_HWSERIAL0)
#if defined(HWSERIAL0_RXC_VECTOR)
ISR(HWSERIAL0_RXC_VECTOR) {
5a0: 1f 92 push r1
5a2: 0f 92 push r0
5a4: 0f b6 in r0, 0x3f ; 63
5a6: 0f 92 push r0
5a8: 11 24 eor r1, r1
5aa: 2f 93 push r18
5ac: 3f 93 push r19
5ae: 4f 93 push r20
5b0: 5f 93 push r21
5b2: 6f 93 push r22
5b4: 7f 93 push r23
5b6: 8f 93 push r24
5b8: 9f 93 push r25
5ba: af 93 push r26
5bc: bf 93 push r27
5be: ef 93 push r30
5c0: ff 93 push r31
<snip>\megatinycore/UART0.cpp:41
Serial._rx_complete_irq();
5c2: 8f e9 ldi r24, 0x9F ; 159
5c4: 98 e3 ldi r25, 0x38 ; 56
5c6: 0e 94 67 01 call 0x2ce ; 0x2ce <UartClass::_rx_complete_irq()>
<snip>\megatinycore/UART0.cpp:42
}
5ca: ff 91 pop r31
5cc: ef 91 pop r30
5ce: bf 91 pop r27
5d0: af 91 pop r26
5d2: 9f 91 pop r25
5d4: 8f 91 pop r24
5d6: 7f 91 pop r23
5d8: 6f 91 pop r22
5da: 5f 91 pop r21
5dc: 4f 91 pop r20
5de: 3f 91 pop r19
5e0: 2f 91 pop r18
5e2: 0f 90 pop r0
5e4: 0f be out 0x3f, r0 ; 63
5e6: 0f 90 pop r0
5e8: 1f 90 pop r1
5ea: 18 95 reti
And the thing it calls (note, that's compiled for a different part, but
void UartClass::_tx_data_empty_irq(void) {
164: cf 93 push r28
166: df 93 push r29
168: fc 01 movw r30, r24
<snip>\megatinycore/UART.cpp:98
// Check if tx buffer already empty.
if (_tx_buffer_head == _tx_buffer_tail) {
16a: 90 8d ldd r25, Z+24 ; 0x18
16c: 81 8d ldd r24, Z+25 ; 0x19
16e: c4 85 ldd r28, Z+12 ; 0x0c
170: d5 85 ldd r29, Z+13 ; 0x0d
172: 98 13 cpse r25, r24
174: 06 c0 rjmp .+12 ; 0x182 <UartClass::_tx_data_empty_irq()+0x1e>
<snip>\megatinycore/UART.cpp:101
// Buffer empty, so disable "data register empty" interrupt
//VPORTA.IN |= 0x80;
(*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
176: 8d 81 ldd r24, Y+5 ; 0x05
178: 8f 7d andi r24, 0xDF ; 223
17a: 8d 83 std Y+5, r24 ; 0x05
<snip>\megatinycore/UART.cpp:123
if (_tx_buffer_head == _tx_buffer_tail) {
// Buffer empty, so disable "data register empty" interrupt
(*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
//VPORTA.IN |= 0x80;
}
}
17c: df 91 pop r29
17e: cf 91 pop r28
180: 08 95 ret
<snip>\megatinycore/UART.cpp:107
return;
}
// There must be more data in the output
// buffer. Send the next byte
unsigned char c = _tx_buffer[_tx_buffer_tail];
182: a1 8d ldd r26, Z+25 ; 0x19
184: ae 0f add r26, r30
186: bf 2f mov r27, r31
188: b1 1d adc r27, r1
18a: a5 5a subi r26, 0xA5 ; 165
18c: bf 4f sbci r27, 0xFF ; 255
18e: 9c 91 ld r25, X
<snip>\megatinycore/UART.cpp:108
_tx_buffer_tail = (_tx_buffer_tail + 1) & (SERIAL_TX_BUFFER_SIZE-1); //% SERIAL_TX_BUFFER_SIZE;
190: 81 8d ldd r24, Z+25 ; 0x19
192: 8f 5f subi r24, 0xFF ; 255
194: 8f 73 andi r24, 0x3F ; 63
196: 81 8f std Z+25, r24 ; 0x19
<snip>\megatinycore/UART.cpp:113
// clear the TXCIF flag -- "can be cleared by writing a one to its bit
// location". This makes sure flush() won't return until the bytes
// actually got written
(*_hwserial_module).STATUS = USART_TXCIF_bm;
198: 80 e4 ldi r24, 0x40 ; 64
19a: 8c 83 std Y+4, r24 ; 0x04
<snip>\megatinycore/UART.cpp:116
//VPORTA.IN |= 0x40;
(*_hwserial_module).TXDATAL = c;
19c: a4 85 ldd r26, Z+12 ; 0x0c
19e: b5 85 ldd r27, Z+13 ; 0x0d
1a0: 12 96 adiw r26, 0x02 ; 2
1a2: 9c 93 st X, r25
<snip>\megatinycore/UART.cpp:118
if (_tx_buffer_head == _tx_buffer_tail) {
1a4: 90 8d ldd r25, Z+24 ; 0x18
1a6: 81 8d ldd r24, Z+25 ; 0x19
1a8: 98 13 cpse r25, r24
1aa: e8 cf rjmp .-48 ; 0x17c <UartClass::_tx_data_empty_irq()+0x18>
<snip>\megatinycore/UART.cpp:120
// Buffer empty, so disable "data register empty" interrupt
(*_hwserial_module).CTRLA &= (~USART_DREIE_bm);
1ac: 04 84 ldd r0, Z+12 ; 0x0c
1ae: f5 85 ldd r31, Z+13 ; 0x0d
1b0: e0 2d mov r30, r0
1b2: 85 81 ldd r24, Z+5 ; 0x05
1b4: 8f 7d andi r24, 0xDF ; 223
1b6: 85 83 std Z+5, r24 ; 0x05
1b8: e1 cf rjmp .-62 ; 0x17c <UartClass::_tx_data_empty_irq()+0x18>
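The tail update in the listing relies on the buffer size being a power of two, so that `& (SIZE - 1)` does the same job as `% SIZE` without a division (the `andi r24, 0x3F` corresponds to a 64-byte buffer). A minimal host-side sketch of that index arithmetic, where `TinyRing` and its members are invented names and not the core's code:

```cpp
#include <cstdint>

// Hypothetical fixed-size byte ring buffer; Size must be a power of two
// so the wrap-around can be done with a mask instead of a modulo.
template <std::uint8_t Size>
struct TinyRing {
    static_assert(Size != 0 && (Size & (Size - 1)) == 0,
                  "Size must be a power of two");

    std::uint8_t data[Size];
    std::uint8_t head = 0;  // next slot to write
    std::uint8_t tail = 0;  // next slot to read

    bool empty() const { return head == tail; }

    // Returns false when the buffer is full (one slot is kept free).
    bool push(std::uint8_t c) {
        const std::uint8_t next =
            static_cast<std::uint8_t>((head + 1) & (Size - 1));
        if (next == tail) return false;
        data[head] = c;
        head = next;
        return true;
    }

    // Returns false when there is nothing to read.
    bool pop(std::uint8_t& c) {
        if (empty()) return false;
        c = data[tail];
        tail = static_cast<std::uint8_t>((tail + 1) & (Size - 1));
        return true;
    }
};
```

With `head == tail` meaning "empty", a buffer of `Size` slots holds at most `Size - 1` bytes, which is the same empty test the ISR above uses before disabling the DRE interrupt.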
And when there's more than one USART, EACH ONE GETS ITS OWN PUSH-POP for like half of all the working registers on the bloody chip, some of them saving and restoring registers that don't even appear to be used! In an ideal world, I wonder if those could be reimplemented as a naked ISR, which just saved the two registers that it needed to pass the address that the actual ISR is stuffing into the Z register, then jumped to the actual ISR, which would be declared with the signal attribute so the compiler would treat it like an ISR in terms of its prologue and epilogue. I wonder if that would be viable. There has got to be some way to make it so each ISR doesn't need to push and pop half the working registers, especially when the function it calls doesn't even seem to need to use them all x_x
This is the official Arduino serial class, or it was, before I saved ~200 bytes of flash and eliminated a bug that could, under strange corner cases, hang the chip halfway, and which violated the no-astonishment principle anyway (that is, "Don't do things in your library where a simple-looking functionality does something that people will be astonished by. For example, if you're writing a Serial UART class, it shouldn't be configuring the CPUINT peripheral to change which interrupts are prioritized how"). In any event, it was targeted at the ATmega4809 with 48k flash and 6k RAM; they weren't designing it with keeping the flash footprint small at the top of their list. But you release a core with the implementation of serial that you have, not the one you want.
Also, I did compile your sketch for the 212 and generate hex, map, and lst files, if you want to see how it comes out when I build it (that was with nearly-ready-for-release 1.3.2-dev): https://github.com/SpenceKonde/UsefulArduinoPosts/tree/master/notes_from_core_issues/mtc415
The edited map got tidied up with regexes. I need to ask my python dude if he could give me a skeleton of a program that I could add regex substitutions to, and call as part of the sketch export process.
Also, yes, I can see that the names of the hex files are broken. I swear I fixed that multiple times in the past; I don't know why it is busted again.
My closing thought - do keep in mind you are using the supported part with the least flash. Every time I get a chance to, I tell people that the user experience of working with a 2k part in Arduino, when the core hasn't been very aggressively tweaked for that specific part, is pretty lousy. Most megaTinyCore users are using 16k and 32k parts, and wouldn't stand for my removing all normal serial functionality. While I try to keep a lid on flash usage, and optimize more than Arduino does, as a matter of policy, I do not bend over backwards (or forwards) to accommodate the bottom of the barrel parts if that comes at the expense of the top-end ones.
> I cannot reproduce your finding of more flash usage on the current version of megaTinyCore - I see 1660b of flash with current development version and 69 bytes sram..
Are you sure? The 124 bytes overflow is the result I get when I use the Arduino IDE with the latest version, installed as per README.
Oh! With Optiboot, yeah, that would do it lol. We do not recommend using Optiboot on any parts with less than 8k of flash, and I should probably mark it as not recommended for 8k parts too... At 2k you're giving up a quarter of your precious flash on a chip that is already too small to use comfortably with Arduino, just in order to program with a serial adapter on the serial pins instead of programming with... a serial adapter and a Schottky diode (more reliable than the 4.7k resistor method) on the UPDI pins? Don't use Optiboot on a 2k part.
The only reason we support the bootloader on all 2k parts is because it doesn't require any extra binaries to be built, and when Bill Westfield did his initial port of Optiboot_x, that's what he decided to support. (He didn't notice, either, that the bootloader binaries were identical except for the 8-pin ones with the default serial pins; he noticed they were the same for all sizes, but it wasn't until a few months ago that I realized they were also identical for the different pincounts other than 8.) With alt serial pins, all pincounts have the same binary; the 3217 uses the same bootloader hex file as the 212!
I'm going to move this to discussions, as there is no specific defect in the core here - yes, flash usage can always be improved. Maybe I should make a LiteSerial library with a bare minimum of features and a smaller flash footprint? But that's a long-term thing...
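The numbers in this thread are consistent with that explanation. A quick arithmetic check, assuming the bootloader reserves exactly a quarter of a 2048-byte part (512 bytes) and using the 1660-byte sketch size reported earlier; the constant names are only for illustration:

```cpp
// Flash budget on a 2k part with a bootloader installed.
constexpr int kFlashBytes      = 2048;             // total flash on a 2k part
constexpr int kBootloaderBytes = kFlashBytes / 4;  // "a quarter" (assumed exact)
constexpr int kSketchBytes     = 1660;             // sketch size reported in this thread

constexpr int kAvailable = kFlashBytes - kBootloaderBytes;  // room left for the sketch
constexpr int kOverflow  = kSketchBytes - kAvailable;       // bytes that do not fit

static_assert(kAvailable == 1536, "2048 - 512");
static_assert(kOverflow == 124, "same figure as the linker's 'overflowed by 124 bytes'");
```

Without the bootloader the same sketch would leave 2048 - 1660 = 388 bytes free.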
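As a rough idea of what such a library could look like, here is a transmit-only, blocking writer with no buffers and no virtual methods. Everything in it is hypothetical: no `LiteSerial` exists, and `FakeUsart` is a host-side stand-in so the sketch can be run on a PC. On real hardware the two `Hw` methods would wrap the USART's "data register empty" flag and its transmit data register.

```cpp
#include <cstdint>

// Hypothetical minimal serial writer: blocks until each byte is accepted,
// so it needs no TX buffer and no interrupt handler.
template <typename Hw>
class LiteSerial {
public:
    explicit LiteSerial(Hw& hw) : hw_(hw) {}

    void write(std::uint8_t c) {
        while (!hw_.dataRegisterEmpty()) {
            // busy-wait; nothing else runs while sending
        }
        hw_.putByte(c);
    }

    void print(const char* s) {
        while (*s) write(static_cast<std::uint8_t>(*s++));
    }

private:
    Hw& hw_;
};

// Host-side fake hardware: always ready, records what was "sent".
struct FakeUsart {
    std::uint8_t sent[16] = {};
    std::uint8_t count = 0;

    bool dataRegisterEmpty() const { return true; }

    void putByte(std::uint8_t c) {
        if (count < sizeof(sent)) sent[count++] = c;
    }
};
```

Because nothing here is virtual, methods that are never called can be discarded at link time; the trade-off is that such a class could not be passed to code expecting the standard Arduino Print/Stream interface.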
Agreed.
The following code
when built for the ATtiny13a uses 0 bytes of SRAM and 88 bytes of Flash
when built for the ATtiny212, which has a hardware UART (so one would assume the code would be somewhat smaller), uses 69 bytes of SRAM and 1754 bytes of Flash.
Simple things like digitalWrite are taking 90 to 100+ bytes each, and EEPROM.put is again 90+ bytes; a 20-line program to read EEPROM and ADC values is over 2.5k in size, when on the ATtiny13a it's 706 bytes.
Am I missing some compiler directive or build option?