Just for comparison, the macro-subclass-crazy HardwareSerial compiles the entire
"Serial.begin(9600);" command to:
2a4: 10 92 5c 01 sts 0x015C, r1 ; reset buffer head
2a8: 10 92 5d 01 sts 0x015D, r1 ; reset buffer tail
2ac: 10 92 c0 00 sts 0x00C0, r1 ; disable u2x mode
2b0: 10 92 c5 00 sts 0x00C5, r1 ; set high baud bits
2b4: 87 e6 ldi r24, 0x67
2b6: 80 93 c4 00 sts 0x00C4, r24 ; set low baud bits
2ba: 88 e9 ldi r24, 0x98
2bc: 80 93 c1 00 sts 0x00C1, r24 ; set pins and ISR
2c0: 86 e0 ldi r24, 0x06
2c2: 80 93 c2 00 sts 0x00C2, r24 ; set frame format
The 2ac, 2c0, and 2c2 lines are only there because this version handles serial
frame formats and u2x mode. The buffer head/tail reset lines are also extra;
without an end() function these could be pre-initialized to zero. The Arduino-16
HardwareSerial doesn't handle frame formats, u2x mode, or end(). So the
functionality of the Arduino-16 "Serial.begin(9600);" should be only 5
instructions long... instead of the 138 it actually is (before you start
looking at loops).
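For reference, the 0x67 loaded in the short version is the baud divisor for 9600 baud. The two __divmodsi4 calls in the long version look like the same value being computed at run time (baud / 2, add 1000000, divide by baud, subtract 1). A minimal sketch of folding that at compile time, assuming a 16 MHz clock; the names kCpuHz and baudDivisor are made up for this illustration and are not from either HardwareSerial version:

```cpp
#include <cstdint>

// Assumed clock: 16 MHz, as on a stock Arduino board.
constexpr uint32_t kCpuHz = 16000000UL;

// Normal-speed (non-u2x) divisor, rounded to nearest:
//   UBRR = (F_CPU / 16 + baud / 2) / baud - 1
constexpr uint16_t baudDivisor(uint32_t cpu_hz, uint32_t baud) {
    return static_cast<uint16_t>((cpu_hz / 16 + baud / 2) / baud - 1);
}

// With a literal baud rate the whole expression folds to a constant,
// so no 32-bit division routine has to be called at run time.
constexpr uint16_t kUbrr9600 = baudDivisor(kCpuHz, 9600);

static_assert((kUbrr9600 & 0xFF) == 0x67, "low baud byte, cf. ldi r24, 0x67");
static_assert((kUbrr9600 >> 8) == 0x00, "high baud byte is zero");
```

The constant only folds when the baud rate is known where begin() is expanded, which is why inlining matters so much for this function.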
The Arduino-16 version of "Serial.begin(9600);" compiles to:
26c: 40 e8 ldi r20, 0x80 ; 128
26e: 55 e2 ldi r21, 0x25 ; 37
270: 60 e0 ldi r22, 0x00 ; 0
272: 70 e0 ldi r23, 0x00 ; 0
274: 0e 94 b8 01 call 0x370 ; 0x370
....
370: af 92 push r10
372: bf 92 push r11
374: cf 92 push r12
376: df 92 push r13
378: ef 92 push r14
37a: ff 92 push r15
37c: 0f 93 push r16
37e: 1f 93 push r17
380: cf 93 push r28
382: df 93 push r29
384: 6c 01 movw r12, r24
386: 7a 01 movw r14, r20
388: 8b 01 movw r16, r22
38a: dc 01 movw r26, r24
38c: 14 96 adiw r26, 0x04 ; 4
38e: ad 90 ld r10, X+
390: bc 90 ld r11, X
392: 15 97 sbiw r26, 0x05 ; 5
394: cb 01 movw r24, r22
396: ba 01 movw r22, r20
398: 22 e0 ldi r18, 0x02 ; 2
39a: 30 e0 ldi r19, 0x00 ; 0
39c: 40 e0 ldi r20, 0x00 ; 0
39e: 50 e0 ldi r21, 0x00 ; 0
3a0: 0e 94 4c 03 call 0x698 ; 0x698 <__divmodsi4>
3a4: 20 5c subi r18, 0xC0 ; 192
3a6: 3d 4b sbci r19, 0xBD ; 189
3a8: 40 4f sbci r20, 0xF0 ; 240
3aa: 5f 4f sbci r21, 0xFF ; 255
3ac: ca 01 movw r24, r20
3ae: b9 01 movw r22, r18
3b0: a8 01 movw r20, r16
3b2: 97 01 movw r18, r14
3b4: 0e 94 4c 03 call 0x698 ; 0x698 <__divmodsi4>
3b8: c9 01 movw r24, r18
3ba: da 01 movw r26, r20
3bc: 01 97 sbiw r24, 0x01 ; 1
3be: a1 09 sbc r26, r1
3c0: b1 09 sbc r27, r1
3c2: 29 2f mov r18, r25
3c4: 3a 2f mov r19, r26
3c6: 4b 2f mov r20, r27
3c8: 55 27 eor r21, r21
3ca: 47 fd sbrc r20, 7
3cc: 5a 95 dec r21
3ce: 01 96 adiw r24, 0x01 ; 1
3d0: a1 1d adc r26, r1
3d2: b1 1d adc r27, r1
3d4: e5 01 movw r28, r10
3d6: 28 83 st Y, r18
3d8: e6 01 movw r28, r12
3da: ee 81 ldd r30, Y+6 ; 0x06
3dc: ff 81 ldd r31, Y+7 ; 0x07
3de: 81 50 subi r24, 0x01 ; 1
3e0: 80 83 st Z, r24
3e2: e8 85 ldd r30, Y+8 ; 0x08
3e4: f9 85 ldd r31, Y+9 ; 0x09
3e6: 20 81 ld r18, Z
3e8: 41 e0 ldi r20, 0x01 ; 1
3ea: 50 e0 ldi r21, 0x00 ; 0
3ec: ca 01 movw r24, r20
3ee: 0a 88 ldd r0, Y+18 ; 0x12
3f0: 02 c0 rjmp .+4 ; 0x3f6
3f2: 88 0f add r24, r24
3f4: 99 1f adc r25, r25
3f6: 0a 94 dec r0
3f8: e2 f7 brpl .-8 ; 0x3f2
3fa: 80 95 com r24
3fc: 82 23 and r24, r18
3fe: 80 83 st Z, r24
400: ea 85 ldd r30, Y+10 ; 0x0a
402: fb 85 ldd r31, Y+11 ; 0x0b
404: 20 81 ld r18, Z
406: ca 01 movw r24, r20
408: 0e 84 ldd r0, Y+14 ; 0x0e
40a: 02 c0 rjmp .+4 ; 0x410
40c: 88 0f add r24, r24
40e: 99 1f adc r25, r25
410: 0a 94 dec r0
412: e2 f7 brpl .-8 ; 0x40c
414: 28 2b or r18, r24
416: 20 83 st Z, r18
418: ea 85 ldd r30, Y+10 ; 0x0a
41a: fb 85 ldd r31, Y+11 ; 0x0b
41c: 20 81 ld r18, Z
41e: ca 01 movw r24, r20
420: 0f 84 ldd r0, Y+15 ; 0x0f
422: 02 c0 rjmp .+4 ; 0x428
424: 88 0f add r24, r24
426: 99 1f adc r25, r25
428: 0a 94 dec r0
42a: e2 f7 brpl .-8 ; 0x424
42c: 28 2b or r18, r24
42e: 20 83 st Z, r18
430: ea 85 ldd r30, Y+10 ; 0x0a
432: fb 85 ldd r31, Y+11 ; 0x0b
434: 80 81 ld r24, Z
436: 08 88 ldd r0, Y+16 ; 0x10
438: 02 c0 rjmp .+4 ; 0x43e
43a: 44 0f add r20, r20
43c: 55 1f adc r21, r21
43e: 0a 94 dec r0
440: e2 f7 brpl .-8 ; 0x43a
442: 84 2b or r24, r20
444: 80 83 st Z, r24
446: df 91 pop r29
448: cf 91 pop r28
44a: 1f 91 pop r17
44c: 0f 91 pop r16
44e: ff 90 pop r15
450: ef 90 pop r14
452: df 90 pop r13
454: cf 90 pop r12
456: bf 90 pop r11
458: af 90 pop r10
45a: 08 95 ret
...
698: 97 fb bst r25, 7
69a: 09 2e mov r0, r25
69c: 05 26 eor r0, r21
69e: 0e d0 rcall .+28 ; 0x6bc
6a0: 57 fd sbrc r21, 7
6a2: 04 d0 rcall .+8 ; 0x6ac
6a4: 28 d0 rcall .+80 ; 0x6f6
6a6: 0a d0 rcall .+20 ; 0x6bc
6a8: 00 1c adc r0, r0
6aa: 38 f4 brcc .+14 ; 0x6ba
6ac: 50 95 com r21
6ae: 40 95 com r20
6b0: 30 95 com r19
6b2: 21 95 neg r18
6b4: 3f 4f sbci r19, 0xFF ; 255
6b6: 4f 4f sbci r20, 0xFF ; 255
6b8: 5f 4f sbci r21, 0xFF ; 255
6ba: 08 95 ret
Original comment by gabebear@gmail.com
on 13 Jul 2009 at 5:26
The begin() function is a bad example of the non-const problem, since the
inlining makes such a HUGE difference between the versions.
All of these use the same basic code:
    uint8_t x=1;
    Serial.write(x);
    ...
    void HardwareSerial::write(uint8_t c) {
        while (!(*_ucsra & (1 << _udre))); // wait until the data register is empty
        *_udr = c;                         // store through the register pointer
    }
The macro subclass version is:
2c6: 80 91 c0 00 lds r24, 0x00C0 ; get UCSRA0
2ca: 85 ff sbrs r24, 5 ; if UDRE0, skip next
2cc: fc cf rjmp .-8 ; goto 0x2c6
2ce: 81 e0 ldi r24, 0x01
2d0: 80 93 c6 00 sts 0x00C6, r24 ; UDR0 = 1
The regular Arduino-16 version:
27e: c8 01 movw r24, r16
280: 61 e0 ldi r22, 0x01 ; 1
282: 0e 94 73 02 call 0x4e6 ; 0x4e6
...
4e6: fc 01 movw r30, r24
4e8: a0 85 ldd r26, Z+8 ; 0x08
4ea: b1 85 ldd r27, Z+9 ; 0x09
4ec: 21 89 ldd r18, Z+17 ; 0x11
4ee: 8c 91 ld r24, X
4f0: 90 e0 ldi r25, 0x00 ; 0
4f2: 02 2e mov r0, r18
4f4: 02 c0 rjmp .+4 ; 0x4fa
4f6: 95 95 asr r25
4f8: 87 95 ror r24
4fa: 0a 94 dec r0
4fc: e2 f7 brpl .-8 ; 0x4f6
4fe: 80 ff sbrs r24, 0
500: f6 cf rjmp .-20 ; 0x4ee
502: 04 84 ldd r0, Z+12 ; 0x0c
504: f5 85 ldd r31, Z+13 ; 0x0d
506: e0 2d mov r30, r0
508: 60 83 st Z, r22
50a: 08 95 ret
If I inline the Arduino-16 version:
278: e0 91 a8 01 lds r30, 0x01A8
27c: f0 91 a9 01 lds r31, 0x01A9
280: 20 91 b1 01 lds r18, 0x01B1
284: 80 81 ld r24, Z
286: 90 e0 ldi r25, 0x00 ; 0
288: 02 2e mov r0, r18
28a: 02 c0 rjmp .+4 ; 0x290 <setup+0x28>
28c: 95 95 asr r25
28e: 87 95 ror r24
290: 0a 94 dec r0
292: e2 f7 brpl .-8 ; 0x28c <setup+0x24>
294: 80 ff sbrs r24, 0
296: f6 cf rjmp .-20 ; 0x284 <setup+0x1c>
298: e0 91 ac 01 lds r30, 0x01AC
29c: f0 91 ad 01 lds r31, 0x01AD
2a0: 81 e0 ldi r24, 0x01 ; 1
2a2: 80 83 st Z, r24
If you look at the store to _udr, you can clearly see the problem. Both of the
Arduino-16 versions load main-memory addresses that contain the address of
UDR0. This involves the 16-bit registers X (r26:r27) and Z (r30:r31), which is
extremely inefficient.
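A host-side sketch of the two shapes being compared, for anyone who wants to see the difference in source form. It is not the code from either version: fake_ucsra and fake_udr are stand-in variables so it runs on a PC, and PointerSerial / ConstSerial are names invented here. The first class keeps the register addresses in member variables, as Arduino-16 does; the second takes them as template parameters, so they are compile-time constants.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for UCSR0A and UDR0 so the sketch runs off-target.
volatile uint8_t fake_ucsra = 1 << 5;  // "data register empty" bit already set
volatile uint8_t fake_udr = 0;

// Arduino-16 style: addresses live in RAM and are loaded on every call.
class PointerSerial {
public:
    PointerSerial(volatile uint8_t* ucsra, volatile uint8_t* udr, uint8_t udre)
        : _ucsra(ucsra), _udr(udr), _udre(udre) {}
    void write(uint8_t c) {
        while (!(*_ucsra & (1 << _udre))) {}  // runtime pointer, runtime shift
        *_udr = c;
    }
private:
    volatile uint8_t* _ucsra;
    volatile uint8_t* _udr;
    uint8_t _udre;
};

// Constant style: addresses and bit number are fixed at compile time.
template <volatile uint8_t* Ucsra, volatile uint8_t* Udr, uint8_t Udre>
struct ConstSerial {
    static void write(uint8_t c) {
        while (!(*Ucsra & (1 << Udre))) {}  // mask is a constant
        *Udr = c;
    }
};

// Both variants behave the same; only the generated code should differ.
inline void demo() {
    PointerSerial serial(&fake_ucsra, &fake_udr, 5);
    serial.write(0x41);
    assert(fake_udr == 0x41);
    ConstSerial<&fake_ucsra, &fake_udr, 5>::write(0x42);
    assert(fake_udr == 0x42);
}
```

With constant addresses the compiler is free to emit direct lds/sts and a single-bit test, which is presumably where the short listing above comes from; with member pointers it has to go through X or Z each time.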
Original comment by gabebear@gmail.com
on 14 Jul 2009 at 2:57
Can you summarize the advantages to these changes? Which functions speed up?
By how much? How much program space is saved?
Original comment by dmel...@gmail.com
on 14 Jul 2009 at 10:01
For the version I posted originally:
begin() = ~30 bytes w/ inlining (~10x smaller, ~20x ? faster)
write() = ~6 bytes (~6x smaller, ~6x faster)
read() = ~42 bytes (~no change)
-----
I'm posting a rough version of HardwareSerial I'm working on. I'm working on
moving the memory allocation and ISR definitions into the main compilation
unit, which could speed up inlined read() functions. This would open up other
options as well, like nearly free frame-format error logging and an efficient
way to allow async writes.
The only big difference to users is you have to put
    SERIAL_ENABLE;
at the top of the sketch file to enable the serial port. There are also
    SERIAL1_ENABLE;
    SERIAL2_ENABLE;
    SERIAL3_ENABLE;
for the Mega. You could enable just Serial2, which wouldn't allocate buffers,
define ISRs, or generate classes for the other serial ports on the Mega. I
don't have a Mega, so I can't verify that it actually works, but it compiles.
Original comment by gabebear@gmail.com
on 14 Jul 2009 at 11:07
Hmm.. is it really worth optimizing this? The begin() function is usually
called only once per sketch. The write() function needs to wait for the
previous byte to send, so I wonder how often the speedup would appear in
practice. I'm not sure it's worth the modifications to save 80 bytes or so.
Also, it's not realistic to require a line like SERIAL_ENABLE; at the top of a
sketch: it's unnecessary and breaks backwards compatibility. Are there other
benefits I should be considering?
Original comment by dmel...@gmail.com
on 14 Jul 2009 at 11:35
The non-constness bloats every Serial function that uses the constructor
parameters. The program below shrinks from 1812 bytes down to 1363 bytes
(449 bytes saved).
    void setup() {
        Serial.begin(9600);
        uint8_t x=1;
        Serial.write(x);
    }
    void loop() {
        uint8_t incomingByte;
        if (Serial.available() > 0) {
            incomingByte = Serial.read();
            Serial.print("I received: ");
            Serial.println(incomingByte, DEC);
        }
    }
If you run the serial port at full speed (2 Mbaud at 16 MHz), it only takes 80
CPU cycles to clear the UDR register. If you space your write() calls out
judiciously and run at full speed, you can get rid of most of the waits.
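The 80-cycle figure can be checked with a line of arithmetic. A small sketch, assuming a 10-bit frame (start bit, 8 data bits, stop bit) and a 16 MHz clock; cyclesPerFrame and the constants are names made up for this example:

```cpp
#include <cstdint>

// CPU cycles that elapse while one serial frame is shifted out.
constexpr uint32_t cyclesPerFrame(uint32_t cpu_hz, uint32_t baud,
                                  uint32_t bits_per_frame) {
    // Widen before multiplying so cpu_hz * bits cannot overflow 32 bits.
    return static_cast<uint32_t>(
        static_cast<uint64_t>(cpu_hz) * bits_per_frame / baud);
}

constexpr uint32_t kCpuHz = 16000000UL;  // assumed 16 MHz clock
constexpr uint32_t kBits8N1 = 10;        // assumed frame: start + 8 data + stop

static_assert(cyclesPerFrame(kCpuHz, 2000000UL, kBits8N1) == 80,
              "2 Mbaud: one frame every 80 cycles");
static_assert(cyclesPerFrame(kCpuHz, 9600UL, kBits8N1) == 16666,
              "9600 baud: the wait dominates everything else");
```

At 9600 baud the busy-wait is thousands of cycles per byte, so the per-call savings only show up as speed at the high baud rates; the code-size savings apply regardless.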
Original comment by gabebear@gmail.com
on 15 Jul 2009 at 2:05
It's actually somewhat worse than this, since the constants that the classes
differ by are (usually?) identical. I experimented some with a version of
HardwareSerial.cpp that uses a single base address and a structure for the uart:
    typedef struct UART_struct
    {
        volatile uint8_t UCSRA; /* Control Register A */
        volatile uint8_t UCSRB; /* Control Register B */
        volatile uint8_t UCSRC; /* Control Register C */
        uint8_t reserved_0x03;  /* unused gap at offset 0x03 */
        volatile uint8_t UBRRL; /* Baud Rate Register, low byte */
        volatile uint8_t UBRRH; /* Baud Rate Register, high byte */
        volatile uint8_t UDR;   /* Data Register */
    } UART_t;
with rather favorable reductions in code and ram size. But I need to figure
out how to make sure the device in question has registers that match the
structure (not true for mega8, for example.)
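One possible way to check the match at compile time is to compare each member's offset against the register addresses for the target part. The sketch below hard-codes the addresses that appear in the disassembly earlier in this thread (0xC0 through 0xC6); UartRegs and kUart0Base are names invented for the illustration, and on a real build the right-hand sides would have to come from the device header rather than from literals.

```cpp
#include <cstddef>
#include <cstdint>

// Overlay for one UART, every register volatile so accesses are not elided.
struct UartRegs {
    volatile uint8_t ucsra;
    volatile uint8_t ucsrb;
    volatile uint8_t ucsrc;
    volatile uint8_t reserved;  // gap between control and baud registers
    volatile uint8_t ubrrl;
    volatile uint8_t ubrrh;
    volatile uint8_t udr;
};

// Assumed base address, taken from the sts targets in the listing above.
constexpr uintptr_t kUart0Base = 0xC0;

// Compilation fails on any part whose register map does not fit the overlay.
static_assert(kUart0Base + offsetof(UartRegs, ucsra) == 0xC0, "UCSRA");
static_assert(kUart0Base + offsetof(UartRegs, ucsrb) == 0xC1, "UCSRB");
static_assert(kUart0Base + offsetof(UartRegs, ucsrc) == 0xC2, "UCSRC");
static_assert(kUart0Base + offsetof(UartRegs, ubrrl) == 0xC4, "UBRRL");
static_assert(kUart0Base + offsetof(UartRegs, ubrrh) == 0xC5, "UBRRH");
static_assert(kUart0Base + offsetof(UartRegs, udr) == 0xC6, "UDR");
```

A part with a different register order would trip one of the assertions, at which point it would need its own overlay or a fallback to the per-register pointers.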
Original comment by wes...@gmail.com
on 9 Nov 2010 at 6:58
Original issue reported on code.google.com by
gabebear@gmail.com
on 13 Jul 2009 at 2:41