alericardi / arduino

Automatically exported from code.google.com/p/arduino
0 stars 0 forks source link

Each Serial Port really needs its own class #54

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
The compiler isn't able to resolve the class-initializer parameters in
HardwareSerial back to constants. I think the only way to fix this is to
give each serial port its own class that uses the constants for that port...

To this end, I took the HardwareSerial class, subclassed it, then
transformed that subclass into a Macro that manufactures subclasses. This
is a bit insane, but I think it's more maintainable then 5 subclasses that
only differ by constants.

I'm attaching my HardwareSerial class with macro-subclass craziness and
will be posting to the Dev Discussion to see if someone has a better solution.

Look at the instructions being generated for the current HardwareSerial
class... 
        sbi(*_ucsrb, _rxen);
 33a:   ea 85           ldd     r30, Y+10       ; 0x0a
 33c:   fb 85           ldd     r31, Y+11       ; 0x0b
 33e:   20 81           ld      r18, Z
 340:   41 e0           ldi     r20, 0x01       ; 1
 342:   50 e0           ldi     r21, 0x00       ; 0
 344:   ca 01           movw    r24, r20
 346:   0e 84           ldd     r0, Y+14        ; 0x0e
 348:   02 c0           rjmp    .+4             ; 0x34e
<_ZN14HardwareSerial5beginEl+0x18c>
 34a:   88 0f           add     r24, r24
 34c:   99 1f           adc     r25, r25
 34e:   0a 94           dec     r0
 350:   e2 f7           brpl    .-8             ; 0x34a
<_ZN14HardwareSerial5beginEl+0x188>
 352:   28 2b           or      r18, r24
 354:   20 83           st      Z, r18

Original issue reported on code.google.com by gabebear@gmail.com on 13 Jul 2009 at 2:41

Attachments:

GoogleCodeExporter commented 9 years ago
Just for comparison, the marco-subclass-crazy HardwareSerial compiles the entire
"Serial.begin(9600);" command to:

 2a4:   10 92 5c 01     sts     0x015C, r1  ; reset buffer head
 2a8:   10 92 5d 01     sts     0x015D, r1  ; reset buffer tail
 2ac:   10 92 c0 00     sts     0x00C0, r1  ; disable u2x mode
 2b0:   10 92 c5 00     sts     0x00C5, r1  ; set high baud bits
 2b4:   87 e6           ldi     r24, 0x67
 2b6:   80 93 c4 00     sts     0x00C4, r24 ; set low baud bits
 2ba:   88 e9           ldi     r24, 0x98
 2bc:   80 93 c1 00     sts     0x00C1, r24 ; set pins and ISR
 2c0:   86 e0           ldi     r24, 0x06
 2c2:   80 93 c2 00     sts     0x00C2, r24 ; set frame format

The 2ac, 2c0, and 2c2 lines are only there because this version handles serial 
frame
formats and u2x mode. The buffer head/tail reset lines are also extra, without 
an
end() function these could be pre-instantiated to zero. The Arduino-16 
HardwareSerial
doesn't handle frame formats, u2x mode, or end(). So, the functionality of the
Arduino-16 "Serial.begin(9600);" should be only 5 instructions long... instead 
of the
138 it actually is (before you start looking at loops).

The Arduino-16 version of "Serial.begin(9600);" compiles to:

 26c:   40 e8           ldi     r20, 0x80       ; 128
 26e:   55 e2           ldi     r21, 0x25       ; 37
 270:   60 e0           ldi     r22, 0x00       ; 0
 272:   70 e0           ldi     r23, 0x00       ; 0
 274:   0e 94 b8 01     call    0x370   ; 0x370 

....

 370:   af 92           push    r10
 372:   bf 92           push    r11
 374:   cf 92           push    r12
 376:   df 92           push    r13
 378:   ef 92           push    r14
 37a:   ff 92           push    r15
 37c:   0f 93           push    r16
 37e:   1f 93           push    r17
 380:   cf 93           push    r28
 382:   df 93           push    r29
 384:   6c 01           movw    r12, r24
 386:   7a 01           movw    r14, r20
 388:   8b 01           movw    r16, r22
 38a:   dc 01           movw    r26, r24
 38c:   14 96           adiw    r26, 0x04       ; 4
 38e:   ad 90           ld      r10, X+
 390:   bc 90           ld      r11, X
 392:   15 97           sbiw    r26, 0x05       ; 5
 394:   cb 01           movw    r24, r22
 396:   ba 01           movw    r22, r20
 398:   22 e0           ldi     r18, 0x02       ; 2
 39a:   30 e0           ldi     r19, 0x00       ; 0
 39c:   40 e0           ldi     r20, 0x00       ; 0
 39e:   50 e0           ldi     r21, 0x00       ; 0
 3a0:   0e 94 4c 03     call    0x698   ; 0x698 <__divmodsi4>
 3a4:   20 5c           subi    r18, 0xC0       ; 192
 3a6:   3d 4b           sbci    r19, 0xBD       ; 189
 3a8:   40 4f           sbci    r20, 0xF0       ; 240
 3aa:   5f 4f           sbci    r21, 0xFF       ; 255
 3ac:   ca 01           movw    r24, r20
 3ae:   b9 01           movw    r22, r18
 3b0:   a8 01           movw    r20, r16
 3b2:   97 01           movw    r18, r14
 3b4:   0e 94 4c 03     call    0x698   ; 0x698 <__divmodsi4>
 3b8:   c9 01           movw    r24, r18
 3ba:   da 01           movw    r26, r20
 3bc:   01 97           sbiw    r24, 0x01       ; 1
 3be:   a1 09           sbc     r26, r1
 3c0:   b1 09           sbc     r27, r1
 3c2:   29 2f           mov     r18, r25
 3c4:   3a 2f           mov     r19, r26
 3c6:   4b 2f           mov     r20, r27
 3c8:   55 27           eor     r21, r21
 3ca:   47 fd           sbrc    r20, 7
 3cc:   5a 95           dec     r21
 3ce:   01 96           adiw    r24, 0x01       ; 1
 3d0:   a1 1d           adc     r26, r1
 3d2:   b1 1d           adc     r27, r1
 3d4:   e5 01           movw    r28, r10
 3d6:   28 83           st      Y, r18
 3d8:   e6 01           movw    r28, r12
 3da:   ee 81           ldd     r30, Y+6        ; 0x06
 3dc:   ff 81           ldd     r31, Y+7        ; 0x07
 3de:   81 50           subi    r24, 0x01       ; 1
 3e0:   80 83           st      Z, r24
 3e2:   e8 85           ldd     r30, Y+8        ; 0x08
 3e4:   f9 85           ldd     r31, Y+9        ; 0x09
 3e6:   20 81           ld      r18, Z
 3e8:   41 e0           ldi     r20, 0x01       ; 1
 3ea:   50 e0           ldi     r21, 0x00       ; 0
 3ec:   ca 01           movw    r24, r20
 3ee:   0a 88           ldd     r0, Y+18        ; 0x12
 3f0:   02 c0           rjmp    .+4             ; 0x3f6
 3f2:   88 0f           add     r24, r24
 3f4:   99 1f           adc     r25, r25
 3f6:   0a 94           dec     r0
 3f8:   e2 f7           brpl    .-8             ; 0x3f2
 3fa:   80 95           com     r24
 3fc:   82 23           and     r24, r18
 3fe:   80 83           st      Z, r24
 400:   ea 85           ldd     r30, Y+10       ; 0x0a
 402:   fb 85           ldd     r31, Y+11       ; 0x0b
 404:   20 81           ld      r18, Z
 406:   ca 01           movw    r24, r20
 408:   0e 84           ldd     r0, Y+14        ; 0x0e
 40a:   02 c0           rjmp    .+4             ; 0x410
 40c:   88 0f           add     r24, r24
 40e:   99 1f           adc     r25, r25
 410:   0a 94           dec     r0
 412:   e2 f7           brpl    .-8             ; 0x40c
 414:   28 2b           or      r18, r24
 416:   20 83           st      Z, r18
 418:   ea 85           ldd     r30, Y+10       ; 0x0a
 41a:   fb 85           ldd     r31, Y+11       ; 0x0b
 41c:   20 81           ld      r18, Z
 41e:   ca 01           movw    r24, r20
 420:   0f 84           ldd     r0, Y+15        ; 0x0f
 422:   02 c0           rjmp    .+4             ; 0x428
 424:   88 0f           add     r24, r24
 426:   99 1f           adc     r25, r25
 428:   0a 94           dec     r0
 42a:   e2 f7           brpl    .-8             ; 0x424
 42c:   28 2b           or      r18, r24
 42e:   20 83           st      Z, r18
 430:   ea 85           ldd     r30, Y+10       ; 0x0a
 432:   fb 85           ldd     r31, Y+11       ; 0x0b
 434:   80 81           ld      r24, Z
 436:   08 88           ldd     r0, Y+16        ; 0x10
 438:   02 c0           rjmp    .+4             ; 0x43e
 43a:   44 0f           add     r20, r20
 43c:   55 1f           adc     r21, r21
 43e:   0a 94           dec     r0
 440:   e2 f7           brpl    .-8             ; 0x43a
 442:   84 2b           or      r24, r20
 444:   80 83           st      Z, r24
 446:   df 91           pop     r29
 448:   cf 91           pop     r28
 44a:   1f 91           pop     r17
 44c:   0f 91           pop     r16
 44e:   ff 90           pop     r15
 450:   ef 90           pop     r14
 452:   df 90           pop     r13
 454:   cf 90           pop     r12
 456:   bf 90           pop     r11
 458:   af 90           pop     r10
 45a:   08 95           ret

...

 698:   97 fb           bst     r25, 7
 69a:   09 2e           mov     r0, r25
 69c:   05 26           eor     r0, r21
 69e:   0e d0           rcall   .+28            ; 0x6bc
 6a0:   57 fd           sbrc    r21, 7
 6a2:   04 d0           rcall   .+8             ; 0x6ac
 6a4:   28 d0           rcall   .+80            ; 0x6f6
 6a6:   0a d0           rcall   .+20            ; 0x6bc
 6a8:   00 1c           adc     r0, r0
 6aa:   38 f4           brcc    .+14            ; 0x6ba
 6ac:   50 95           com     r21
 6ae:   40 95           com     r20
 6b0:   30 95           com     r19
 6b2:   21 95           neg     r18
 6b4:   3f 4f           sbci    r19, 0xFF       ; 255
 6b6:   4f 4f           sbci    r20, 0xFF       ; 255
 6b8:   5f 4f           sbci    r21, 0xFF       ; 255
 6ba:   08 95           ret

Original comment by gabebear@gmail.com on 13 Jul 2009 at 5:26

GoogleCodeExporter commented 9 years ago
The begin() function is a bad example of the non-const problem since the 
inlining
makes such a HUGE difference between the versions.

All of these use the same basic code:
     uint8_t x=1;
     Serial.write(x);
...
     void HardwareSerial::write(uint8_t c) {
         while (!(*_ucsra & (1 << _udre)));
         _udr = c;
     }

The macro subclass version is:
     2c6:   80 91 c0 00     lds     r24, 0x00C0 ; get UCSRA0
     2ca:   85 ff           sbrs    r24, 5      ; if UDRE0, skip next
     2cc:   fc cf           rjmp    .-8         ; goto 0x2c6 
     2ce:   81 e0           ldi     r24, 0x01
     2d0:   80 93 c6 00     sts     0x00C6, r24 ; UDR0 = 1

The regular Arduino-16 version:
     27e:   c8 01           movw    r24, r16
     280:   61 e0           ldi     r22, 0x01       ; 1
     282:   0e 94 73 02     call    0x4e6   ; 0x4e6 
...
     4e6:   fc 01           movw    r30, r24
     4e8:   a0 85           ldd     r26, Z+8        ; 0x08
     4ea:   b1 85           ldd     r27, Z+9        ; 0x09
     4ec:   21 89           ldd     r18, Z+17       ; 0x11
     4ee:   8c 91           ld      r24, X
     4f0:   90 e0           ldi     r25, 0x00       ; 0
     4f2:   02 2e           mov     r0, r18
     4f4:   02 c0           rjmp    .+4             ; 0x4fa
     4f6:   95 95           asr     r25
     4f8:   87 95           ror     r24
     4fa:   0a 94           dec     r0
     4fc:   e2 f7           brpl    .-8             ; 0x4f6
     4fe:   80 ff           sbrs    r24, 0
     500:   f6 cf           rjmp    .-20            ; 0x4ee
     502:   04 84           ldd     r0, Z+12        ; 0x0c
     504:   f5 85           ldd     r31, Z+13       ; 0x0d
     506:   e0 2d           mov     r30, r0
     508:   60 83           st      Z, r22
     50a:   08 95           ret

If I inline the Arduino-16 version:
     278:   e0 91 a8 01     lds     r30, 0x01A8
     27c:   f0 91 a9 01     lds     r31, 0x01A9
     280:   20 91 b1 01     lds     r18, 0x01B1
     284:   80 81           ld      r24, Z
     286:   90 e0           ldi     r25, 0x00       ; 0
     288:   02 2e           mov     r0, r18
     28a:   02 c0           rjmp    .+4             ; 0x290 <setup+0x28>
     28c:   95 95           asr     r25
     28e:   87 95           ror     r24
     290:   0a 94           dec     r0
     292:   e2 f7           brpl    .-8             ; 0x28c <setup+0x24>
     294:   80 ff           sbrs    r24, 0
     296:   f6 cf           rjmp    .-20            ; 0x284 <setup+0x1c>
     298:   e0 91 ac 01     lds     r30, 0x01AC
     29c:   f0 91 ad 01     lds     r31, 0x01AD
     2a0:   81 e0           ldi     r24, 0x01       ; 1
     2a2:   80 83           st      Z, r24

If you look at the "udr = c" part you clearly see the problem. Both the 
Arduino-16
versions load main-memory addresses that contain the address of UDR0. This 
involves
the 16bit registers X(r26:27) and Z(r30:r31), which is extremely inefficient.

Original comment by gabebear@gmail.com on 14 Jul 2009 at 2:57

GoogleCodeExporter commented 9 years ago
Can you summarize the advantages to these changes?  Which functions speed up?  
By how much?  How much 
program space is saved?

Original comment by dmel...@gmail.com on 14 Jul 2009 at 10:01

GoogleCodeExporter commented 9 years ago
For the version I posted originally:

begin() = ~30 bytes w/ inlining (~10x smaller, ~20x ? faster)
write() = ~6 bytes (~6x smaller, ~6x faster)
read() = ~42 bytes (~no change)

-----

I'm posting a rough version of HardwareSerial I'm working on. I'm working on 
moving
the memory allocation and ISR definitions into the main compilation unit, which 
could
speed up inlined read() functions. This would open up other options as well, 
like
nearly free frame-format error logging and an efficient way to allow async 
writes.

The only big difference to users is you have to put
    SERIAL_ENABLE;
at the top of the sketch file to enable the serial port. There are also
    SERIAL1_ENABLE;
    SERIAL2_ENABLE;
    SERIAL3_ENABLE;
for the Mega. You could just enable Serial2, which wouldn't allocate buffers, 
define
ISRs, or generate classes for the other serial ports on the Mega. I don't have a
Mega, so I can't verify that it actually works, but it compiles.

Original comment by gabebear@gmail.com on 14 Jul 2009 at 11:07

Attachments:

GoogleCodeExporter commented 9 years ago
Hmm..  is it really worth optimizing this?  The begin() function is usually 
called only once a sketch.  The write() 
function needs to wait for the previous byte to send, so I wonder how often the 
speedup would appear in 
practice.  I'm not sure it's worth the modifications to save 80 bytes or so.  
Also, it's not realistic to require a line 
like SERIAL_ENABLE; at the top of a sketch: it's unnecessary and breaks 
backwards compatibility.  Are there other 
benefits I should be considering?

Original comment by dmel...@gmail.com on 14 Jul 2009 at 11:35

GoogleCodeExporter commented 9 years ago
The non-constness bloats every Serial function that uses the constructor 
parameters.
The program below shrinks from 1812 bytes down to 1363 bytes(449 bytes saved).

void setup() {
        Serial.begin(9600);
        uint8_t x=1;
        Serial.write(x);
}

void loop() {
        uint8_t incomingByte;
        if (Serial.available() > 0) {
                incomingByte = Serial.read();
                Serial.print("I received: ");
                Serial.println(incomingByte, DEC);
        }
}

   If you run the serial port at full speed(2Mbaud for 16Mhz), it only takes 80 CPU
cycles to clear the UDR register. If you space your write() calls out 
judiciously and
run at full speed you can get rid of most of the waits.

Original comment by gabebear@gmail.com on 15 Jul 2009 at 2:05

GoogleCodeExporter commented 9 years ago
It's actually somewhat worse than this, since the constants that the classes 
differ by are (usually?) identical.  I experimented some with a version of 
HardwareSerial.cpp that uses a single base address and a stucture for the uart:

typedef struct UART_struct
{
    volatile uint8_t UCSRA;  /* Control Register A */
    uint8_t UCSRB;  /* Control Register B */
    uint8_t UCSRC;  /* Control Register C */
    uint8_t reserved_0x02;
    uint8_t UBRRL;  /* Baud Rate Control Register A */
    uint8_t UBRRH;  /* Baud Rate Control Register B */
    volatile uint8_t UDR;  /* Data Register */
} UART_t;

with rather favorable reductions in code and ram size.  But I need to figure 
out how to make sure the device in question has registers that match the 
structure (not true for mega8, for example.)

Original comment by wes...@gmail.com on 9 Nov 2010 at 6:58