earlephilhower / newlib-xtensa

newlib-xtensa fork intended for esp8266
GNU General Public License v2.0

sys/pgmspace.h: Strengthen inline-ness of `pgm_read_byte_inlined()` and its siblings #22

Closed jjsuwa-sys3175 closed 2 years ago

jjsuwa-sys3175 commented 2 years ago

Currently, calls to pgm_read_byte() are not always inlined. For example, libc.a contains 5 out-of-line function bodies of pgm_read_byte_inlined() and 89 "CALL0" instructions that call it, some of them inside loops.

In general, function calls hurt performance: they add call/return instruction overhead, force extra register-to-register moves to comply with the calling convention, and leave many callee-clobbered registers that hinder efficient register allocation. For example, memchr_P currently makes a call on every loop iteration:

00000468 <memchr_P>:
 468:   f0c112                  addi    a1, a1, -16
 46b:   21c9                    s32i.n  a12, a1, 8
 46d:   11d9                    s32i.n  a13, a1, 4
 46f:   01e9                    s32i.n  a14, a1, 0
 471:   3109                    s32i.n  a0, a1, 12
 473:   02cd                    mov.n   a12, a2
 475:   74e030                  extui   a14, a3, 0, 8
 478:   d24a                    add.n   a13, a2, a4
 47a:   000246                  j   487 <memchr_P+0x1f>
 47d:   0c2d                    mov.n   a2, a12
 47f:   ffb805                  call0   0 <pgm_read_byte_inlined>
 482:   0612e7                  beq a2, a14, 48c <memchr_P+0x24>
 485:   cc1b                    addi.n  a12, a12, 1
 487:   f29dc7                  bne a13, a12, 47d <memchr_P+0x15>
 48a:   0c0c                    movi.n  a12, 0
 48c:   3108                    l32i.n  a0, a1, 12
 48e:   0c2d                    mov.n   a2, a12
 490:   11d8                    l32i.n  a13, a1, 4
 492:   21c8                    l32i.n  a12, a1, 8
 494:   01e8                    l32i.n  a14, a1, 0
 496:   10c112                  addi    a1, a1, 16
 499:   f00d                    ret.n

In contrast, complete inlining opens up further optimization opportunities, e.g. common subexpression elimination/sharing and loop-invariant expression hoisting.

000003c4 <memchr_P>:
 3c4:   743030                  extui   a3, a3, 0, 8
 3c7:   424a                    add.n   a4, a2, a4
 3c9:   c67c                    movi.n  a6, -4
 3cb:   000506                  j   3e3 <memchr_P+0x1f>
 3ce:   00                          .byte 00
 3cf:   00                          .byte 00
 3d0:   105260                  and a5, a2, a6
 3d3:   0558                    l32i.n  a5, a5, 0
 3d5:   402200                  ssa8l   a2
 3d8:   915050                  srl a5, a5
 3db:   745050                  extui   a5, a5, 0, 8
 3de:   061357                  beq a3, a5, 3e8 <memchr_P+0x24>
 3e1:   221b                    addi.n  a2, a2, 1
 3e3:   e99427                  bne a4, a2, 3d0 <memchr_P+0xc>
 3e6:   020c                    movi.n  a2, 0
 3e8:   f00d                    ret.n
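
As a rough illustration of what the inlined loop above does, here is a minimal, portable C sketch. It is not the code from sys/pgmspace.h: the names `sketch_pgm_read_byte` and `sketch_memchr_P` are hypothetical, the use of GCC's `always_inline` attribute is an assumption about one common way to force inlining, and a little-endian target is assumed. The helper mirrors the `and` / `l32i.n` / `ssa8l` / `srl` / `extui` sequence: mask the address down to a 32-bit boundary, load the aligned word, then shift the wanted byte into place.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for pgm_read_byte_inlined().
 * always_inline (a GCC/Clang attribute) asks the compiler to inline the
 * body into every caller instead of emitting a call; this is an assumed
 * mechanism, not necessarily what the actual header does. */
static inline __attribute__((always_inline))
uint8_t sketch_pgm_read_byte(const void *addr)
{
    uintptr_t a = (uintptr_t)addr;
    uint32_t word;

    /* Aligned 32-bit load: clear the low two address bits
     * (the disassembly does this with "and a5, a2, a6", a6 = -4). */
    memcpy(&word, (const void *)(a & ~(uintptr_t)3), sizeof word);

    /* Shift the selected byte down to bits 0..7 (little-endian assumed)
     * and truncate to 8 bits, like "srl" followed by "extui". */
    return (uint8_t)(word >> ((a & 3u) * 8u));
}

/* Hypothetical memchr_P-like loop built on the helper. Once the helper is
 * inlined, the compiler can hoist the constant mask out of the loop and
 * no longer needs to save registers around a call. */
static const void *sketch_memchr_P(const void *s, int c, size_t n)
{
    const uint8_t *p = (const uint8_t *)s;
    const uint8_t *end = p + n;
    const uint8_t want = (uint8_t)c;

    for (; p != end; ++p) {
        if (sketch_pgm_read_byte(p) == want)
            return p;
    }
    return NULL;
}
```

Note that the helper always reads the whole aligned word containing the byte, so on a host machine the buffer should be 4-byte aligned and padded to a multiple of 4 bytes.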

Size of .text in bytes, before and after, in libc.a: