golang / go


unicode: decrease binary size #7600

Open josharian opened 10 years ago

josharian commented 10 years ago
Currently, the generated unicode tables.go sets up a separate slice for each R16/R32 in
each RangeTable, each with its own backing array.

Rearranging the code generated by maketables.go (in a way that is invisible to the
exported API) so that the RangeTable slices point into a big, shared R16/R32 array
reduces the contribution of the unicode tables to binary size by ~35k. If issue #7599
were fixed as well, the space savings would be ~45k to ~60k. Details on the savings
below.

Questions:

(1) Are these space savings significant enough to warrant possible inclusion in Go 1.3,
or should I wait to polish + mail the CL until Go 1.4?
(2) Is there a reason not to do this rearrangement?
(3) Is there a fix to the toolchain that achieves these reductions in a better / cleaner
/ deeper way? (For example, instead of creating a separate backing array symbol and
slice header symbol for statically initialized slices, one could just create a single
slice symbol containing the slice header followed by the array. That would provide some
space savings.)

Details on the size changes:

$ cat radical.go
package main

import "unicode"

func main() {
    _ = unicode.Radical
}

Build with 6g.

Binary size before: 733664 bytes. Binary size after: 699296 bytes.

Largest symbols before:

$ go tool nm -size -sort size radical | head -n 50
   4e0c0     101365 R _esymtab
   4e0c0     101365 R _pclntab
   4e0c0     101365 R _etypelink
   4e0c0     101365 R _symtab
   87200      56984 B runtime.mheap
   3d340      49024 R _gcbss
   319f8      47372 R go.string.*
   265a0      46168 R _rodata
   265a0      46168 R type.*
   81fc0      21056 B _bufferList
   492c0      18192 R _gcdata
   492c0      18192 R _egcbss
   7e100      16064 B _semtable
   22920      15088 T unicode.init

Largest symbols after:

$ go tool nm -size -sort size radical | head -n 50
   4efa0     102141 R _pclntab
   878c0      56984 B runtime.mheap
   3e420      52360 R _gcbss
   33c18      42956 R go.string.*
   2a5c0      38488 R type.*
   2a5c0      38488 R _rodata
   22920      31504 T unicode.init
   82680      21056 B _bufferList
   6a740      20904 D unicode.allRange16
   7e7c0      16064 B _semtable
   4b0c0      14856 R _gcdata
   68020      10016 D unicode.allRange32
   7c7a0       8192 B _pdesc
   7a800       8096 B _hash

The main size savings here come from a reduction in the number of small symbols
generated to hold statically initialized autotmp values, each with its own overhead
(name, padding, etc.).

The increase in the size of unicode.init is addressable via issue #7599.

ianlancetaylor commented 10 years ago

Comment 1:

Labels changed: added repo-main, release-go1.3maybe.

josharian commented 10 years ago

Comment 2:

Owner changed to @josharian.

Status changed to Started.

rsc commented 10 years ago

Comment 3:

If the problem is padding in the linker we should fix the linker. The rewrite forces the
linking of all unicode table data even if you import unicode and only refer to
unicode.Greek.
Right now importing unicode and not referring to anything still pulls everything in,
because the map init-time code keeps the dead symbol removal from working. But let's not
add a second reason.
Leaving this issue open to be about making unicode take less memory, but I think we'll
need a different approach.
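
The dead-symbol point above can be pictured with a small stand-in for the unicode package. Everything here (`rangeTable`, `greek`, `glagolitic`, `scripts`, and the abbreviated range data) is hypothetical and only mirrors the general shape of a package-level name-to-table map. Whether a given linker can discard the unused table depends on the toolchain version, and the behavior described in the comments is the one reported in this thread.

```go
package main

import "fmt"

// rangeTable is a simplified stand-in for unicode.RangeTable.
type rangeTable struct {
	r16 [][2]uint16 // {lo, hi} pairs, illustrative data only
}

var (
	greek      = &rangeTable{r16: [][2]uint16{{0x0370, 0x0373}}}
	glagolitic = &rangeTable{r16: [][2]uint16{{0x2c00, 0x2c2e}}}
)

// scripts is filled in by generated init-time code. That code refers to
// every table, so, as described in this thread, the linker sees all of
// them as reachable even when the program never uses the map.
var scripts = map[string]*rangeTable{
	"Glagolitic": glagolitic,
	"Greek":      greek,
}

func main() {
	// Only greek is used directly, yet glagolitic stays referenced by the
	// initializer of scripts.
	fmt.Println(len(greek.r16))
}
```

Placing every table in one shared array would tie the tables together in a second way: a reference to any one table would keep the whole array alive.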

Labels changed: added release-go1.4, removed release-go1.3maybe.

josharian commented 10 years ago

Comment 4:

Agreed that we should fix it more deeply.
It is not just padding. It's also the autotmp symbol name showing up in multiple places,
the autogenerated init code, etc. Some of these will be fixable head on; reducing the
number of symbols will also help. See the discussion at the end of
https://golang.org/cl/78870047/ for related issues.

Owner changed to ---.

Status changed to Accepted.

bradfitz commented 10 years ago

Comment 5:

I never saw discussion of accepting 78870047 for Go 1.3 and reverting it in Go 1.4 when
the issue is fixed properly.
If it gets us smaller binaries (a goal for Go 1.3) and doesn't hurt our already-broken
support for dropping Glagolitic when we only want Greek, it seems worth considering?

josharian commented 10 years ago

Comment 6:

Labels changed: added release-none, removed release-go1.4.