Open mdempsky opened 8 years ago
This could help with shrinking init-time map literal construction in e.g. the unicode
package.
If the unicode.init
funcs generating the unicode
's various maps were flagged as side-effect-free, then the whole init funcs can be deadcode eliminated if nobody referred to those maps.
/cc @crawshaw @josharian
init methods are a problem because they may reference (& initialize) otherwise dead globals. Anything we can do to initialize globals without inits will help (see https://go-review.googlesource.com/c/17398/ for an example). It doesn't look like this is an easy transformation here, because newDead() could do anything. If instead it was:
var x = new(dead)
maybe we could rewrite that in the compiler to
var _x dead
var x = &_x
the latter wouldn't need any init code, just relocations, so if x was otherwise unused both x and _x would be stripped out at link time.
On Wed, Mar 16, 2016 at 1:49 PM, Brad Fitzpatrick notifications@github.com wrote:
This could help with shrinking init-time map literal construction in e.g. the unicode package.
If the unicode.init funcs generating the unicode's various maps were flagged as side-effect-free, then the whole init funcs can be deadcode eliminated if nobody referred to those maps.
/cc @crawshaw https://github.com/crawshaw @josharian https://github.com/josharian
— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/golang/go/issues/14840#issuecomment-197542441
@randall77 I think during escape analysis we could conservatively detect whether a function has side effects?
I don't think the current escape analysis can help, as it runs after the init functions are built. You'd want to decide the side-effect-freeness of the RHS of global initializers before the init function is built. Probably a very simple analysis would capture most of the cases we care about.
And then you'd have to have an init function per global, have it associated with the global, and only link it in if the global was otherwise reachable.
The compiler already has samesafeexpr. We could remove the "same" part and steal that.
I wonder whether have an init function per global (with all the associated overhead) would end up outweighing the other good done.
Overhead from too many init functions is a fair concern. I'd imagine a lot of the functions could be compiled as nosplit, which might reduce the overhead.
Also, traditionally ELF compilers generate .init
sections with straight line instructions that are just pasted together, skipping function call overhead for simple functions. (The .init
code still ends up forming a proper function because the linker will also include C runtime files like crti.o
and crtn.o
, which supply the appropriate function prologue/epilogue in their .init
sections.)
Anyone writing code for this? If not, I'll spend a couple of days on it.
Assuming each global with side-effect free initialization is split off, I suspect the linker will need to merge the safe init functions together after reachability analysis. But that should be safe to do as long as we distinguish the safe generated init code from general init functions.
All yours.
@crawshaw Go for it.
I made a bit of progress on this, but after looking closer I don't know how it will help binary size.
The obvious target is the range tables in the unicode package. It would be nice to make it possible for the linker's deadcode pass to catch exported symbols like Nl, Nd, etc. But: all of these tables are included in the unicode.Categories map, which is used by the regexp package.
All the real world programs I looked at use package regexp. Even tiny ones like objdump. So while this will make the "hello world" program from #6853 look good, I don't see how it will help in practice.
If anyone can convince me otherwise I'll dust this off, but for now this looks like an academic exercise to me.
I'm going to look into extending escape analysis to keep track of side-effect free functions for Go 1.17.
Not sure if this is happening, but moving milestone to backlog as it is a long-standing issue.
As a datapoint, I just tried changing the representation of unicode.RangeTable
from a struct to a string
containing the binary packed contents of the RangeTables.
Then instead of vars, I made them all consts. And instead of the five map[string]*RangeTable
global variables, I made them funcs that returned a sync.Once
-lazily-init map. (which the linker knows how to eliminate if unused)
I of course also updated all the funcs taking a *RangeTable
pointer to instead just take a RangeTable
string.
After minor corresponding updates to the regexp
and encoding/xml
packages (which both used *RangeTable
), the binary size of a fmt
-using Hello World program dropped 100 KB on Linux.
bradfitz@tsdev:~$ ls -l hello.before hello.after
-rwxr-xr-x 1 bradfitz bradfitz 1843533 Aug 4 22:20 hello.after
-rwxr-xr-x 1 bradfitz bradfitz 1942437 Aug 4 22:20 hello.before
bradfitz@tsdev:~$ size hello.before hello.after
text data bss dec hex filename
1261840 88756 207456 1558052 17c624 hello.before
1234832 25752 207424 1468008 166668 hello.after
bradfitz@tsdev:~$ go tool nm hello.before | grep -F unicode. | wc -l
242
bradfitz@tsdev:~$ go tool nm hello.before | grep -F unicode. | head -20
536f40 D unicode..inittask
544318 D unicode.ASCII_Hex_Digit
544320 D unicode.Adlam
544328 D unicode.Ahom
544330 D unicode.Anatolian_Hieroglyphs
544338 D unicode.Arabic
544340 D unicode.Armenian
544348 D unicode.Avestan
544350 D unicode.Balinese
544358 D unicode.Bamum
544360 D unicode.Bassa_Vah
544368 D unicode.Batak
544370 D unicode.Bengali
544378 D unicode.Bhaiksuki
544380 D unicode.Bidi_Control
544388 D unicode.Bopomofo
544390 D unicode.Brahmi
544398 D unicode.Braille
5443a0 D unicode.Buginese
5443a8 D unicode.Buhid
bradfitz@tsdev:~$ go tool nm hello.after | grep -F unicode. | wc -l
1
bradfitz@tsdev:~$ go tool nm hello.after | grep -F unicode.
530280 D unicode..inittask
Which is all to say: there are wins to be had here if anybody is curious.
Change https://go.dev/cl/461315 mentions this issue: cmd/compile,cmd/link: enable deadcode of unreferenced large global maps
Change https://go.dev/cl/463395 mentions this issue: cmd/link: linker portion of dead map removal (patch 2 of 2)
Change https://go.dev/cl/463855 mentions this issue: cmd/internal/obj: flag init functions in object file
For posterity, linker dead code elimination of unused maps is now implemented in this stack.
Getting back to the original example:
package main
type dead int
func newDead() *dead { return new(dead) }
var x = newDead()
func main() {}
In principle the same strategy used for maps could be used here for variables like "x", e.g. outline the code that performs the init (e.g. call to newDead), then have a weak relocation from main.init to the outlined init func, and a strong relocation from "x" to the outlined init func.
The existing code that checks for side effects is here; in theory if were were to extend this code to work interprocedurally we might be able to catch this case.
Currently if we compile a Go program like:
the linker can't dead code eliminate
x
ordead
. This is because the "x = newDead()" initialization is compiled to an implicit init function, which causes the linker to pull inx
andnewDead
.(See https://golang.org/cl/20765 for a real world example.)
In general, just because
x
is otherwise unused, the linker can't get rid of thenewDead()
call because it might have side-effects. However, it should be possible for the compiler to help identify functions that are side-effect free, which could in turn let the linker be more aggressive about dead code elimination.We would probably also need to tweak how cmd/compile generates package initializer functions for the linker to be able to eliminate individual initializers.