golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.21k stars 17.57k forks source link

cmd/link: dead code elimination for side-effect free functions #14840

Open mdempsky opened 8 years ago

mdempsky commented 8 years ago

Currently if we compile a Go program like:

package main
type dead int
func newDead() *dead { return new(dead) }
var x = newDead()
func main() {}

the linker can't dead code eliminate x or dead. This is because the "x = newDead()" initialization is compiled to an implicit init function, which causes the linker to pull in x and newDead.

(See https://golang.org/cl/20765 for a real world example.)

In general, just because x is otherwise unused, the linker can't get rid of the newDead() call because it might have side-effects. However, it should be possible for the compiler to help identify functions that are side-effect free, which could in turn let the linker be more aggressive about dead code elimination.

We would probably also need to tweak how cmd/compile generates package initializer functions for the linker to be able to eliminate individual initializers.

bradfitz commented 8 years ago

This could help with shrinking init-time map literal construction in e.g. the unicode package.

If the unicode.init funcs generating the unicode's various maps were flagged as side-effect-free, then the whole init funcs can be deadcode eliminated if nobody referred to those maps.

/cc @crawshaw @josharian

randall77 commented 8 years ago

init methods are a problem because they may reference (& initialize) otherwise dead globals. Anything we can do to initialize globals without inits will help (see https://go-review.googlesource.com/c/17398/ for an example). It doesn't look like this is an easy transformation here, because newDead() could do anything. If instead it was:

var x = new(dead)

maybe we could rewrite that in the compiler to

var _x dead
var x = &_x

the latter wouldn't need any init code, just relocations, so if x was otherwise unused both x and _x would be stripped out at link time.

On Wed, Mar 16, 2016 at 1:49 PM, Brad Fitzpatrick notifications@github.com wrote:

This could help with shrinking init-time map literal construction in e.g. the unicode package.

If the unicode.init funcs generating the unicode's various maps were flagged as side-effect-free, then the whole init funcs can be deadcode eliminated if nobody referred to those maps.

/cc @crawshaw https://github.com/crawshaw @josharian https://github.com/josharian

— You are receiving this because you are subscribed to this thread. Reply to this email directly or view it on GitHub https://github.com/golang/go/issues/14840#issuecomment-197542441

mdempsky commented 8 years ago

@randall77 I think during escape analysis we could conservatively detect whether a function has side effects?

randall77 commented 8 years ago

I don't think the current escape analysis can help, as it runs after the init functions are built. You'd want to decide the side-effect-freeness of the RHS of global initializers before the init function is built. Probably a very simple analysis would capture most of the cases we care about.

And then you'd have to have an init function per global, have it associated with the global, and only link it in if the global was otherwise reachable.

josharian commented 8 years ago

The compiler already has samesafeexpr. We could remove the "same" part and steal that.

I wonder whether have an init function per global (with all the associated overhead) would end up outweighing the other good done.

mdempsky commented 8 years ago

Overhead from too many init functions is a fair concern. I'd imagine a lot of the functions could be compiled as nosplit, which might reduce the overhead.

Also, traditionally ELF compilers generate .init sections with straight line instructions that are just pasted together, skipping function call overhead for simple functions. (The .init code still ends up forming a proper function because the linker will also include C runtime files like crti.o and crtn.o, which supply the appropriate function prologue/epilogue in their .init sections.)

crawshaw commented 8 years ago

Anyone writing code for this? If not, I'll spend a couple of days on it.

Assuming each global with side-effect free initialization is split off, I suspect the linker will need to merge the safe init functions together after reachability analysis. But that should be safe to do as long as we distinguish the safe generated init code from general init functions.

bradfitz commented 8 years ago

All yours.

mdempsky commented 8 years ago

@crawshaw Go for it.

crawshaw commented 8 years ago

I made a bit of progress on this, but after looking closer I don't know how it will help binary size.

The obvious target is the range tables in the unicode package. It would be nice to make it possible for the linker's deadcode pass to catch exported symbols like Nl, Nd, etc. But: all of these tables are included in the unicode.Categories map, which is used by the regexp package.

All the real world programs I looked at use package regexp. Even tiny ones like objdump. So while this will make the "hello world" program from #6853 look good, I don't see how it will help in practice.

If anyone can convince me otherwise I'll dust this off, but for now this looks like an academic exercise to me.

mdempsky commented 3 years ago

I'm going to look into extending escape analysis to keep track of side-effect free functions for Go 1.17.

ianlancetaylor commented 3 years ago

Not sure if this is happening, but moving milestone to backlog as it is a long-standing issue.

bradfitz commented 3 years ago

As a datapoint, I just tried changing the representation of unicode.RangeTable from a struct to a string containing the binary packed contents of the RangeTables.

Then instead of vars, I made them all consts. And instead of the five map[string]*RangeTable global variables, I made them funcs that returned a sync.Once-lazily-init map. (which the linker knows how to eliminate if unused)

I of course also updated all the funcs taking a *RangeTable pointer to instead just take a RangeTable string.

After minor corresponding updates to the regexp and encoding/xml packages (which both used *RangeTable), the binary size of a fmt-using Hello World program dropped 100 KB on Linux.

bradfitz@tsdev:~$ ls -l hello.before hello.after
-rwxr-xr-x 1 bradfitz bradfitz 1843533 Aug  4 22:20 hello.after
-rwxr-xr-x 1 bradfitz bradfitz 1942437 Aug  4 22:20 hello.before

bradfitz@tsdev:~$ size hello.before hello.after
   text    data     bss     dec     hex filename
1261840   88756  207456 1558052  17c624 hello.before
1234832   25752  207424 1468008  166668 hello.after

bradfitz@tsdev:~$ go tool nm hello.before | grep -F unicode. | wc -l
242
bradfitz@tsdev:~$ go tool nm hello.before | grep -F unicode. | head -20
  536f40 D unicode..inittask
  544318 D unicode.ASCII_Hex_Digit
  544320 D unicode.Adlam
  544328 D unicode.Ahom
  544330 D unicode.Anatolian_Hieroglyphs
  544338 D unicode.Arabic
  544340 D unicode.Armenian
  544348 D unicode.Avestan
  544350 D unicode.Balinese
  544358 D unicode.Bamum
  544360 D unicode.Bassa_Vah
  544368 D unicode.Batak
  544370 D unicode.Bengali
  544378 D unicode.Bhaiksuki
  544380 D unicode.Bidi_Control
  544388 D unicode.Bopomofo
  544390 D unicode.Brahmi
  544398 D unicode.Braille
  5443a0 D unicode.Buginese
  5443a8 D unicode.Buhid

bradfitz@tsdev:~$ go tool nm hello.after | grep -F unicode. | wc -l
1
bradfitz@tsdev:~$ go tool nm hello.after | grep -F unicode. 
  530280 D unicode..inittask

Which is all to say: there are wins to be had here if anybody is curious.

gopherbot commented 1 year ago

Change https://go.dev/cl/461315 mentions this issue: cmd/compile,cmd/link: enable deadcode of unreferenced large global maps

gopherbot commented 1 year ago

Change https://go.dev/cl/463395 mentions this issue: cmd/link: linker portion of dead map removal (patch 2 of 2)

gopherbot commented 1 year ago

Change https://go.dev/cl/463855 mentions this issue: cmd/internal/obj: flag init functions in object file

thanm commented 1 year ago

For posterity, linker dead code elimination of unused maps is now implemented in this stack.

Getting back to the original example:

package main
type dead int
func newDead() *dead { return new(dead) }
var x = newDead()
func main() {}

In principle the same strategy used for maps could be used here for variables like "x", e.g. outline the code that performs the init (e.g. call to newDead), then have a weak relocation from main.init to the outlined init func, and a strong relocation from "x" to the outlined init func.

The existing code that checks for side effects is here; in theory if were were to extend this code to work interprocedurally we might be able to catch this case.