golang / go

The Go programming language
https://go.dev
BSD 3-Clause "New" or "Revised" License
123.89k stars 17.65k forks source link

cmd/compile: add special binary export encoding for iotas? #20088

Open josharian opened 7 years ago

josharian commented 7 years ago

Poking around the export data for the ssa package, I noticed a lot of runs of constants generated by iota--each with an int value + 1 and a line +1 from its predecessory. It is a somewhat common structure that admits of a compact encoding, something like this:

// export
func (p *exporter) iotas(e []*Node) {
    p.tag(iotaTag) 
    p.int(len(e))
    n := e[0]
    p.pos(n)                            
    p.typ(unidealType(n.Type, n.Val()))
    p.value(n.Val())                   
    for _, n := range e {
        p.qualifiedName(n.Sym) 
    }
}

// import
    // ...
    case iotaTag:
        niota := p.int()
        p.pos() // TODO: hook this up and increase by one line at each new node
        typ := p.typ()
        val := p.value(typ)
        for i := 0; i < niota; i++ {
            sym := p.qualifiedName()
            importconst(p.imp, sym, idealType(typ), nodintconst(val.Interface().(int64)+int64(i)))
        }
        return niota
    // ...

A possibly even more compact encoding is to detect common prefixes in names across the entire run, which is fairly common. Even without that, this has a pretty significant impact on the ssa package:

name        old export-bytes  new export-bytes  delta
Template          19.0k ± 0%        19.0k ± 0%     ~     (all equal)
Unicode           4.45k ± 0%        4.43k ± 0%   -0.31%  (p=0.002 n=6+6)
GoTypes           29.7k ± 0%        29.6k ± 0%   -0.53%  (p=0.002 n=6+6)
Compiler          75.6k ± 0%        74.3k ± 0%   -1.62%  (p=0.002 n=6+6)
SSA               76.2k ± 0%        65.7k ± 0%  -13.79%  (p=0.002 n=6+6)
Flate             4.98k ± 0%        4.98k ± 0%   +0.02%  (p=0.002 n=6+6)
GoParser          8.81k ± 0%        8.81k ± 0%   +0.01%  (p=0.002 n=6+6)
Reflect           6.25k ± 0%        6.10k ± 0%   -2.37%  (p=0.002 n=6+6)
Tar               9.49k ± 0%        9.46k ± 0%   -0.31%  (p=0.002 n=6+6)
XML               16.0k ± 0%        16.0k ± 0%     ~     (all equal)
[Geo mean]        15.1k             14.8k        -1.98%

It's not quite obvious that it's worth doing in general--SSA might be somewhat special.

For discussion, if we make another round of export format changes.

josharian commented 7 years ago

Oops, forgot to cc @griesemer @mdempsky

griesemer commented 7 years ago

There's already a TODO in bexport.go to use a more condensed form for "long runs of constants". There are other packages that come to mind (Unicode, syscall) which perhaps don't use iota but still could benefit from more compact encoding. I think this is worthwhile doing, perhaps together with the string encoding change.

josharian commented 7 years ago

I also have a local type export change I'm playing with now that saves another percent and change. I'll see about pulling together a handful of CLs.

I don't really know all the things that need to change when the export format changes. Is it just the compiler and go/type's gcimporter?

Also, for some of these changes, it might be rather invasive to try to gracefully support reading all old versions, the way we do now. :(

griesemer commented 7 years ago

Supporting old versions: Good point. If we do change the export format version, people will find out that the need to recompile, but it's going to be a major nuisance for a lot of people (including us) working at tip if we all have to recompile existing libraries. We probably should try to be backward compatible to make this a transparent change.

When changing the export format, we need to change:

If we're backward-compatible, it's ok to generate the old format in tools we don't know about.

griesemer commented 7 years ago

PS: being backward compatible for the position encoding change may negate the effect. hmm... (maybe that one needs to handles separately).

josharian commented 7 years ago

Basically none of the changes I've been considering are really backwards compatible. They'd all involve a version bump. But I thought you put in pretty good version skew detection and error messages.

Maybe I'll put together some compiler-only CLs for you to look at, and then we can decide together whether to proceed with them.

griesemer commented 7 years ago

A compiler that can read the new format will be able to read the old format, and discern based on export format version. But yes, if the new format is present, one needs a compiler that understands the new format (but we don't need to upgrade all existing installed libraries).

josharian commented 7 years ago

Punting to 1.10 because of uncertainty about interaction with #20070.