burrowers / garble

Obfuscate Go builds
BSD 3-Clause "New" or "Revised" License
3.93k stars 248 forks source link

Reflection issue with embedded struct #765

Closed xuannv112 closed 1 year ago

xuannv112 commented 1 year ago

What version of Garble and Go are you using?

$ garble version
mvdan.cc/garble v0.10.0

$ go version
go version go1.20.5

What did you do?

Here is the sample code for obfuscation

func main() {
        type MyInt int
    var t struct {
        Field1 MyInt
        MyInt
    }
    _ = reflect.ValueOf(t)
    _ = reflect.ValueOf(t).FieldByName("Field1")
    _ = reflect.ValueOf(t).FieldByName("MyInt")
}

What did you expect to see?

MyInt type shouldn't be obfuscated.

What did you see instead?

func main() {
    type CZKZyYFezzoE int
    var doEHHO5 struct {
        Field1  CZKZyYFezzoE
        CZKZyYFezzoE
    }
    _ =  /*line b_0JQxZezU.go:1*/ reflect.CRS4V1ASWQ(doEHHO5)
    _ =  /*line XksHcrjPjnW.go:1*/ reflect.CRS4V1ASWQ(doEHHO5).FieldByName("Field1")
    _ =  /*line UDwbo_E.go:1*/ reflect.CRS4V1ASWQ(doEHHO5).FieldByName("MyInt")
}

Notes:

1/ It is the same issue when using garble with go spew package go-spew

We obfuscate t0 field so vt0 returns 0 and panic later in the init function.

panic("reflect.Value read-only flag has changed semantics")

2/ Solution I think when using reflection, we can record non top-level objects as well. So is it ok to remove this logic? No need to compare pkg.Scope() vs obj.Parent() Check top level objects

The tradeoff here: There is no obfuscation even if there is no embedded struct when using reflection

theaog commented 1 year ago

what go/garble version could we use to avoid this issue?

theaog commented 1 year ago
> $ garble run *.go                                                                                                                           [±master ●]
# command-line-arguments
/home/m/.cache/garble/tool/link: running gcc failed: exit status 1
/usr/bin/ld: /tmp/go-link-1189358754/go.o: in function `oUEfqo.xdOu7h.swNWWA':
go.go:(.text+0x125347): undefined reference to `oUEfqo.iNRMZd'
/usr/bin/ld: /tmp/go-link-1189358754/go.o: in function `oUEfqo.iU5VGNz74.aQw4UsJmR7':
go.go:(.text+0x1256eb): undefined reference to `oUEfqo.iNRMZd'
/usr/bin/ld: /tmp/go-link-1189358754/go.o: in function `oUEfqo.xGBWdJXQaj6v':
go.go:(.text+0x1257f2): undefined reference to `oUEfqo.iNRMZd'
/usr/bin/ld: /tmp/go-link-1189358754/go.o: in function `oUEfqo.(*pPSrIql).nOIY6ItdS':
go.go:(.text+0x12f9c1): undefined reference to `oUEfqo.vXtJ6QlKb'
collect2: error: ld returned 1 exit status

exit status 2
exit status 1
xuannv112 commented 1 year ago

There are some temporary solutions 1/ Use GOGARBLE to specify which package you want to obfuscate -> I prefer this way 2/ Downgrade to go1.20.x (x<5), garblev0.9.3 (actually you only need to use the master branch but before this commit) 3/ For go-spew package, you can build with the safe tag -tags safe

lu4p commented 1 year ago

@xuannv112 sounds like you have identified an issue and have a rough idea how to fix it, happy to review your PR. See CONTRIBUTING.md to get started, if you get stuck somewhere feel free to ask for help in our slack channel.

mvdan commented 1 year ago

Note that we can't simply remove the check at https://github.com/burrowers/garble/blob/v0.10.0/reflect.go#L432, because the following line records the type as pkg.Path() + "." + obj.Name(), which assumes top-level declarations.

mvdan commented 1 year ago

My change in https://github.com/burrowers/garble/commit/0f2b59d79404f5be6725c24170fa2fad94e2b7fb was slightly flawed, as can be seen by this regression. When obfuscating a package, we still need to temporarily record what local types aren't obfuscated, even if those don't matter when obfuscating importers.

We could consider reverting, but the old code was also flawed and it would lead to bad obfuscation depending on the state of the cache.

I think it's probably time to rewrite recordedObjectString; it has a couple of hacks and TODOs, so we already knew that it wasn't great :) I'll take a crack at it tonight, since I think that's probably important enough for v0.10.1.

xuannv112 commented 1 year ago

Agree that we should check if the local types should be obfuscated or not. (using reflection and being used in an embedded struct). For a simpler solution, I think we can return this (use file-level instead of package level (if remove))

        if pkg.Scope() != obj.Parent() {
        pos := fset.Position(obj.Pos())
        return fmt.Sprintf("%s.%s - %s", pkg.Path(), obj.Name(),
            filepath.Base(pos.Filename))
    }

It means in the same file, all local types with the same obj name will not be obfuscated. I think it is acceptable.

xuannv112 commented 1 year ago

Or we can check in the object's scope if there are any structs that used the object as an embedded struct.

          getObjId := func(pkg *types.Package, obj types.Object) string {
                  pos := fset.Position(obj.Pos())
                  return fmt.Sprintf("%s.%s - %s:%d", pkg.Path(), obj.Name(),
                      filepath.Base(pos.Filename), pos.Line)
     }
    if pkg.Scope() != obj.Parent() {
        // check if obj is a field of an embedded struct by lookup it's scope
        if _, ok := obj.(*types.TypeName); ok {
            scope := obj.Parent()
            for _, objName := range scope.Names() {
                otherObj := scope.Lookup(objName)
                if structType, ok := otherObj.Type().(*types.Struct); ok {
                    for i := 0; i < structType.NumFields(); i++ {
                        field := structType.Field(i)
                        if field.Name() == obj.Name() && field.Embedded() {
                            return getObjId(pkg, obj)
                        }
                    }
                }
            }
        }

        return ""
    }

So we will ignore only local types used in embedded field (without impact to other obj)

mvdan commented 1 year ago

Definitely don't do that - that sort of double loop linear search will be very expensive in this part of the codebase.

I think it's okay, for now, to change recordedObjectString to do the same for local types that it already does for struct fields. It's a hack, but it should work.

xuannv112 commented 1 year ago

I have moved the embedded field check logic to only ri.recordUsedForReflect function to save cost. https://github.com/burrowers/garble/pull/767

Here is the benchmark result:

go test -run=- -bench=. -count=6 -benchtime=1x

goos: darwin
goarch: amd64
pkg: mvdan.cc/garble
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
         │  old.txt   │           new.txt           │
         │   sec/op   │   sec/op    vs base         │
Build-12   24.84 ± 9%   25.20 ± 6%  ~ (p=1.000 n=6)

         │   old.txt    │              new.txt               │
         │    bin-B     │    bin-B      vs base              │
Build-12   5.642Mi ± 0%   5.643Mi ± 0%  +0.02% (p=0.002 n=6)

         │    old.txt    │               new.txt                │
         │ cached-sec/op │ cached-sec/op  vs base               │
Build-12    611.3m ± 36%    723.0m ± 41%  +18.28% (p=0.041 n=6)

         │   old.txt   │              new.txt              │
         │ mallocs/op  │ mallocs/op   vs base              │
Build-12   34.54M ± 0%   36.59M ± 0%  +5.93% (p=0.002 n=6)

         │   old.txt   │           new.txt           │
         │ sys-sec/op  │ sys-sec/op  vs base         │
Build-12   17.25 ± 10%   17.13 ± 5%  ~ (p=0.818 n=6)
xuannv112 commented 1 year ago

I think i can optimize a litte bit by scanning only when there are 2 conditions: reflection + embedded field. It is better when trigger by only 1 condition: reflection

xuannv112 commented 1 year ago

I have updated the logic: checking the embedded fields. No need for linear search at all. Here is the benchmark result

goos: darwin
goarch: amd64
pkg: mvdan.cc/garble
cpu: Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz
         │  old.txt   │           new.txt           │
         │   sec/op   │   sec/op    vs base         │
Build-12   24.84 ± 9%   25.26 ± 8%  ~ (p=0.937 n=6)

         │   old.txt    │              new.txt               │
         │    bin-B     │    bin-B      vs base              │
Build-12   5.642Mi ± 0%   5.648Mi ± 0%  +0.11% (p=0.002 n=6)

         │    old.txt    │            new.txt             │
         │ cached-sec/op │ cached-sec/op  vs base         │
Build-12    611.3m ± 36%    592.4m ± 16%  ~ (p=0.937 n=6)

         │   old.txt   │              new.txt              │
         │ mallocs/op  │ mallocs/op   vs base              │
Build-12   34.54M ± 0%   36.60M ± 0%  +5.94% (p=0.002 n=6)

         │   old.txt   │           new.txt           │
         │ sys-sec/op  │ sys-sec/op  vs base         │
Build-12   17.25 ± 10%   17.52 ± 8%  ~ (p=0.485 n=6)
mvdan commented 1 year ago

Hang on, but why do we need the extra logic? As far as I know, the %s.%s - %s:%d hack will work just fine for local types just like it already does for fields. We don't need the extra code, I'm fairly sure.

xuannv112 commented 1 year ago

For this use case

    type Tqw int
    var T11 struct {
        A Tqw
        y Tqw
        a Tqw
    }
    _ = reflect.ValueOf(T11).FieldByName("A")

Tqw is still obfuscated as the old logic.

1/ My PR will not obfuscate Tqw only when Tqw is used as an embedded field 2/ If there is no extra logic and simply return %s.%s - %s:%d, for normal type (not used as an embedded field), it will not be obfuscated as the old logic.

xuannv112 commented 1 year ago

It is a minor case, but I want to keep the old logic unchanged if the cost is acceptable.