Open mvdan opened 2 years ago
I should note that using SSA could have the advantage of supporting more languages or syntax forms, but since we only aim to obfuscate Go code, that's not really a benefit.
I vote for AST obfuscation.
Obfuscating the sources would let us implement some classical methods without much difficulty:
if err != nil { code }
->
var a int
if err == nil {
	a = 0x1337
} else {
	a = 0x7331
}
if a^0x975 == 0x???? { code }
Replacing direct calls with indirect ones (call <function> to call ptr [rax+14]):
someFunc(1, 2, 3)
->
var G = newProxyObj() // global var
G.a(1, 2, 3)
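A minimal Go sketch of the proxy idea. The names newProxyObj, G and a come from the example above. The proxyObj type and the body of someFunc are assumptions made for illustration. G.a is a function value stored in a mutable global, so the call site should compile to an indirect call rather than a direct call to someFunc. A sufficiently clever compiler could in principle still devirtualize it, so this is worth checking in the disassembly.

```go
package main

import "fmt"

// someFunc is the function whose call sites we want to hide.
func someFunc(x, y, z int) int {
	return x + y + z
}

// proxyObj (hypothetical) holds function values; calling through it
// goes via a function pointer instead of a direct call to someFunc.
type proxyObj struct {
	a func(int, int, int) int
}

func newProxyObj() *proxyObj {
	return &proxyObj{a: someFunc}
}

// G is the global proxy: someFunc(1, 2, 3) is rewritten to G.a(1, 2, 3).
var G = newProxyObj()

func main() {
	fmt.Println(G.a(1, 2, 3))
}
```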
I think the methods above are already enough to hide patterns.
More complex methods (like control flow flattening) are also possible, but they will require much harder manipulation of the code (i.e. a data-flow analyzer is needed).
I implemented a PoC of junk code injection: https://github.com/pagran/garble/tree/flow-obfuscation Output example: https://gist.github.com/pagran/bca8e94be277b90b9d78185ab64e208e
The final build size increases depending on the obfuscator settings (check here), but performance drops only very slightly. Does it make sense to finish it?
Can you explain how the junk injection works? I can see that the code might not be removed by the compiler as redundant, but you're still essentially ending up with dead code. I imagine one could deobfuscate the code and fairly easily trim the bits of junk that can be proven to be unreachable.
This is not to say that your idea to insert junk code isn't right, but I also think the approach of using a global array and juggling integers with switches is pretty easy to untangle.
If we were to support inserting junk code, I think ideally it should:
- … os, it could randomly inject calls to its APIs like os.Getenv or os.Open, with similarly randomly generated parameters. One can imagine similar things for other packages like time, net/http, and so on.
My personal and honest opinion is that we should instead look at incremental alterations to existing code, rather than injecting code. This is what I meant by "shuffle" in my original post. For example:
- swapping an if with its else
- reordering switch statement cases
- replacing an if with a for that loops zero or one times
- adding a redundant [:] on slice expressions, adding redundant type conversions, adding redundant parentheses or blocks, or adding labels to for and switch statements and their break/continue statements
I realise these are just draft ideas and likely pretty hard to implement. I also realise that some of them might not matter in terms of compiled binaries, but we still care about source code obfuscation, I think.
I think it's also worth investigating what other obfuscators have done before we spend tens or hundreds of hours on this feature. I personally have no experience in this kind of obfuscation, so while I can give you reasons why some approaches aren't likely to be successful, I can't say with confidence what a good approach looks like either :)
Currently giving https://arxiv.org/abs/1809.11037 a read, which carefully examines the theory and practice of Java control flow obfuscation as of 2018.
Thinking aloud: Go does have goto, so that could be pretty useful in terms of flattening control flow.
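To make the goto point concrete, here is a hand-written sketch of flattening in Go. Every basic block becomes a case of a single switch. Control flow is carried by a state variable plus a goto back to the dispatcher. These functions are illustrative only and are not something garble generates.

```go
package main

import "fmt"

// sumEvens is the original, structured function.
func sumEvens(xs []int) int {
	total := 0
	for _, x := range xs {
		if x%2 == 0 {
			total += x
		}
	}
	return total
}

// sumEvensFlat is a hand-flattened equivalent: each basic block is a
// case of one dispatcher, and the next block is chosen by assigning to
// state and jumping back to the dispatcher with goto.
func sumEvensFlat(xs []int) int {
	total, i, state := 0, 0, 0
dispatch:
	switch state {
	case 0: // loop header
		if i < len(xs) {
			state = 1
		} else {
			state = 4
		}
	case 1: // parity test
		if xs[i]%2 == 0 {
			state = 2
		} else {
			state = 3
		}
	case 2: // accumulate
		total += xs[i]
		state = 3
	case 3: // increment
		i++
		state = 0
	case 4: // exit
		return total
	}
	goto dispatch
}

func main() {
	xs := []int{1, 2, 3, 4, 5, 6}
	fmt.Println(sumEvens(xs), sumEvensFlat(xs))
}
```

The sequential state numbers still leak the original block order. A real flattener would presumably derive the numbering and case order from the seed.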
@mvdan The definitive paper for control flow obfuscation is this one: http://ac.inf.elte.hu/Vol_030_2009/003.pdf
There are several basic techniques described in that paper that should be the initial focus for control flow obfuscation. Flattening, Bogus Instructions, and Substitution. That's a good place to start, and some of these have been mentioned already. Paper authors have a bit more info on their git here: https://github.com/obfuscator-llvm/obfuscator
There are many implementations of this paper already, including the original fork of LLVM and this one based on GCC that actually has a better explanation of some of the methods: https://github.com/meme/hellscape
Basic control flow obfuscation will require AST manipulation, or at least different code generation from the AST, as has been mentioned above. Once the basic stuff is implemented, you could start trying to invent something new and additional, but... the basic methods in the paper are a good mixture of simple to implement and hard to reverse.
Another project worth mentioning is MovFuscator, which compiles everything to MOVs: https://github.com/xoreaxeaxeax/movfuscator
Just wanted to point out that the things being discussed above are largely solved problems with multiple implementations in other compilers... INCLUDING an implementation that works for gccgo already (which means it can't target Windows tho, so not so useful).
Java obfuscation is a different art form entirely, because it leans heavily on reflection and the differences between the VM instructions and the source code. Some of the control flow stuff might translate, but you'll probably have better luck looking at the LLVM-opcode level obfuscation and GCC port of it linked above.
The main idea of junk code is simple: to "blur" the original code. This is by no means control flow obfuscation (which requires good control flow analysis). The junk code should protect against bindiff and, in the future, confuse references to external functions (file handling, networking, etc.), making pattern search and initial manual analysis in disassemblers harder.
On the source side it is not hard to remove, because the switch construction is clearly visible. On the binary side it is not so easy, because the junk code actively refers to variables that the execution flow depends on, so a simple "reference search" will not help.
About the code plausibility I agree, but the current version is nothing more than a prototype in which only the bare minimum for a demonstration is implemented.
Thanks, @awgh, those links are a good starting point.
There are several basic techniques described in that paper that should be the initial focus for control flow obfuscation. Flattening, Bogus Instructions, and Substitution.
I see that the paper does talk about flattening in some detail, but it only mentions bogus instructions (aka @pagran's "junk code injection") and substitutions, without really giving much detail. Do you have more links for those? I'm particularly wondering how one would go about inserting bogus instructions without having them be rather easy to recognize.
@pagran I understand yours is just a prototype for now, and I really appreciate the help - my opinion is just that we should carefully design our first steps before we start writing code. For that reason, I really want to understand what other good control flow obfuscators do, and then find a relatively straightforward way to implement one of their basic techniques to begin with.
For example, when it comes to inserting bogus or junk code, I'm really not sure that teaching garble how to insert global arrays and switch statements will be the best long-term strategy. We might well end up at a point where we have to maintain thousands of lines of code for the sake of teaching garble specific kinds of junk code it can inject. It would be much better if, say, we could procedurally generate valid Go code that never executes.
I also have the feeling that, when it comes to Go, AST substitutions will be the easier first step when compared to flattening and bogus instructions, because they could be written as small "rewrite rules" that would get applied to each existing AST node.
Okay, then I propose creating a separate issue with a list of possible "transformations", for example if -> state machine.
Once that's approved, I will start writing code.
@mvdan OK, for substitutions, it can be a bit more complicated because you need to be somewhat aware of generated assembly and how that assembly will be disassembled. That said, there are some old classic tricks to check out.
Here's a recent summary of an old trick: https://tmpout.sh/1/6.html And the original paper was Silvio's obviously: http://www.ouah.org/linux-anti-debugging.txt
Rather than inserting garbage go-level source, you should be thinking about making transforms to the generated output that will not change the actual behavior of the running program at all, like the above examples.
"Bogus Instructions" literally means just inserting a bunch of dead code that will never get hit. This is easy to insert for PIC, because you can just insert blobs of garbage and JMP over them. @pagran 's random switch statement idea is pretty good, just make a rat's nest of JMPs that never get hit... and you don't really have to track whether or not bogus code really works, because it shouldn't actually be getting hit.
I don't think there's any point in inserting data like arrays, because that won't mess with someone trying to disassemble it at all.
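Here is a small Go sketch of bogus code behind an opaque predicate, a common way to keep dead code from being trivially removable. The expression x*(x+1) is always even, but proving that takes reasoning that a simple dead-code pass is unlikely to do. The names sink, realWork and obfuscated are hypothetical. Whether a given compiler folds the predicate away is worth checking in the disassembly.

```go
package main

import "fmt"

// sink (hypothetical) gives the bogus branch something plausible to write to.
var sink [16]byte

// opaqueTrue always returns true: x*(x+1) is a product of two
// consecutive integers and therefore even, and uint64 wrap-around
// (reduction modulo 2^64) preserves parity.
func opaqueTrue(x uint64) bool {
	return x*(x+1)%2 == 0
}

// realWork stands in for the function's actual logic.
func realWork(n int) int {
	return n * 3
}

// obfuscated guards a bogus block with the opaque predicate. The block
// never runs, so it only has to compile and look like real code.
func obfuscated(n int) int {
	if !opaqueTrue(uint64(n)) {
		for i := range sink {
			sink[i] ^= byte(n >> uint(i%8))
		}
		return len(sink) - n
	}
	return realWork(n)
}

func main() {
	fmt.Println(obfuscated(14))
}
```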
Substitutions == This code will run, but you can make it hard to disassemble. Bogus Instructions == This code will NOT run, it can be utter trash.
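For the substitution side, here is a sketch of two well-known arithmetic identities that an instruction substitution pass could use. They are written at the Go level purely for illustration. The function names are made up. The Go compiler may or may not keep these expressions intact in the generated code, and the identities are not claimed to be the exact ones used by OLLVM.

```go
package main

import "fmt"

// addSubst computes a + b without a plain a + b: a ^ b is the sum
// without carries and (a & b) << 1 is the carries, so
// a + b == (a ^ b) + 2*(a & b) for all values, including overflow.
func addSubst(a, b uint32) uint32 {
	return (a ^ b) + (a&b)<<1
}

// xorSubst computes a ^ b as (a | b) - (a & b): the bits set in both
// operands are removed from the bits set in either, with no borrows.
func xorSubst(a, b uint32) uint32 {
	return (a | b) - (a & b)
}

func main() {
	fmt.Println(addSubst(40, 2), xorSubst(12, 10))
}
```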
Flattening is the only one of these things that requires AST manipulation, the other two are maybe better performed in the code generation phase.
If you have to manipulate the source code to manipulate the generated code... you're going to have a hard time. This is more or less the wrong turn that your predecessors took with gobfuscate.
edit: Although you could make a truly funny bogus code generator with direct AST manipulations, you don't really need to do it this way... could also do it only in code generation.
I think we've touched on "code generation" before; because we're not forking the Go toolchain itself, we don't have access to its IR, which is a form of SSA. We similarly don't have access to the "lowered" SSA, which uses target-specific instructions; that's the closest stage to generated code that the compiler gets to before it actually produces object files.
That said, you've given me an idea: turn the AST to SSA via https://pkg.go.dev/golang.org/x/tools/go/ssa, and then spit out the simplest form of Go code that implements the SSA. That may already do a significant chunk of the flattening for us, and it should be easier to obfuscate the function body in SSA form. Generating Go code equivalent to some SSA is probably relatively complex, but it might be our best long-term bet; the Go compiler only accepts Go code as input.
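To illustrate what "the simplest form of Go code that implements the SSA" might look like, here is a hand-written sketch. It is not actual golang.org/x/tools/go/ssa output. It uses one label per basic block and gotos for the edges, and lowers phi nodes to assignments on the incoming edges.

```go
package main

import "fmt"

// factorial is the original structured code.
func factorial(n int) int {
	r := 1
	for i := 2; i <= n; i++ {
		r *= i
	}
	return r
}

// factorialBlocks mirrors a basic-block view of factorial: labels are
// blocks, gotos are edges, and the loop header's phi nodes become
// assignments made on each incoming edge.
func factorialBlocks(n int) int {
	var r, i int
	// b0 (entry): initial phi inputs
	r, i = 1, 2
	goto b1
b1: // loop header: r = phi(1, r'), i = phi(2, i')
	if i <= n {
		goto b2
	}
	goto b3
b2: // loop body: back edge feeds r', i' into the phis
	r, i = r*i, i+1
	goto b1
b3: // exit
	return r
}

func main() {
	fmt.Println(factorialBlocks(5))
}
```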
^ I intend to do a bit of research into that SSA idea, and will likely coordinate with @awgh, @pagran and others as I have any updates. @pagran are you OK with giving me a couple of weeks to look into this? Assuming that the experiment will succeed, it will be a promising approach long-term, but also radically different to altering the go/ast directly :)
Sure :)
p.s. Why didn't github notify me about this message? -_-
You've probably already seen this since it's the first Google result for "golang ast obfuscation", but there's some previous work here: https://github.com/q6r/gomambojambo
Not everything is useful since garble already has string/function/package name obfuscation but it also has examples for dead code insertion and a simple kind of control flow obfuscation (for loops converted to gotos).
There's also this https://github.com/meme/hellscape which I've tested and works but it hooks into gccgo so I guess it's useless for us. A mix of garble and hellscape would be insane tho.
I intend to do a bit of research into that SSA idea
Blocked by https://github.com/golang/go/issues/48525 at the moment; starting to use the SSA package today would mean breaking the obfuscation of some generic programs.
It's a bit sudden, but it's possible to make a fully "reflection-friendly" obfuscation mode.
If we disable all name obfuscation, move method bodies to separate functions and hide the calls, we get an acceptable level of obfuscation: the original names remain in the binary, but it becomes very hard to associate them with functions.
Example:
// main.go
package main
type X struct {
secret string
}
func (f X) test() string {
return f.secret + "xxxx"
}
func main() {
x := X{}
println(x.test())
}
to:
// main.go
package main
type X struct {
secret string
}
func (f X) test() string {
return C._obfName(f)
}
func main() {
x := X{secret: "hi"}
println(x.test())
}
// main_body.go
package main
var C = _controller{}
func init() {
// some obf code with unsafe.Pointer
C._obfName = _someObfName
}
type _controller struct {
_obfName func(f X) string
}
func _someObfName(f X) string {
return f.secret + "xxxx"
}
@pagran is working on https://github.com/burrowers/garble/pull/752
That PR is now merged, but the feature is not enabled for all functions by default - it is experimental and opt-in for now. I'd keep this issue open until it's on by default.
This landed in experimental form in master via https://github.com/burrowers/garble/blob/master/docs/CONTROLFLOW.md. I think we should keep this issue open until the feature is mature enough, and either promoted to a flag like -literals or enabled by default in some form.
Right now we strip some information away from the compiled code in function bodies, such as position information, variable names, and the names of funcs and types being used. However, the compiled code looks otherwise extremely similar to its non-obfuscated counterpart, especially in its structure.
For example, if I perform two obfuscated builds of the same program with different seeds, all the func/type/var names will be different, but one could deobfuscate a function body and quickly spot the pairs of corresponding obfuscated names in the two builds, as the structure of the function will be very similar. Meaning that if I manage to figure out what an obfuscated name in one build stands for, I can reuse that knowledge rather easily in the other build.
One can also imagine deobfuscating the Go code and trying to spot common patterns in "idiomatic" Go code, such as if err != nil { handle(err) }. Being able to quickly spot these patterns, even if the names are obfuscated, could lead to an easier understanding of what the code is doing.
We should investigate ways to improve this situation. In general terms, what we want is to deterministically "shuffle" the code around using the seed, akin to what we already do with literal obfuscation or when reordering declarations.
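Here is a minimal sketch of seed-driven, deterministic shuffling. It assumes a per-function PRNG derived by hashing the build seed together with the function name. The helpers seededRand and shuffleCases are hypothetical, and this is not necessarily how garble derives its randomness. The same seed and function always give the same order; a different seed almost certainly gives a different one.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand"
)

// seededRand (hypothetical) derives a per-function PRNG from the build
// seed and the function name, so layouts are reproducible per build.
func seededRand(buildSeed []byte, funcName string) *rand.Rand {
	h := fnv.New64a()
	h.Write(buildSeed)
	h.Write([]byte(funcName))
	return rand.New(rand.NewSource(int64(h.Sum64())))
}

// shuffleCases (hypothetical) returns a reordered copy of independent
// items, e.g. switch cases or the states of a flattened function.
func shuffleCases(cases []string, r *rand.Rand) []string {
	out := append([]string(nil), cases...)
	r.Shuffle(len(out), func(i, j int) { out[i], out[j] = out[j], out[i] })
	return out
}

func main() {
	cases := []string{"case 0", "case 1", "case 2", "case 3", "case 4"}
	a := shuffleCases(cases, seededRand([]byte("seed-A"), "main.f"))
	b := shuffleCases(cases, seededRand([]byte("seed-A"), "main.f"))
	c := shuffleCases(cases, seededRand([]byte("seed-B"), "main.f"))
	fmt.Println(a)
	fmt.Println(b) // same seed and function as a, so same order
	fmt.Println(c) // different seed, very likely a different order
}
```

Reordering is only safe for items that are genuinely independent. Examples are switch cases with distinct constant expressions and no fallthrough, or the states of a flattened function.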
Doing this at the machine code level definitely seems like a bad idea; we'd need to explicitly support each GOARCH target. It would also require being able to modify object files in-place, further increasing the required complexity.
Doing it at the Go syntax level via go/ast is probably the most obvious option we have. We already do something like it when obfuscating literals, and it seems to work well. I think it could become more feasible if we implemented a "reduction" of the AST first, as per https://github.com/burrowers/garble/issues/459.
Doing this at the compiler's SSA IR level could also be interesting. Advantages:
- go/ast is significantly more complex than go/ssa, as there are multiple ways to write the same piece of logic.
Disadvantages:
- … -toolexec, and is only kept in memory. This would likely mean having to build and use a modified version of cmd/compile.
- Unlike go/ast and the Go syntax, Go's SSA representation is internal and may change in backwards-incompatible ways over the course of Go releases.
Thus, my initial thoughts are that we should aim for obfuscating func bodies via go/ast rather than the compiler's internal SSA. Happy to hear opinions, counter-points, or other potential ways to solve this.