burrowers / garble

Obfuscate Go builds
BSD 3-Clause "New" or "Revised" License

Strengthen literal obfuscation #57

Closed lu4p closed 4 years ago

lu4p commented 4 years ago

Some ideas:

Many of these will add complexity (more to test, and we may need additional flags to test specific combinations), and some might not be worth the extra maintenance burden.

Note: "random" is always pseudo-random (deterministic), except when -random is passed; then it can be truly random (crypto/rand).
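
To make the distinction in the note concrete, here is a minimal sketch of how a tool might pick its source of randomness. The names `newRand`, `trueRandom` and `buildInput` are hypothetical and not taken from garble; deriving the deterministic seed from a SHA-256 hash of the input is an assumption of this sketch.

```go
package main

import (
	crand "crypto/rand"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	mrand "math/rand"
)

// newRand returns a PRNG (hypothetical helper). When trueRandom is false the
// seed is derived from buildInput, so the same input always yields the same
// sequence. When trueRandom is true the seed comes from crypto/rand.
func newRand(trueRandom bool, buildInput []byte) (*mrand.Rand, error) {
	var seed [8]byte
	if trueRandom {
		if _, err := crand.Read(seed[:]); err != nil {
			return nil, err
		}
	} else {
		sum := sha256.Sum256(buildInput)
		copy(seed[:], sum[:8])
	}
	n := int64(binary.LittleEndian.Uint64(seed[:]))
	return mrand.New(mrand.NewSource(n)), nil
}

func main() {
	a, _ := newRand(false, []byte("example build input"))
	b, _ := newRand(false, []byte("example build input"))
	fmt.Println(a.Uint32() == b.Uint32()) // same input, same sequence
}
```

The deterministic path keeps builds reproducible; the crypto/rand path gives a different result on every build.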

pagran commented 4 years ago

Maybe it makes sense to generate a completely “random” decryption function? Cryptography for obfuscation is not required.

For example, we describe a base unit and generate it as many times as we need (pseudo-Go):

// Pseudo-Go template: each {...} is a placeholder that the generator fills in at build time.
func garbleDecrypt(key{1-10}... uint32, data []byte, key{10-20}... uint32) string {
    data[{randInt}%len(data)] {^ + - * /}= byte(key{1-200} >> {randInt(0, 32)})
    data[{randInt}%len(data)] {^ + - * /}= byte(key{1-200} >> {randInt(0, 32)})
    ...
    data[{randInt}%len(data)] {^ + - * /}= byte(key{1-200} >> {randInt(0, 32)})
    return string(data)
}

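
A runnable sketch of the idea in the template above, under some assumptions: the `step` type and the functions `genSteps`, `apply`, `invert` and `roundTrip` are hypothetical names, and only `^`, `+` and `-` are used because they are always reversible on bytes (`*` and `/` in general are not). The build step applies the inverted sequence to the plaintext, and the generated run-time code would apply the forward sequence.

```go
package main

import (
	"fmt"
	"math/rand"
)

// step is one randomly chosen, reversible byte operation (hypothetical type).
type step struct {
	idx int  // index into data
	op  byte // '^', '+' or '-'
	val byte // operand, standing in for byte(key >> shift)
}

// genSteps builds n random steps for a literal of the given length (> 0).
func genSteps(r *rand.Rand, n, length int) []step {
	ops := []byte{'^', '+', '-'}
	steps := make([]step, n)
	for i := range steps {
		steps[i] = step{
			idx: r.Intn(length),
			op:  ops[r.Intn(len(ops))],
			val: byte(r.Uint32() >> uint(r.Intn(32))),
		}
	}
	return steps
}

// apply runs the steps in order, as the generated decrypt code would.
func apply(data []byte, steps []step) {
	for _, s := range steps {
		switch s.op {
		case '^':
			data[s.idx] ^= s.val
		case '+':
			data[s.idx] += s.val
		case '-':
			data[s.idx] -= s.val
		}
	}
}

// invert returns steps that undo apply: reversed order, with + and - swapped.
func invert(steps []step) []step {
	inv := make([]step, len(steps))
	for i, s := range steps {
		switch s.op {
		case '+':
			s.op = '-'
		case '-':
			s.op = '+'
		}
		inv[len(steps)-1-i] = s
	}
	return inv
}

// roundTrip obfuscates a non-empty literal and then recovers it.
func roundTrip(seed int64, literal string, n int) (obfuscated []byte, recovered string) {
	r := rand.New(rand.NewSource(seed))
	steps := genSteps(r, n, len(literal))
	data := []byte(literal)
	apply(data, invert(steps)) // build time: bytes to embed in the binary
	obfuscated = append([]byte(nil), data...)
	apply(data, steps) // run time: recover the literal
	return obfuscated, string(data)
}

func main() {
	obf, got := roundTrip(1, "hello world", 16)
	fmt.Printf("%x -> %q\n", obf, got)
}
```
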
lu4p commented 4 years ago

@pagran Our reasoning for choosing AES was that AES-encrypted data is irreversible without the key, so we can concentrate on making the key hard to find through different placements, encodings, etc.

pagran commented 4 years ago

I’m not sure that it makes sense to hide the key if it is passed in plain text to the standard aes.NewCipher function. It’s enough to find aes.NewCipher and peek at the key there.

We can't randomize the aes.NewCipher pattern.

lu4p commented 4 years ago

In principle you are right; however, once #13 lands, the package name and the function name could get obfuscated.

pagran commented 4 years ago

To search for AES, the name is not necessary; the algorithm itself has specific constants that you can’t hide. For example, use the findcrypt2/3 plugin for IDA Pro, or search manually: https://reverseengineering.stackexchange.com/a/17136

Specifically for Go, you can also search for its specific panic strings: https://golang.org/src/crypto/aes/aes_gcm.go (line 125/145/178)
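
A minimal sketch of the constant search described above. It assumes a table-based software AES implementation whose S-box is embedded in the binary; the first 16 S-box bytes are fixed by the AES standard. The function name `findAESSbox` is hypothetical, and real tools such as findcrypt check many more signatures.

```go
package main

import (
	"bytes"
	"fmt"
)

// aesSboxPrefix is the first row (16 bytes) of the AES S-box.
var aesSboxPrefix = []byte{
	0x63, 0x7c, 0x77, 0x7b, 0xf2, 0x6b, 0x6f, 0xc5,
	0x30, 0x01, 0x67, 0x2b, 0xfe, 0xd7, 0xab, 0x76,
}

// findAESSbox returns the offset of the S-box prefix in blob, or -1 if absent.
func findAESSbox(blob []byte) int {
	return bytes.Index(blob, aesSboxPrefix)
}

func main() {
	// Stand-in for the contents of a compiled binary.
	blob := append(make([]byte, 7), aesSboxPrefix...)
	fmt.Println(findAESSbox(blob)) // offset of the table inside blob
}
```

Renaming the package or function does not change these bytes, which is why name obfuscation alone does not hide the use of AES.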

lu4p commented 4 years ago

We could additionally add some XOR steps, so that finding the AES key alone isn't enough for decryption.

mvdan commented 4 years ago

I tend to agree with @pagran; adding encryption into the mix doesn't really help us that much. The deobfuscator simply has to find the one key instead of finding the many original strings, so if anything, it's a bit easier as you can figure out all strings at once.

Really, the current mechanism is pretty weak against a deobfuscator program, because the pattern is pretty clear and easy to undo. It's only really good against dumb string scans and humans not experienced in deobfuscation.

If we really want to make this harder even for a machine, we need to increase the amount of input which can't be figured out by the deobfuscator. That is, more "randomness", keeping in mind that the tool is deterministic. The main problem with the current code is that the pattern is too simple and consistent.

I think the two main courses of action in this direction have already been brought up in this thread. On one hand, @lu4p's idea is to have multiple decrypt/deobfuscate functions per package, and choose one "at random". This would certainly help with reducing the simplicity, but we would still have significant consistency. When one encounters a garbled string, it would just be a matter of running it against all deobfuscate functions, and see which returns something correct.

The other course of action is what @pagran suggested; every literal gets its own unique and "randomly" generated deobfuscation function (or chunk of inline code). Then, the only consistent way to obtain the original value is to run the code; there isn't a way to deobfuscate all strings at once. The key here, though, will be having a good piece of software that generates different-looking "deobfuscate" code; enough that, looking at the resulting assembly or disassembled code, one can't easily tell what literals are obfuscated or not.
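
The weakness of a fixed pool of deobfuscate functions can be sketched as follows. Everything here is hypothetical: the `decoder` type, the three example decoders and the printable-ASCII heuristic are assumptions chosen for illustration, and the heuristic can return false positives.

```go
package main

import "fmt"

// decoder is one deobfuscate function from a package-level pool.
type decoder func([]byte) []byte

func xorWith(k byte) decoder {
	return func(in []byte) []byte {
		out := make([]byte, len(in))
		for i, b := range in {
			out[i] = b ^ k
		}
		return out
	}
}

func addWith(k byte) decoder {
	return func(in []byte) []byte {
		out := make([]byte, len(in))
		for i, b := range in {
			out[i] = b + k // wraps around modulo 256
		}
		return out
	}
}

// examplePool stands in for the decoders an attacker found in a binary.
func examplePool() []decoder {
	return []decoder{xorWith(0x5a), addWith(0x11), xorWith(0xc3)}
}

// looksLikeText is a crude check: non-empty and printable ASCII only.
func looksLikeText(b []byte) bool {
	for _, c := range b {
		if c < 0x20 || c > 0x7e {
			return false
		}
	}
	return len(b) > 0
}

// bruteForce runs blob through every decoder and keeps plausible results.
func bruteForce(blob []byte, pool []decoder) []string {
	var candidates []string
	for _, d := range pool {
		if out := d(blob); looksLikeText(out) {
			candidates = append(candidates, string(out))
		}
	}
	return candidates
}

func main() {
	// Adding 0xef is the inverse of adding 0x11 (0x11 + 0xef = 0x100).
	blob := addWith(0xef)([]byte("hello world"))
	fmt.Printf("%q\n", bruteForce(blob, examplePool()))
}
```

The attacker never needs to know which decoder was chosen for a given literal; the cost grows only linearly with the size of the pool.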

lu4p commented 4 years ago

OK, we should probably explore an approach other than AES; we could also use it to obfuscate every stdlib package.

mvdan commented 4 years ago

Ah, yes, that's also a big plus - not having to rely on crypto packages, which does mess with our package dependencies :)

mvdan commented 4 years ago

To clarify my thoughts - I do think that @pagran's idea to generate "random" unique code is probably the best long-term option, but it's also probably quite complex and tricky to do right. I think we should discuss its high-level design before anyone starts writing code.

pagran commented 4 years ago

We can implement a lot of (15+) base blocks:

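import "go/ast"

// EncryptBlock encrypts data and returns the matching decryptor as AST.
type EncryptBlock interface {
    Encrypt(data []byte) *ast.BlockStmt
}

type XorBlock struct{}

func (*XorBlock) Encrypt(data []byte) *ast.BlockStmt {
    // random xor
    // return decryptor as ast
    return nil // placeholder
}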

Then generate a random sequence of such blocks and, based on them, create a decryption function.
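
A rough sketch of how such blocks might be chained, assuming only two block kinds (XOR and add) and a function-literal wrapper around the result. The names `block`, `loopStmt`, `xorBlock`, `addBlock`, `generate` and `validExpr` are hypothetical, and the signature differs from the interface above to keep the example short.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/parser"
	"go/printer"
	"go/token"
	"math/rand"
	"strconv"
)

// block transforms data in place and returns the statement that undoes it.
type block func(r *rand.Rand, data []byte) ast.Stmt

func intLit(b byte) *ast.BasicLit {
	return &ast.BasicLit{Kind: token.INT, Value: strconv.Itoa(int(b))}
}

// loopStmt builds: for i := range data { data[i] <tok> <k> }
func loopStmt(tok token.Token, k byte) ast.Stmt {
	return &ast.RangeStmt{
		Key: ast.NewIdent("i"),
		Tok: token.DEFINE,
		X:   ast.NewIdent("data"),
		Body: &ast.BlockStmt{List: []ast.Stmt{&ast.AssignStmt{
			Lhs: []ast.Expr{&ast.IndexExpr{X: ast.NewIdent("data"), Index: ast.NewIdent("i")}},
			Tok: tok,
			Rhs: []ast.Expr{intLit(k)},
		}}},
	}
}

func xorBlock(r *rand.Rand, data []byte) ast.Stmt {
	k := byte(r.Intn(255) + 1)
	for i := range data {
		data[i] ^= k
	}
	return loopStmt(token.XOR_ASSIGN, k) // XOR undoes itself
}

func addBlock(r *rand.Rand, data []byte) ast.Stmt {
	k := byte(r.Intn(255) + 1)
	for i := range data {
		data[i] += k
	}
	return loopStmt(token.SUB_ASSIGN, k) // subtraction undoes addition
}

// generate returns Go source for a call expression that rebuilds literal.
func generate(seed int64, literal string) (string, error) {
	r := rand.New(rand.NewSource(seed))
	data := []byte(literal)
	blocks := []block{xorBlock, addBlock}
	var undo []ast.Stmt
	for n := 0; n < 3; n++ {
		stmt := blocks[r.Intn(len(blocks))](r, data)
		undo = append([]ast.Stmt{stmt}, undo...) // last applied is undone first
	}
	elts := make([]ast.Expr, len(data))
	for i, b := range data {
		elts[i] = intLit(b)
	}
	body := []ast.Stmt{&ast.AssignStmt{ // data := []byte{...}
		Lhs: []ast.Expr{ast.NewIdent("data")},
		Tok: token.DEFINE,
		Rhs: []ast.Expr{&ast.CompositeLit{Type: &ast.ArrayType{Elt: ast.NewIdent("byte")}, Elts: elts}},
	}}
	body = append(body, undo...)
	body = append(body, &ast.ReturnStmt{Results: []ast.Expr{ // return string(data)
		&ast.CallExpr{Fun: ast.NewIdent("string"), Args: []ast.Expr{ast.NewIdent("data")}},
	}})
	call := &ast.CallExpr{Fun: &ast.FuncLit{
		Type: &ast.FuncType{
			Params:  &ast.FieldList{},
			Results: &ast.FieldList{List: []*ast.Field{{Type: ast.NewIdent("string")}}},
		},
		Body: &ast.BlockStmt{List: body},
	}}
	var buf bytes.Buffer
	if err := printer.Fprint(&buf, token.NewFileSet(), call); err != nil {
		return "", err
	}
	return buf.String(), nil
}

// validExpr reports whether src parses as a Go expression.
func validExpr(src string) bool {
	_, err := parser.ParseExpr(src)
	return err == nil
}

func main() {
	src, err := generate(1, "hello world")
	fmt.Println(src, err, validExpr(src))
}
```

Each block returns its own undo statement, so adding a new block kind would not require changes to `generate`.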

lu4p commented 4 years ago

I think we should only have a single "decrypt" function per package but the specific implementation can be "random"

pagran commented 4 years ago

> I think we should only have a single "decrypt" function per package but the specific implementation can be "random"

What for? This does not affect performance, only the size of the output binary. Alternatively, you could use a pool containing a random number of decryption functions.

P.S. Maybe we can borrow ConfuserEx's idea to simplify writing the code? Instead of working with the AST manually, load Go code templates, manipulate them, and then simply parse them into an AST.
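
A small sketch of the template approach, using only the standard go/parser, go/ast and go/format packages. The template text, the `KEY` placeholder identifier and the function name `instantiate` are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"go/ast"
	"go/format"
	"go/parser"
	"go/token"
	"strconv"
)

// tmpl is ordinary Go source; KEY is a placeholder identifier (an assumption
// of this sketch) that is replaced before the code is emitted.
const tmpl = `package p

func decrypt(data []byte) string {
	for i := range data {
		data[i] ^= KEY
	}
	return string(data)
}
`

// instantiate parses the template, renames the function and replaces KEY.
func instantiate(funcName string, key byte) (string, error) {
	fset := token.NewFileSet()
	file, err := parser.ParseFile(fset, "tmpl.go", tmpl, 0)
	if err != nil {
		return "", err
	}
	ast.Inspect(file, func(n ast.Node) bool {
		switch n := n.(type) {
		case *ast.FuncDecl:
			if n.Name.Name == "decrypt" {
				n.Name.Name = funcName
			}
		case *ast.AssignStmt:
			for i, rhs := range n.Rhs {
				if id, ok := rhs.(*ast.Ident); ok && id.Name == "KEY" {
					n.Rhs[i] = &ast.BasicLit{
						ValuePos: id.Pos(),
						Kind:     token.INT,
						Value:    strconv.Itoa(int(key)),
					}
				}
			}
		}
		return true
	})
	var buf bytes.Buffer
	if err := format.Node(&buf, fset, file); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	src, err := instantiate("d1", 90)
	fmt.Println(src, err)
}
```

The template stays readable and can be compiled and tested as normal Go code once `KEY` is given a value, which is the main appeal over building every node by hand.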

lu4p commented 4 years ago

@pagran Nice idea

lu4p commented 4 years ago

@pagran Fine, having a lambda function for each literal would also be OK.

lu4p commented 4 years ago

@pagran I wrote astextract to help with the code generation. It even has a web interface here: https://lu4p.github.io/astextract/

capnspacehook commented 4 years ago

I have a few thoughts:

  1. If we choose to generate a unique deobfuscation function for each literal, that should be the hardest to deal with statically. The downside is that the final binary would most likely be larger than if we were to create one or a few deobfuscation functions and choose between them at random. This may be something that we leave up to the user: whether they need a smaller binary, or one with more obfuscated literals.
  2. We should try to make our generated literal obfuscation code resilient to FireEye's floss project: https://github.com/fireeye/flare-floss. floss is a tool that does advanced static analysis to rip deobfuscated strings from obfuscated binaries, and it does it quite well. Obviously we are obfuscating literals, so a person determined/skilled enough will always be able to deobfuscate the literals we obfuscate, but if we can design an algorithm that floss can't handle, that would be a great start, I think.

mvdan commented 4 years ago

> This may be something that we leave up to the user, whether they need a smaller binary, or one with more obfuscated literals.

I think the answer there is something like #61. That is, by default, we don't make binaries get large by trying really hard on every literal. Only the statements or expressions which are marked as "important" will get the extra effort.

> not having to rely on crypto packages, which does mess with our package dependencies

I hadn't seen this issue manifest into a real bug before, but now it's confirmed; I've filed #62 for that.

capnspacehook commented 4 years ago

I think it's fine if we don't obfuscate all literals by default, but we should certainly provide an easy way to do that without requiring some form of annotation of all literals. In my mind, special comments/no-op functions that annotate how garble should behave on a per-node basis could be extremely powerful and useful, but we need to keep the option to obfuscate all literals/nodes.

Annotations could be very useful to override default Garble behavior, for example to force Garble to rename an exported method that the user knows can be renamed.

lu4p commented 4 years ago

I will try to replace the AES function with a simple XOR function that implements a similar interface, like this:

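import "go/ast"

// EncryptBlock encrypts data and returns the matching decryptor as AST.
type EncryptBlock interface {
  Encrypt(data []byte) *ast.BlockStmt
}

type XorBlock struct{}

func (*XorBlock) Encrypt(data []byte) *ast.BlockStmt {
  // random xor
  // return decryptor as ast
  return nil // placeholder
}
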
mvdan commented 4 years ago

@capnspacehook Yeah, I think setting up defaults to affect all nodes would be useful. That is, unless the setting is overridden for any of the nodes.

The question is - how would you expose such global defaults? Flags would work, but they're a bit clunky. Perhaps something like Go annotations in an init func in a garble.go file in the main package? That would be pretty easy to discover and we wouldn't have to translate every option to a flag.

lu4p commented 4 years ago

~Only change is that it will return a *ast.CallExpr, and the functions aren't exported.~

pagran commented 4 years ago

Proof-of-concept only

Two blocks take part in the obfuscation. Each block has a "complexity" value; the more complex the block, the higher the value (heh):

  1. Arithmetic - 1
  2. Loop - 5
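
One way a maximum complexity could drive block selection is sketched below. This is an assumption about how such a budget might work, not a description of the linked generator; `blockKind`, `pickBlocks` and `pickWithSeed` are hypothetical names, and only the two costs listed above are taken from the comment.

```go
package main

import (
	"fmt"
	"math/rand"
)

// blockKind pairs a block name with its complexity cost.
type blockKind struct {
	name       string
	complexity int
}

// kinds uses the two costs listed in the comment above.
var kinds = []blockKind{
	{"arithmetic", 1},
	{"loop", 5},
}

// pickBlocks keeps choosing random blocks that still fit into the budget
// and stops when no block fits any more.
func pickBlocks(r *rand.Rand, maxComplexity int) []blockKind {
	var picked []blockKind
	remaining := maxComplexity
	for {
		var fits []blockKind
		for _, k := range kinds {
			if k.complexity <= remaining {
				fits = append(fits, k)
			}
		}
		if len(fits) == 0 {
			return picked
		}
		k := fits[r.Intn(len(fits))]
		picked = append(picked, k)
		remaining -= k.complexity
	}
}

// pickWithSeed is a convenience wrapper with a fixed seed.
func pickWithSeed(seed int64, maxComplexity int) []blockKind {
	return pickBlocks(rand.New(rand.NewSource(seed)), maxComplexity)
}

func main() {
	for _, k := range pickWithSeed(1, 12) {
		fmt.Println(k.name, k.complexity)
	}
}
```

Because the cheapest block costs 1, this loop always spends the budget exactly; a real generator might prefer to stop earlier at random.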

Original code

package main

func main() {
    println("hello world")
}

One line obfuscated "manually" with different maximum complexities (see the file names): https://gist.github.com/pagran/8210d187cf9ad8d95c65bb50a8f86c54

pagran commented 4 years ago

Generator source: https://github.com/pagran/go-random-encrypt-generator/blob/master/main.go

capnspacehook commented 4 years ago

I think this is resolved as well; should it be closed?