While reconstructing the control flow of a virtualized function, one of the main roadblocks is constant unfolding. To dumb it down: the constant we need is encrypted, and the VM handler is what decrypts it.
Imagine:
mov rax, 1337
add rax, 50
xor rax, 7331
jmp rax
It's much harder to read than:
mov rax, 6600
jmp rax
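The two listings are equivalent because every operand is a constant, so the jump target can be computed ahead of time. A quick check of the arithmetic, as a minimal Python sketch (the helper name `fold_chain` is made up for illustration):

```python
def fold_chain(start, ops):
    """Apply a list of (mnemonic, constant) pairs to a starting constant."""
    value = start
    for mnemonic, operand in ops:
        if mnemonic == "add":
            value += operand
        elif mnemonic == "xor":
            value ^= operand
        else:
            raise ValueError(f"unsupported mnemonic: {mnemonic}")
        value &= 0xFFFFFFFFFFFFFFFF  # keep the result inside a 64-bit register
    return value

# mov rax, 1337 / add rax, 50 / xor rax, 7331
target = fold_chain(1337, [("add", 50), ("xor", 7331)])
print(target)  # 6600
```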
The problem is, we don't want to spend our precious time running constant propagation. Luckily, LLVM has a solution for us: while generating IR, it folds constants for us.
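To illustrate what folding while the IR is being built means, here is a toy model in Python. It is not LLVM's API: `ToyBuilder`, `Const`, and `Inst` are hypothetical names. The only assumption is that the builder can hand back a constant directly whenever every operand is already a constant, which is how LLVM's IRBuilder behaves with its default constant folder.

```python
from dataclasses import dataclass

MASK64 = 0xFFFFFFFFFFFFFFFF

@dataclass(frozen=True)
class Const:
    value: int

@dataclass(frozen=True)
class Inst:
    op: str
    lhs: object
    rhs: object

class ToyBuilder:
    """Emits instructions, but folds them when both operands are constants."""

    def __init__(self):
        self.emitted = []  # instructions that could not be folded

    def create(self, op, lhs, rhs):
        if isinstance(lhs, Const) and isinstance(rhs, Const):
            if op == "add":
                return Const((lhs.value + rhs.value) & MASK64)
            if op == "xor":
                return Const(lhs.value ^ rhs.value)
        # At least one operand is unknown (or the op is unsupported): emit it.
        inst = Inst(op, lhs, rhs)
        self.emitted.append(inst)
        return inst

builder = ToyBuilder()
rax = Const(1337)                              # mov rax, 1337
rax = builder.create("add", rax, Const(50))    # add rax, 50
rax = builder.create("xor", rax, Const(7331))  # xor rax, 7331
print(rax, len(builder.emitted))  # Const(value=6600) 0
```

Nothing was emitted for the chain, so the lifted code only ever sees the final constant. An operand that is not a `Const` (for example a register whose value is unknown) makes `create` emit a real instruction instead.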
Memory loads and calls will never get propagated on codegen; however, they will be propagated by optimization passes, especially early-cse. Well, that is if the memory load is not using IntToPtr. You don't want to use inttoptr.
*While a "fix" is implemented, it's not the best solution and it requires work. The idea is to keep a map that tracks memory stores; when we try to load from a tracked address, instead of emitting CreateLoad we just take the value from the map. However, this is not a great solution because:
1- If we don't know where a store writes, it could potentially write anywhere, rendering our constants useless (however, this has a low chance of happening).
2- We need to save the memory state whenever we move on to a different branch. Say:
mov [100],100
je s2
s1:
mov [100],200
ret
s2:
mov rax,[100]
ret
Here, since we only discover one branch at a time, we first move 100 to [100] and then, following s1, move 200 to [100]. When we later discover s2, we need to return to the original state, where [100] held 100, so we don't move an incorrect value to rax.
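The bookkeeping described above could look roughly like this. It is a sketch in Python rather than the project's actual code: `MemoryTracker`, `snapshot`, and `restore` are hypothetical names, addresses and values are plain integers, and a store to an unknown address (passed as `None`) simply discards everything we know, which is drawback 1.

```python
class MemoryTracker:
    """Tracks constants stored to known addresses while lifting one path."""

    def __init__(self):
        self.known = {}  # address -> constant value

    def store(self, address, value):
        if address is None:
            # Unknown destination: it may alias anything, so forget all values.
            self.known.clear()
        else:
            self.known[address] = value

    def load(self, address):
        # Returns the tracked constant, or None if a real load must be emitted.
        return self.known.get(address)

    def snapshot(self):
        return dict(self.known)

    def restore(self, saved):
        self.known = dict(saved)

mem = MemoryTracker()
mem.store(100, 100)          # mov [100],100
at_branch = mem.snapshot()   # je s2: remember the state at the branch

mem.store(100, 200)          # s1: mov [100],200
s1_value = mem.load(100)

mem.restore(at_branch)       # go back to the state at the branch
s2_value = mem.load(100)     # s2: mov rax,[100]
print(s1_value, s2_value)    # 200 100
```

Copying the whole map at every branch is the simplest choice; it costs memory proportional to the number of tracked addresses per branch, which is part of why this approach requires work.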
Because the folded form is generated instead of the unfolded one, we don't have to implement propagation for each instruction; we can mostly rely on the micro propagation LLVM does on codegen.
However, there are some exceptions to that:
1- memory loads*
2- calls (intrinsics are needed for some instructions; they can also be implemented manually)
If a memory load does go through inttoptr, this happens: https://godbolt.org/z/aK7MEfvPd https://godbolt.org/z/7eeTn6hxP