JonathanSalwan / Triton

Triton is a dynamic binary analysis library. Build your own program analysis tools, automate your reverse engineering, perform software verification or just emulate code.
https://triton-library.github.io
Apache License 2.0

Improve the taint analysis #908

Closed JonathanSalwan closed 2 years ago

JonathanSalwan commented 4 years ago

I don't like the current taint analysis, so I will probably rewrite the taint engine. If you have recommendations or points to discuss, feel free to comment on this thread. Other threads to take into account:

SweetVishnya commented 4 years ago

In #888 we decided to check whether an instruction is symbolized as follows:

  1. Get instruction operands from DynamoRIO.
  2. Ask Triton if any registers or memory regions are symbolic.
  3. If none of them are symbolized, then skip instruction processing (building AST).

But we didn't have time to try it out yet. Also, we don't need taint at all for this solution.

On the other hand, anyone who does not have a concolic engine should consider some mechanism to skip non-tainted instructions. The Triton engine actually takes a lot of time to build ASTs. IMHO, there should be a lightweight taint to select instructions for symbolic interpretation (reasoning/execution). But there is a problem: you need to somehow update the concrete state anyway.
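To make the trade-off concrete, here is a minimal sketch of such a lightweight filter. It does not use Triton's API: the `Insn` type, the three-opcode instruction set, and the register names are all hypothetical, chosen only for illustration. The point it tries to show is that the cheap taint check decides which instructions go to the expensive symbolic step, while the concrete state is updated for every instruction regardless.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """Hypothetical toy instruction: dst = op(srcs)."""
    dst: str       # destination register
    op: str        # "mov", "add" or "xor"
    srcs: tuple    # source register names or integer immediates


def value(regs, operand):
    """Concrete value of an operand (register name or immediate)."""
    return regs[operand] if isinstance(operand, str) else operand


def step(insn, regs, tainted):
    """Execute one instruction; return True if it needs symbolic processing."""
    # Cheap check: does any source register carry taint?
    needs_symbolic = any(isinstance(s, str) and s in tainted for s in insn.srcs)

    # The concrete state is updated unconditionally, tainted or not.
    vals = [value(regs, s) for s in insn.srcs]
    if insn.op == "mov":
        result = vals[0]
    elif insn.op == "add":
        result = vals[0] + vals[1]
    elif insn.op == "xor":
        result = vals[0] ^ vals[1]
    else:
        raise ValueError(f"unsupported op: {insn.op}")
    regs[insn.dst] = result & 0xFFFFFFFFFFFFFFFF  # wrap to 64 bits

    # Propagate taint: the destination is tainted iff some source was.
    if needs_symbolic:
        tainted.add(insn.dst)
    else:
        tainted.discard(insn.dst)
    return needs_symbolic


regs = {"rax": 0, "rbx": 0, "rcx": 0, "rdi": 0x41}
tainted = {"rdi"}  # assumption: rdi holds user-controlled input
program = [
    Insn("rax", "mov", (5,)),
    Insn("rbx", "add", ("rax", 3)),
    Insn("rcx", "xor", ("rdi", "rbx")),
    Insn("rdi", "mov", (0,)),
]

# Only these instructions would be handed to the AST-building engine.
selected = [insn for insn in program if step(insn, regs, tainted)]
```

In this toy run only the `xor` is selected, yet all four instructions change `regs`. That is the cost mentioned above: skipping AST construction still requires some form of concrete execution or a state sync from the tracer.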

One more solution, as you mentioned earlier, is to do taint at the AST level. Then you need some code to check whether implicit operands are symbolic. Or you don't need any extra code, since you already have all the expressions for an instruction.
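A rough sketch of the AST-level idea, again as a standalone toy rather than Triton's actual AST classes (`Node`, `const`, `var`, `binop` and `instruction_is_tainted` are hypothetical names): each node records at construction time whether a symbolic variable occurs anywhere beneath it, so an instruction counts as tainted when any expression it produced, including implicit ones such as flags, is symbolized.

```python
class Node:
    """Tiny expression node; leaves are constants or symbolic variables."""

    def __init__(self, kind, children=(), value=None):
        self.kind = kind
        self.children = tuple(children)
        self.value = value
        # Computed once when the node is built, so later queries are O(1).
        self.symbolized = kind == "var" or any(c.symbolized for c in self.children)


def const(v):
    return Node("const", value=v)


def var(name):
    return Node("var", value=name)


def binop(op, a, b):
    return Node(op, (a, b))


def instruction_is_tainted(expressions):
    """True if any expression produced by the instruction is symbolized.

    'expressions' covers explicit destinations and implicit ones (flags),
    so no separate operand inspection is needed.
    """
    return any(e.symbolized for e in expressions)


rax = const(5)
rbx = var("input_0")  # a symbolic variable

# "add rax, rbx": a destination expression plus an implicit flag expression.
add_result = binop("add", rax, rbx)
zero_flag = binop("eq", add_result, const(0))

# "mov rcx, 7": a purely concrete expression.
mov_result = const(7)
```

Here `instruction_is_tainted([add_result, zero_flag])` is true and `instruction_is_tainted([mov_result])` is false. The catch is that the expressions must already exist, so this variant answers the taint question but does not by itself avoid the cost of building ASTs.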

SweetVishnya commented 4 years ago

As far as I can see now, the taint engine is useless if you don't have emulation (concolic execution).

SweetVishnya commented 4 years ago

What is your use case model for taint?

archercreat commented 4 years ago

Ability to track multiple sources would be good.

SweetVishnya commented 4 years ago

> Ability to track multiple sources would be good.

Do you want taint colors to distinguish multiple inputs?

archercreat commented 4 years ago

> > Ability to track multiple sources would be good.
>
> Do you want taint colors to distinguish multiple inputs?

Yes.
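For illustration, taint colors are often modeled as one bit per source, with propagation taking the bitwise union of the source colors. The sketch below is a hypothetical standalone model, not the Triton taint engine's interface; `Color`, `ColoredTaint`, and the source names are made up for the example.

```python
from enum import IntFlag


class Color(IntFlag):
    """One bit per taint source (hypothetical set of sources)."""
    NONE = 0
    FILE = 1
    NETWORK = 2
    ARGV = 4


class ColoredTaint:
    def __init__(self):
        self._colors = {}  # location (register/address) -> Color

    def get(self, loc):
        return self._colors.get(loc, Color.NONE)

    def taint(self, loc, color):
        """Mark a location as coming from the given source."""
        self._colors[loc] = self.get(loc) | color

    def assign(self, dst, *srcs):
        """dst = f(srcs): dst receives the union of the source colors."""
        merged = Color.NONE
        for s in srcs:
            merged |= self.get(s)
        if merged:
            self._colors[dst] = merged
        else:
            # Overwritten with untainted data: drop any previous color.
            self._colors.pop(dst, None)


t = ColoredTaint()
t.taint("rdi", Color.FILE)
t.taint("rsi", Color.NETWORK)
t.assign("rax", "rdi", "rsi")  # rax depends on both inputs
t.assign("rbx", "rax", "rcx")  # colors flow through rax
t.assign("rdi", "rcx")         # rdi overwritten by untainted rcx
```

After these steps `rax` and `rbx` carry `Color.FILE | Color.NETWORK`, and `rdi` is clean again. A membership test such as `Color.NETWORK in t.get("rbx")` then tells which input influenced a value.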

JonathanSalwan commented 4 years ago

> What is your use case model for taint?

Actually, I never use the taint engine. I mainly use symbolic variables (which can be considered a kind of taint) and the modes ONLY_ON_SYMBOLIZED and ALIGNED_MEMORY. Then, by asking for a model, we can deduce:

JonathanSalwan commented 4 years ago