Improve the taint analysis

JonathanSalwan commented 4 years ago

I don't like the current taint analysis and will probably rewrite the taint engine. If you have recommendations or want to discuss, feel free to comment in this thread. Other threads to take into account:

#473
#827
#888

SweetVishnya commented 4 years ago

In #888 we decided to check whether an instruction is symbolized as follows:

Get instruction operands from DynamoRIO.
Ask Triton if any registers or memory regions are symbolic.
If none of them are symbolized, then skip instruction processing (building AST).
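
The three steps above can be sketched as a small filter. This is a toy model under stated assumptions, not Triton or DynamoRIO code: `Instruction`, `should_process`, and `run` are hypothetical names, and registers and memory cells are represented as plain strings.

```python
from dataclasses import dataclass, field


@dataclass
class Instruction:
    """Hypothetical stand-in for an instruction reported by the tracer."""
    text: str
    reads: set = field(default_factory=set)   # registers / memory cells read
    writes: set = field(default_factory=set)  # registers / memory cells written


def should_process(inst, symbolic_locations):
    """Return True if any location the instruction reads is symbolic."""
    return any(loc in symbolic_locations for loc in inst.reads)


def run(trace, symbolic_locations):
    """Return the instructions that need AST building; skip the rest."""
    symbolic = set(symbolic_locations)
    processed = []
    for inst in trace:
        if should_process(inst, symbolic):
            processed.append(inst.text)
            # Over-approximation: every output of a symbolic instruction
            # is treated as symbolic.
            symbolic |= inst.writes
        else:
            # Outputs are overwritten with concrete values.
            symbolic -= inst.writes
    return processed
```

In this sketch an instruction is skipped only when none of its inputs is symbolic, so the check itself needs no taint information.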

But we haven't had time to try it out yet. Also, we don't need taint at all for this solution.

On the other hand, if one does not have a concolic engine, one should consider some mechanism for skipping non-tainted instructions. The Triton engine actually takes a lot of time to build ASTs. IMHO, there should be a lightweight taint to select instructions for symbolic interpretation (reasoning/execution). The problem is that you need to update the concrete state somehow anyway.
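
A minimal sketch of that idea, assuming a toy two-operand instruction format `(op, dst, src)` that is not Triton's: the concrete state is updated for every instruction, while only tainted instructions are recorded for the expensive symbolic step. The function `step` and its parameters are hypothetical.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def step(inst, regs, tainted, symbolic_log):
    """Execute one toy instruction (op, dst, src).

    The concrete state in `regs` is always updated. The instruction is
    appended to `symbolic_log` only if its result is tainted.
    """
    op, dst, src = inst
    # src is either a register name or an integer immediate.
    src_val = regs[src] if isinstance(src, str) else src
    src_tainted = isinstance(src, str) and src in tainted

    if op == "mov":
        dst_tainted = src_tainted
        regs[dst] = src_val
    elif op == "add":
        dst_tainted = src_tainted or dst in tainted
        regs[dst] = (regs[dst] + src_val) & MASK64
    else:
        raise ValueError(f"unsupported op: {op}")

    if dst_tainted:
        tainted.add(dst)
        symbolic_log.append(inst)   # candidate for symbolic processing
    else:
        tainted.discard(dst)        # overwritten by an untainted value
```

The point of the sketch is the split: taint decides what goes into `symbolic_log`, but `regs` has to stay correct either way.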

One more solution, as you mentioned earlier, is to do taint at the AST level. Then you need some code to check whether implicit operands are symbolic. Or you don't need any extra code, since you have all the expressions for an instruction.
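
As a rough illustration of taint at the AST level, assume each instruction yields one expression per destination, including implicit ones such as flags. The `Node` type and both functions below are hypothetical and far simpler than Triton's AST.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    kind: str            # "var", "const", or an operator such as "add"
    name: str = ""       # variable name when kind == "var"
    value: int = 0       # constant value when kind == "const"
    children: tuple = ()


def is_tainted(node):
    """An expression is tainted if any of its leaves is a symbolic variable."""
    if node.kind == "var":
        return True
    return any(is_tainted(child) for child in node.children)


def tainted_outputs(expressions):
    """Given {destination: AST} for one instruction, return tainted destinations."""
    return {dst for dst, ast in expressions.items() if is_tainted(ast)}
```

Because flags and other implicit destinations appear as keys in `expressions`, no separate operand check is needed in this model.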

SweetVishnya commented 4 years ago

As far as I can see now, the taint engine is useless if you don't have emulation (concolic execution).

SweetVishnya commented 4 years ago

What is your use case model for taint?

archercreat commented 4 years ago

Ability to track multiple sources would be good.

SweetVishnya commented 4 years ago

> Ability to track multiple sources would be good.

Do you want taint colors to distinguish multiple inputs?

archercreat commented 4 years ago

> > Ability to track multiple sources would be good.
>
> Do you want taint colors to distinguish multiple inputs?

Yes.
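
One way to picture taint colors is to attach a set of source labels to each location and take the union on propagation. This is only a sketch; `propagate` and the label names are hypothetical and not part of Triton.

```python
def propagate(labels, dst, sources):
    """Set dst's colors to the union of the colors of its sources.

    labels  -- dict mapping a location name to a frozenset of color names
    dst     -- destination location
    sources -- iterable of source locations
    """
    merged = frozenset().union(*(labels.get(src, frozenset()) for src in sources))
    if merged:
        labels[dst] = merged
    else:
        labels.pop(dst, None)   # no colored source: dst becomes untainted
    return merged
```

For example, with `labels = {"rax": frozenset({"file"}), "rbx": frozenset({"network"})}`, calling `propagate(labels, "rcx", ["rax", "rbx"])` gives `rcx` both colors.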

JonathanSalwan commented 4 years ago

> What is your use case model for taint?

Actually, I never use the taint engine. I mainly use symbolic variables (which can be considered a kind of taint) and the modes ONLY_ON_SYMBOLIZED and ALIGNED_MEMORY. Then, by asking for a model, we can deduce:

No symbolic expression assigned to the reg/mem = untainted
UNSAT = tainted but not controllable
SAT = tainted and controllable
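
The three cases can be sketched without a real solver. The version below assumes the query is "can this cell take the value `target`?" and uses brute force over a single input byte where an SMT query would normally go; `classify` and its parameters are hypothetical.

```python
def classify(expr, target, domain=range(256)):
    """Classify a register or memory cell.

    expr   -- None if no symbolic expression is assigned, otherwise a
              callable mapping one input byte to the cell's value
    target -- the value we would like the cell to take
    domain -- candidate input values (one byte by default)
    """
    if expr is None:
        return "untainted"
    # Brute force stands in for the solver query "expr == target".
    for x in domain:
        if expr(x) == target:
            return "tainted and controllable"   # SAT: x is a model
    return "tainted but not controllable"       # UNSAT
```

In this toy setup, `classify(lambda x: (x + 1) & 0xFF, 0x41)` falls in the SAT case, while an expression that only produces even values is UNSAT for an odd target.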

JonathanSalwan commented 4 years ago

The taint analysis could be applied to a standalone static IR (#473).

JonathanSalwan / Triton

Improve the taint analysis #908
