Floating Point - ISA and the IEEE Standard

JonathanSalwan commented 6 years ago

Notes for myself: What about using soft-float? For example, we could bind the symbolic emulation on real native opcodes.

hexpell commented 2 years ago

What happened to the idea of using soft-float? Is FP support still on the roadmap or do you need help in implementation?

JonathanSalwan commented 2 years ago

Yeah one day we will implement this. Probably on the same concept as our stubs. Do you want to give it a try?

SweetVishnya commented 2 years ago

Anyway, we should properly benchmark it. IEEE standard can produce complex constraints in the path predicate. It reminds me of symbolic pointers reasoning. We were trying hard to make it enabled by default. However, it slows down DSE in order of magnitude. So, we just enable it as option every N run.

JonathanSalwan commented 2 years ago

Yeah for sure it will complexify the path predicate. But the good thing with stub is that it's the user who call them where and when he want. For fuzzing i'm not sure it's useful to cover floating instructions but for deobfuscation it makes a sense a lot.

JonathanSalwan commented 2 years ago

@SweetVishnya if you are afraid to add more complexity in path predicate when supporting floating point instructions, one thing that can be done is adding a new mode llike FLOATING_POINT and then in x86 semantics:

void x86Semantics::xxx(triton::arch::Instruction& inst) {
  if (/* FLOATING_POINT mode enabled */) {
    /* call stub */
  }
}

liondandelion commented 1 year ago

@JonathanSalwan Hi, i was wondering about floating point instructions and found this thread; i kind of want to try and implement this. Have i understood correctly: one needs to compile soft-float library functions, corresponding to FPU instructions; then, when instruction is processed, "call" (just continue to emulate instructions from this array of bytes?) this stub, and that's it? What about symbolic state? Surely i would need to create new expressions like in existing semantics? Or, because this is now "normal" emulation using "normal" instructions, this gets done for free?

JonathanSalwan commented 1 year ago

Hi @liondandelion,

corresponding to FPU instructions; then, when instruction is processed, "call" (just continue to emulate instructions from this array of bytes?) this stub, and that's it?

Yep that's the idea but I've never tried :P.

What about symbolic state? Surely i would need to create new expressions like in existing semantics? Or, because this is now "normal" emulation using "normal" instructions, this gets done for free?

Yep this should be done for free. The idea is that: when you have an FPU instruction, let's say the fadd instruction (i1), instead of processing() the given opcode, we will jump into the bytecode of the appropriate stub and continue the emulation of from there, which mean that we emulate a part of the soft float library. Once it's done, the return value of the stub should be somehow inserted into the destination register of the original i1 instruction.

What do you think? Could it be workable?

liondandelion commented 1 year ago

@JonathanSalwan Yes, i think it could! I've looked at Bochs and Qemu and found out that they based their FPU/SSE instruction emulation on soft-float, but they modified it. I liked Qemu's implementation the most, it seems to implement more exception flags for MXCSR, for example, and is still being developed. I think i am going to try it out. Is it okay to leave thoughts/questions here if i get stuck?

JonathanSalwan commented 1 year ago

yeah sure ! Thx a lot <3

liondandelion commented 1 year ago

@JonathanSalwan Hi again, so i've been trying to implement this stuff and everything seemed fine: instructions gave me correct concrete results, ASTs looked "sane" and i even got correct models. But than i started to do some actual (still very simple) testing, and found out that not everything is alright :)

consider instruction like add ecx, edx. If these registers hold values like 2 and 3 respectively, and edx is symbolic, than right after this instruction executes i can take AST from ecx and ask for a model on it, for example, i could ask for a value 7 and it will give me the right answer of edx == 5. But i can't do that when executing SSE instruction like addsd xmm0, xmm1 using softfloat stub. Most of the time it will give wrong results, probably because the AST in xmm0 is only a part of the picture, i need also to provide the solver with path constraints, which were created during stub emulation. Seems like this information should somehow be carried in xmm0 AST too? Is it even possible?
even if i account for the path constraints, the model is still correct only in some cases. For example, instruction cvtsi2sd xmm0, ecx, which converts 32-bit integer to a double - i implemented it using i32_to_f64() function from softfloat (well, from Bochs version, but they are practically the same). If after the instruction executes (provided ecx is symbolic and concrete value is 5) i ask for models with condition like xmm0_ast == x.0 ranging from -8.0 to 8.0, i get proper results ranging form -8 to 8. But after that the solver behaves strangely, e.g. it will give unsut for 9.0, and it will give 10 for 12.0, which is clearly wrong. My guess is that some doubles that i ask the solver for are not possible to get within path constraints created during stub execution. So to get proper results, i would need to pass to solver not only resulting AST from xmm0 and collected path constraints, but also every possible combination of PCs by reverting branches? And all of this should still be carried in xmm0 AST?

Does someone have any ideas? Am i right in my assumptions? Is there something that can be done to fix it?

JonathanSalwan commented 1 year ago

so i've been trying to implement this stuff and everything seemed fine: instructions gave me correct concrete results, ASTs looked "sane" and i even got correct models. But than i started to do some actual (still very simple) testing, and found out that not everything is alright :)

That's an awesome start!

So to get proper results, i would need to pass to solver not only resulting AST from xmm0 and collected path constraints, but also every possible combination of PCs by reverting branches? And all of this should still be carried in xmm0 AST?

Am i right in my assumptions?

Mmh, that is right... For a complete and sound model you have to provide a constraint that represents all paths of the smtfloat stub executions -- which is a big issue =(. Currently your model is only valid for concrete values that take the same path that the one you used to build your path predicate. This is a big issue actually...

Actually, I've the feeling that to provide a complete and sound constraint, we have to semantically represent the behaviour of cvtsi2sd (for example) in only one SMT expression =(

intrigus-lgtm commented 4 months ago

I looked into extending Triton for Google CTF's ieee challenge (I don't think Triton would actually be able to solve the challenge even if it had supported all the instructions).

But I afaik Triton doesn't support float types in it's internal AST. If it did, I would think that using https://smt-lib.org/theories-FloatingPoint.shtml should work relatively well.

liondandelion commented 4 months ago

But I afaik Triton doesn't support float types in it's internal AST. If it did, I would think that using https://smt-lib.org/theories-FloatingPoint.shtml should work relatively well.

Yes, i had looked into this already before, and tested conversions between bitvector and floating point theories a little using plain Z3, it seemed to work. I intend to test this with bitwuzla too, since it is also supported by Triton, and try to implement these conversions in Triton itself. I have not done it yet, but probably will within a month or so.

intrigus-lgtm commented 1 month ago

@liondandelion hey, did you have a chance to implement it?

JonathanSalwan / Triton

Floating Point - ISA and the IEEE Standard #326