Closed travisdowns closed 7 years ago
I should add that a small tweak to the C++ code results in a swap to using the tzcnt
instruction:
xorl %eax, %eax # 1 0x400570 2 OPC=xorl_r32_r32
xorl %edx, %edx # 2 0x400572 2 OPC=xorl_r32_r32
tzcntl %esi, %eax # 3 0x400574 4 OPC=tzcntl_r32_r32
subl $0x1, %esi # 4 0x400578 3 OPC=subl_r32_imm8
testl %edi, %esi # 5 0x40057b 2 OPC=testl_r32_r32
shrxl %eax, %edi, %edi # 6 0x40057d 5 OPC=shrxl_r32_r32_r32
setne %dl # 7 0x400582 3 OPC=setne_r8
leal (%rdx,%rdi,1), %eax # 8 0x400585 3 OPC=leal_r32_m16
retq # 9 0x400588 1 OPC=retq
This code doesn't have the same issue (tzcnt
doesn't have the undefined behavior and in any case the regs are zeroed with xor first), and stoke doens't complain.
I'm guessing the complaint is something to do with the fact that bsrl has undefined behavior when the input reg is 0, and the current intel behavior is to leave the reg unchanged.
I haven't looked at the code, but this sounds like the right explanation to me. In general, STOKE needs to make sure that the programs it generates do not perform reads on potentially undefined data. In the most general situation, this is an undecidable problem. So, we have a very simple analysis that approximates the solution, but sometimes it gives false positives, especially when whether something is defined or undefined depends on instruction semantics or the processor implementation.
You can try the --live_dangerously option, which will skip the startup check, but you'll have the problem that STOKE won't accept any rewrite which it thinks does an undefined read, so this probably won't help you. For your specific problem, I would recommend modifying the x86.csv in x64asm so that bsrl is considered as doing a write rather than leaving a register undefined. Could that work for you?
It could work, but I'm also confused as to why stoke needs the program at all in the synthesize
step? I would have thought it works only on the test case input (which never contains any 0 input, so should never trigger the undefined behavior). Understanding what stoke actually uses the program for, when I'm doing synthesis and using "holdback" validation would help me understand better how to work around it.
The problem with modifying the x86.csv is that then the guarantees offered by the instruction would be stronger than the CPU implementation (i.e., I need to choose some behavior for input = 0 - say output = 0?), so stoke may get stuck finding solutions that rely on this behavior (e.g., using the fact that bsf(0) == 0
). I suppose one workaround to that could be setting the value for bsf(0)
to be something very random that is not likely to be useful in a solution and then verifying that this isn't used by the solution.
I suppose another workaround is just to zero out the destination register before the bsr
call - can I freely modify the assembly in the bins/
dir (created by extract)? That is, are the other columns: # 1 0x400570 3 OPC=leal_r32_m16
used by stoke or are they informational only?
It could work, but I'm also confused as to why stoke needs the program at all in the synthesize step?
As @bchurchill pointed out, STOKE uses a simple data-flow analysis that ensures we only ever deal with programs that are guaranteed to not read undefined values. Getting a precise and fast analysis is difficult, so ours is an over-approximation that works reasonably well in most cases, but means STOKE can essentially NOT use the bsr instruction at all (because it can NEVER prove that the program doesn't read an undefined value, unless it just doesn't read the main output, which would not be so useful). This is unfortunate, but for now we don't have any real work-around.
The closest to a work-around is changing the x86.csv file, but as you note, that means STOKE will now use bsr even for 0 inputs.
I suppose another workaround is just to zero out the destination register before the bsr call - can I freely modify the assembly in the bins/ dir (created by extract)? That is, are the other columns: # 1 0x400570 3 OPC=leal_r32_m16 used by stoke or are they informational only?
Yes, you can modify those. The columns are not purely informational, they are used by the parser in some special cases, but if you remove them and modify the assembly, everything should work. Just make sure when you modify the code that you remove the comments (the parser doesn't NEED them, but will use them if they are present).
Essentially the parser uses the comments to dissambiguate between instructions that have the same string but can be representated in multiple ways (e.g. an instruction that adds the constant $0x1 to %rax can be represented by an instruction that uses an 8, 16 or 32 bit constant).
Can I leave the comments intact for the existing instructions, when I add my xor eax, eax
(which won't have any comments).
Yes.
Thanks. I'm still curious what stoke uses the target program when synth/optimizing with "holdback", with --init zero
. Is it only to calculate the cost to beat?
It is used for calculating the cost, yes (in particular to calculate the correct output, so the "correctness" term in the cost).
... but I thought that only the existing test cases only were used for the correctness calculation. I.e., that the correctness score is the sum (or other reduction) of the errors across all the applicable test cases, and it doesn't know more about the function than that (when using "holdback" validation strategy).
Does the synthesize
step actually re-run the target function as part of the correctness calculation? This difference is important, because I may craft my test cases to avoid certain invalid inputs, and want a rewritten function that works for the remaining inputs, but I don't care what happens when invalid inputs are used. Furthermore, I may modify the target function (e.g., adding the xors or something else) and I want to know if stoke is using it.
I get that it will use it if you use the other proof-based validations, etc.
Test cases are only test inputs, so you need the target to determine what the correct output is. The "correctness" part of the cost function takes the test cases, runs both the target and rewrite on those inputs and compares the outputs (only live out).
Ah yes, I had already noted at some point that the test cases didn't define any output, but somehow forgot that fact!
On Nov 9, 2016 2:10 PM, "Stefan Heule" notifications@github.com wrote:
Test cases are only test inputs, so you need the target to determine what the correct output is. The "correctness" part of the cost function takes the test cases, runs both the target and rewrite on those inputs and compares the outputs (only live out).
— You are receiving this because you authored the thread. Reply to this email directly, view it on GitHub https://github.com/StanfordPL/stoke/issues/939#issuecomment-259513379, or mute the thread https://github.com/notifications/unsubscribe-auth/ACSswViUPtkLhgKxRsOIsDxGzqEJscW1ks5q8iiXgaJpZM4KtNaX .
I'll go ahead and close this -- feel free to open a new issue if you're having other problems.
Well I am still having this problem...
I'm trying to optimize a simple function with the following disassembly:
stoke synthesize fails almost immediately with:
However,
esi
is written by te earlierbsrl
instruction. I'm guessing the complaint is something to do with the fact thatbsrl
has undefined behavior when the input reg is 0, and the current intel behavior is to leave the reg unchanged. That doesn't really explain it all though because actually both src and dest for thebsrl
areesi
which is in my def in:--def_in "{ %rax %rdi %rsi }"
.