stefanheule closed this issue 8 years ago
This test was not failing before build 239, which is when @bchurchill merged in a lot of his changes: http://mrwhite.stanford.edu:8080/job/stoke-develop-full/239/ @bchurchill: Anything in those changes that might cause this?
Nothing comes to mind. These changes should have been all validator changes.
Thinking back though, almost any time we've had a fuzzer error for just one instruction where the sandbox appeared to have been doing something wrong (be it the validator's fuzzer or otherwise), the culprit has been the assembler most of the time. The sandbox is sometimes the culprit, but rarely so for register-register vector instructions where nothing special is required.
So, perhaps check the assembly of the instructions we're generating against what 'as' generates, or check with objdump? Especially check the VEX prefix, since a wrong prefix would cause this exact problem.
Berkeley
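For anyone who wants to do the VEX check by hand, here is a minimal sketch of pulling the relevant fields out of a VEX prefix. It assumes 64-bit mode and that the VEX prefix is the first byte of the instruction. `VexInfo` and `decode_vex` are illustrative names and not part of STOKE. The field to look at is VEX.L: if it is 0 where 1 was intended, the CPU executes the 128-bit form and the upper half of the ymm register is not computed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fields of a VEX prefix that matter here (illustrative struct, not STOKE code).
struct VexInfo {
  std::size_t prefix_len;  // 2 for the C5 form, 3 for the C4 form
  bool l;                  // VEX.L: false = 128-bit, true = 256-bit
  std::uint8_t pp;         // implied legacy prefix: 0 none, 1 = 66, 2 = F3, 3 = F2
  std::uint8_t vvvv;       // extra source register, already un-inverted
};

// Returns false if the bytes do not start with a VEX prefix.
bool decode_vex(const std::vector<std::uint8_t>& code, VexInfo& out) {
  std::uint8_t b = 0;
  if (code.size() >= 2 && code[0] == 0xC5) {
    b = code[1];  // 2-byte form: R vvvv L pp
    out.prefix_len = 2;
  } else if (code.size() >= 3 && code[0] == 0xC4) {
    b = code[2];  // 3-byte form: W vvvv L pp (same low seven bits)
    out.prefix_len = 3;
  } else {
    return false;
  }
  out.l = (b & 0x04) != 0;
  out.pp = static_cast<std::uint8_t>(b & 0x03);
  // vvvv is stored inverted in the prefix, so flip the four bits back.
  out.vvvv = static_cast<std::uint8_t>(((b >> 3) & 0x0F) ^ 0x0F);
  return true;
}
```

As a usage example, the bytes `C5 F5 74 D0` are a hand-derived encoding of `vpcmpeqb %ymm0, %ymm1, %ymm2` (worth confirming against objdump); decoding them should report L = 1, pp = 1 (the 66 prefix) and vvvv = 1 (%ymm1).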
Ok, we have another weird, possibly related test failure:
[----------] 4 tests from ValidatorPcmpeqbTest
[ RUN ] ValidatorPcmpeqbTest.AllZeros
././tests/validator/common.h:214: Failure
Failed
Sandbox and validator do not agree for 'vpcmpeqb %ymm0, %ymm1, %ymm2' (opcode vpcmpeqb_ymm_ymm_ymm)
states do not agree for '%ymm2':
validator: 0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0xff₈ ∘ 0xffffffffffffffff₆₄)))))))))))))))))))))))
sandbox: 0x0₆₄ ∘ 0x0₆₄ ∘ 0xffffffffffffffff₆₄ ∘ 0xffffffffffffffff₆₄
[ FAILED ] ValidatorPcmpeqbTest.AllZeros (92 ms)
[ RUN ] ValidatorPcmpeqbTest.OneMatch
././tests/validator/common.h:214: Failure
Failed
Sandbox and validator do not agree for 'vpcmpeqb %ymm0, %ymm1, %ymm2' (opcode vpcmpeqb_ymm_ymm_ymm)
states do not agree for '%ymm2':
validator: 0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ 0x0₆₄)))))))))))))))))))))))
sandbox: 0x0₆₄ ∘ 0x0₆₄ ∘ 0x0₆₄ ∘ 0x0₆₄
[ FAILED ] ValidatorPcmpeqbTest.OneMatch (93 ms)
[ RUN ] ValidatorPcmpeqbTest.WordMatch
././tests/validator/common.h:214: Failure
Failed
Sandbox and validator do not agree for 'vpcmpeqb %ymm0, %ymm1, %ymm2' (opcode vpcmpeqb_ymm_ymm_ymm)
states do not agree for '%ymm2':
validator: 0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ 0x0₆₄)))))))))))))))))))))))
sandbox: 0x0₆₄ ∘ 0x0₆₄ ∘ 0x0₆₄ ∘ 0x0₆₄
[ FAILED ] ValidatorPcmpeqbTest.WordMatch (92 ms)
[ RUN ] ValidatorPcmpeqbTest.SeveralMatch
././tests/validator/common.h:214: Failure
Failed
Sandbox and validator do not agree for 'vpcmpeqb %ymm0, %ymm1, %ymm2' (opcode vpcmpeqb_ymm_ymm_ymm)
states do not agree for '%ymm2':
validator: 0x0₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ (0xff₈ ∘ (0x0₈ ∘ 0xff00ff00ff00ff00₆₄)))))))))))))))))))))))
sandbox: 0x0₆₄ ∘ 0x0₆₄ ∘ 0xff00ff00ff00₆₄ ∘ 0xff00ff00ff00ff00₆₄
[ FAILED ] ValidatorPcmpeqbTest.SeveralMatch (92 ms)
Again the sandbox appears to be wrong: it computes the lower 128 bits correctly but leaves the upper 128 bits unchanged instead of computing them. Again, I can't reproduce the error locally on mrwhite (the test passes there).
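To make the mismatch concrete, here is a small model of the two behaviors. It is only a sketch: `Ymm`, `vpcmpeqb_256` and `low_half_only` are illustrative names, and it assumes %ymm2 held zero before the instruction ran, which the log does not state.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// A 256-bit register as 32 bytes; index 0 is the least significant byte.
using Ymm = std::array<std::uint8_t, 32>;

// Reference semantics of the 256-bit form: compare every byte pair and
// write 0xFF where they are equal, 0x00 otherwise.
Ymm vpcmpeqb_256(const Ymm& a, const Ymm& b) {
  Ymm r{};
  for (std::size_t i = 0; i < r.size(); ++i) {
    r[i] = (a[i] == b[i]) ? 0xFF : 0x00;
  }
  return r;
}

// Mirrors the reported sandbox output: only the low 16 bytes are written,
// and the upper 16 bytes keep whatever the destination held before.
Ymm low_half_only(const Ymm& a, const Ymm& b, Ymm dest) {
  for (std::size_t i = 0; i < 16; ++i) {
    dest[i] = (a[i] == b[i]) ? 0xFF : 0x00;
  }
  return dest;
}
```

With all-zero inputs, `vpcmpeqb_256` yields 32 bytes of 0xFF (the validator's answer in AllZeros), while `low_half_only` yields 0xFF in the low half and leaves the upper half at its old value, which has the same shape as the sandbox line in the log.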
It seems that this might be something serious and we need to find the cause, but I'm not sure how yet as I'm not able to reproduce it.
Also, it's not the assembler: It spits out the same bits as gcc does.
I think I know what's happening.
At least for the PcmpeqbTest, these are being run as part of nehalem_test. On a real Nehalem machine, we would crash. On mrwhite, the instructions get run fine, but the sandbox ignores the ymm registers because it figures they wouldn't exist on Nehalem.
The failure in Jenkins build 246 on X64AsmTest.SpreadsheetReadWriteSetFuzzTest is also in the Nehalem test, so that's the likely culprit.
I think this is essentially a duplicate of #714, in that the bugs we're finding are specific to the Nehalem/Sandy Bridge platforms.
And, of course, if you're not building nehalem_test, you won't be able to reproduce these.
Oh, I see. Sounds like we need to fix the tests, then.
Would you be willing to do that? It's hopefully just #def'ing something.
I think I fixed both problems. For the first one, we did not remove a CPU flag in the Nehalem build that isn't available on Nehalem. For the second one, I excluded the test using an #ifndef.
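For readers unfamiliar with this kind of fix, here is a rough sketch of the #ifndef pattern. The macro name `STOKE_NEHALEM_BUILD`, the function `registered_tests` and the placeholder test name are all hypothetical; the thread does not say which symbol the actual fix checks.

```cpp
#include <cassert>
#include <string>
#include <vector>

// STOKE_NEHALEM_BUILD is a hypothetical macro name used only for this sketch.
// The idea: the Nehalem build defines it, and tests that need ymm registers
// are compiled out of that build.
std::vector<std::string> registered_tests() {
  std::vector<std::string> tests;
  // Placeholder for tests that every target can run.
  tests.push_back("sse_only_test");
#ifndef STOKE_NEHALEM_BUILD
  // vpcmpeqb on ymm registers needs AVX2, which Nehalem does not have.
  tests.push_back("ValidatorPcmpeqbTest");
#endif
  return tests;
}
```

Compiled without the macro, the list contains both entries; compiled with `-DSTOKE_NEHALEM_BUILD`, the ymm test is left out, so the Nehalem test binary never runs it.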
Not entirely sure why these failures didn't show up earlier.
It seems that in the test, the sandbox computes the lower 128 bits correctly, but leaves the upper bits unchanged, instead of setting them to 0. When I try to run the very same state and same instruction locally with the sandbox, it works correctly (the upper bits are correctly set to 0).
Huh?
Full output of the fuzz tester: