HarryR / ethsnarks

A toolkit for viable zk-SNARKS on Ethereum, Web, Mobile and Desktop
GNU Lesser General Public License v3.0
240 stars 57 forks source link

`test_sha256_full_gadget` fails on Travis OSX #27

Closed HarryR closed 5 years ago

HarryR commented 6 years ago
./bin/test_sha256_full_gadget || true
full_output_bytes mismatch
Expected: : 0094F6E585874FE640BE4CE636E6EF9E3ADC27620AA3221FDCF5C0A7C11C6F67
Actual: : D294F6E585874FE640BE4CE636E6EF9E3ADC27620AA3221FDCF5C0A7C11C6F67
FAIL

It seems that the expected is incorrect - the first byte is null, however it should be D2 - so it's working as expected, but there's something buggy going on which causes the check to fail or a null byte to appear.

drewstone commented 6 years ago

Documenting my progress on this issue.. currently, it seems that generating the left hand side of the witness is messed up on mac. Using the following additions to the code (basic logging) from lines 67-68 of test_sha256_full_gadget.cpp:

    left.generate_r1cs_witness(left_bv);
    right.generate_r1cs_witness(right_bv);

    auto left_bits = left.get_digest();
    uint8_t left_bytes[SHA256_digest_size_bytes];
    bv_to_bytes(left_bits, left_bytes);

    auto right_bits = right.get_digest();
    uint8_t right_bytes[SHA256_digest_size_bytes];
    bv_to_bytes(right_bits, right_bytes);

    print_bytes("LEFT", SHA256_digest_size_bytes, left_bytes);
    print_bytes("RIGHT", SHA256_digest_size_bytes, right_bytes);

Here's the output from my mac

LEFT: 0086D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08
RIGHT: 9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08

Here's the output from my linux (running Ubuntu 18.04 on a Microsoft Surface 4)

LEFT: 9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08
RIGHT: 9F86D081884C7D659A2FEAA0C55AD015A3BF4F1B2B0B822CD15D6C15B0F00A08
HarryR commented 6 years ago

The weird thing is the output_expected variable variable that the test is failing on is the hard-coded constant, which seems like it's getting modified somewhere between line 61 and line 89, whereas the digest being computed by the circuit is correct.

I've run valgrind on the Linux build and I'm not seeing any memory writes to wrong places, it may be worth running it on the Mac build to see if anything gets flagged.

It would also be worth verifying the as bits using print_bv before and after generating the witness.

I tried adding the const flag to the output_digest, but that hasn't fixed it - so, just to get this straight - I have a const uint8_t output_expected[] = {... where the first byte is getting zero'd out at some point.

If the sha256_full_gadget_512 is removed, along with full_input and full_output variables does this still occur? I'm worried that there's some kind of overflow or compiler weirdness happening under the hood.

See commit at: https://github.com/HarryR/ethsnarks/commit/debcb932eb231923264e07cddeb1ec0f6e629ad1

drewstone commented 6 years ago

So before generating witnesses, left_bv does equal right_bv. Otherwise, I'm stuck since the same operations are applied to the right side which isn't getting goofed up. I'll be diving in again later.

jheinnic commented 5 years ago

I think I may have a fix for this issue by serendipity. I'd been looking at the test_hashpreimage executable, which completes successfully insofar as it gets to the point where it prints "OK", but immediately after that the executable faults with "Abort trap: 6", indicating a write outside of allocated memory.

I tracked that back through trial and error to line 52 of src/test/test_hashpreimage.cpp:

bv_to_bytes(full_output_bits, full_output_bytes);

I was able to work around the issue by commenting out this instruction and preventing the comparison errors that follow from exiting the test case, since that instruction is what populates the byte array it compares the expected result to. The remaining test logic that follows proceeds without incident, and the abort trap no longer occurs.

bv_to_bytes is from src/utils.cpp and calls out to another method in the same file, and this is where I found the root cause of the abort trap 6 problem:

std::vector<unsigned long> bit_list_to_ints(std::vector<bool> bit_list, const size_t wordsize)
{
    std::vector<unsigned long> res;
    size_t iterations = bit_list.size()/wordsize **+ 1**;

    for (size_t i = 0; i < iterations; ++i)
    {
        unsigned long current = 0;
        // ... Code trimmed for brevity
        res.push_back(current);
    }

    return res;
}

Imagine that we have a list of 256 bits that we want to copy to an array of 32 bytes. The word size will be 8, since there are 8 bits per byte. The code above will calculate 33 for the iteration count and then attempt to populate values in the vector indexed from 0...32. Vectors use zero-based indexing, so we really only want 32 iterations spanning from 0...31.

Dropping the + 1 from the calculation of iterations allowed me to back out of the hacking I'd done while finding my way here. I put back the call to bv_to_bytes() and uncommented out the two return statements in the if block that follows. With bv_to_bytes() in the code path, the comparison logic was once again able to compare actual to expected and find a match, but since the utility was no longer copying one byte past its allocated region, the unwanted Abort Trap: 6 did not return.

I found this issue while looking to see if there was an existing issue to prepare a pull request against and tried running test_sha256_full on a hunch. I am certain it was failing for me before I'd touched anything else, and the only change in my repository from master now is the removal of + 1 from bit_list_to_ints() assignment to its iterations counter, but the test program now returns OK.

It would probably be wise to observe why this issue was also causing the failure in test_sha256_full, particularly since it manifested in a nastier red test case rather than just faulty exit code after passing the test logic.

I'll have a pull request offering momentarily. It's small, but I've also already forked the base for this repository, so am learning how to do a cross-fork pull request for the first time.

HarryR commented 5 years ago

Let me digest this, but... the Abort trap 6 bug was also causing the OSX Travis CI builds to fail, and I haven't had the time to look into this yet, but it seemed like a bug in a destructor somewhere.

I've been compiling with both clang and gcc, and running valgrind on the builds on Linux to check for out of bounds accesses etc, but it's not very strict.

Thank you for taking the time to look into this, I really appreciate it.

HarryR commented 5 years ago

:D

HarryR commented 5 years ago

I guess the downside from this is that I introduced this bug somehow, and it was only caught because of multi-platform tests, and for that I feed bad :(

I will ponder on how to avoid introducing these kinds of bugs in future, and possibly it may just mean paying more attention while programming and making sure I have more thorough tests for edge cases etc.