Open ledwards2225 opened 8 months ago
Collecting more information that is possibly relevant: I saw this CI failure (trying to invert zero in the Translator composer tests) on an entirely unrelated branch. Re-running the test made it pass. This could be a coincidence, but it is notable that the failure was again in the Translator and that it was again non-repeatable.
Background: As part of our 2023 goals, we hooked up Goblin to ACIR. This essentially meant constructing and verifying Goblin Ultra Honk (GUH) proofs over ACIR-generated circuits, and also constructing and verifying ECCVM and Translator proofs for arbitrary ECC ops that were unrelated to the circuit in question. (This latter component was essentially there to work out the interfaces and have a proof of concept.) This was encapsulated in a new method, proveAndVerifyGoblin.
At the end of 2023, PR #3636 had things working for only a small subset of the acir tests (only one of which was run on CI). A follow-on PR #3757 made all of the acir tests pass; however, we observed intermittent and non-repeatable failures seemingly related to some kind of memory bug. The failures were reproducible within the same environment (mainframe or CI) but not across environments, and they depended on print statements and on whether or not the tests were run in sequence. This latter point was particularly odd, since the manner in which the tests are run should make them completely isolated from one another (as opposed to running several tests in the same process in gtest, for example). The typical error was "Trying to invert zero in the field", anecdotally in ZM for the Translator. A failure was never observed when running any test in isolation. An example stack trace from a failing test is provided at the bottom of this description.
The workaround was simply to remove the ECCVM/Translator portions from the testing. This is actually natural, since in practice these Goblin components only come into play for recursion, not for single-proof construction and verification. Also, the ops being processed by ECCVM/Translator in each test were completely arbitrary. At the time of writing, we simply run all of the acir tests for Ultra Plonk and GUH.
Backtrace from a failing test: (Note: for a fixed code configuration the same test failed consistently, but which test failed would change seemingly arbitrarily after any code change. I would not expect to be able to reproduce the failure on this test in particular.)