AleoNet / snarkVM

A Virtual Machine for Zero-Knowledge Executions
https://snarkvm.org
Apache License 2.0
1.01k stars 1.46k forks source link

[Bug] Not all threads terminate prior to `main` when executing certain unit tests #2245

Open icmccorm opened 7 months ago

icmccorm commented 7 months ago

🐛 Bug Report

I've been developing an experimental version of Miri that can execute foreign functions by interpreting LLVM bytecode. I was testing it on this crate, and it didn't find any borrowing violations, but Miri did report that several threads were still running after the main thread terminated.

Steps to Reproduce

The custom version of Miri that I'm using is public, but to use it, you'll need to build Rust from source. If you take this route, the following command will reproduce the error:

MIRIFLAGS="-Zmiri-llvm-read-uninit" cargo miri test -- [test_name]

Here's the error message I see:

error: the main thread terminated without waiting for all remaining threads

This occurs for the following tests:

Expected Behavior

All other threads should terminate prior to the main thread.

Your Environment

raychu86 commented 1 week ago

Thanks for finding this. Since this only affects testing, I will update the label for this issue.

ljedrz commented 6 days ago

This is unrelated to the issue at hand, but it's something that I've noticed while looking into this; the test_roots_of_unity case exists both in fft::tests and fft::domain and it's identical.

ljedrz commented 6 days ago

I've looked at the listed cases and I don't see any issues at first glance. There are multiple possible explanations, though; rayon might have been a bit slow with its cleanup, or perhaps the miri flag used was not stable enough just yet? BTW, I can't find it referenced anywhere, is it part of mainline miri yet?

@icmccorm can you please advise whether miri still complains with one of the listed tests? Also, why would a custom compiler build be needed to run it? Is it some specific branch (otherwise rustup +nightly component add miri should suffice)?

icmccorm commented 6 days ago

The extension requires forks of Miri and the LLVM toolchain, so we had to use a different compiler build to avoid symbol clashes. This is still a research experiment, and we haven't been collaborating with the Miri team, but we've found some interesting undefined behavior so far. Hopefully our results will motivate creating a production-ready extension down the line.

I'll see if I can reproduce this—it's possibly just a false-positive caused by our extension, since we use Miri's multithreading mechanism to transfer control of execution between Miri and the LLVM interpreter.

icmccorm commented 10 hours ago

I ran a couple of the tests I mentioned above using an updated version of the tool, and I'm seeing the same error for each one.

I can try adding some debugging to determine if it's one of the FFI threads that isn't being terminated, which would likely be an error with the tool.