Start conversion to easy-smt

sampsyo commented 1 year ago

As discussed on Slack, here's a first attempt to move from rsmt2 to easy-smt. Here are the high-level takeaways:

The actual interaction with the solver was easy to move over. No problem! Just had to fiddle around with the APIs to line them up.
The real work here is the drudgery of changing from plain ol' strings for the SMT expressions to the library's SExpr type.

I got everything in solver.rs converted, including all the SMT generation and the interaction with the solver. It took about 3 hours, almost all of it spent replacing calls to format! with explicit S-expression construction calls. It's not exactly fun, but it's not exactly hard either.
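To give a rough feel for what that mechanical change looks like, here is a minimal sketch. The `Sexp` enum is a hypothetical stand-in written just for this example (easy-smt's real SExpr is built through the library's own context, which is not reproduced here), and `bvadd_string` / `bvadd_sexp` are made-up helper names.

```rust
// Hypothetical stand-in for a structured S-expression type. This is NOT
// easy-smt's SExpr; it only illustrates the shape of the conversion.
#[derive(Debug, Clone, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

impl Sexp {
    fn atom(s: &str) -> Sexp {
        Sexp::Atom(s.to_string())
    }

    fn list(items: Vec<Sexp>) -> Sexp {
        Sexp::List(items)
    }

    // Render back to SMT-LIB text, e.g. for logging or comparison.
    fn render(&self) -> String {
        match self {
            Sexp::Atom(a) => a.clone(),
            Sexp::List(items) => {
                let parts: Vec<String> = items.iter().map(|i| i.render()).collect();
                format!("({})", parts.join(" "))
            }
        }
    }
}

// Before: the expression is an opaque string assembled with format!.
fn bvadd_string(x: &str, y: &str) -> String {
    format!("(bvadd {} {})", x, y)
}

// After: the same expression built as an explicit tree.
fn bvadd_sexp(x: Sexp, y: Sexp) -> Sexp {
    Sexp::list(vec![Sexp::atom("bvadd"), x, y])
}
```

Both versions describe the same term; the structured one just makes the nesting explicit instead of leaving it implicit in the parentheses of a string.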

There is more drudgery ahead: the files in encoded_ops have a lot of raw SMT strings in them and will be a lot of work to convert.

There is a chance I have made a terrible mistake and I should not have tried to do all this annoying conversion at all: instead, we could try to parse the strings we're already generating and turn them into SExprs. Or we could attempt to hack easy-smt to allow us to shove strings directly into the solver, which it doesn't currently allow (you have to provide an SExpr). Both of those seem very reasonable in retrospect, but now I wonder if either of them would be simpler than just forging ahead and finishing the conversion. I'd be interested in others' opinions, given what the code looks like in solver.rs after the conversion.
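For reference, the "parse the strings we already generate" alternative could be sketched roughly as below. This is an assumption-laden toy: `Sexp`, `tokenize`, `parse_at`, and `parse` are hypothetical names, the result is a local enum rather than an easy-smt SExpr, and it ignores SMT-LIB string literals, quoted symbols, and comments.

```rust
// Hypothetical tree type; a real version would build easy-smt values instead.
#[derive(Debug, Clone, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

// Split the text into "(" / ")" tokens and whitespace-separated atoms.
fn tokenize(src: &str) -> Vec<String> {
    src.replace('(', " ( ")
        .replace(')', " ) ")
        .split_whitespace()
        .map(str::to_string)
        .collect()
}

// Parse one expression starting at `pos`; return it plus the next position.
fn parse_at(tokens: &[String], pos: usize) -> Result<(Sexp, usize), String> {
    match tokens.get(pos).map(String::as_str) {
        None => Err("unexpected end of input".to_string()),
        Some(")") => Err("unexpected ')'".to_string()),
        Some("(") => {
            let mut items = Vec::new();
            let mut i = pos + 1;
            loop {
                match tokens.get(i).map(String::as_str) {
                    None => return Err("missing ')'".to_string()),
                    Some(")") => return Ok((Sexp::List(items), i + 1)),
                    Some(_) => {
                        let (item, next) = parse_at(tokens, i)?;
                        items.push(item);
                        i = next;
                    }
                }
            }
        }
        Some(atom) => Ok((Sexp::Atom(atom.to_string()), pos + 1)),
    }
}

// Parse exactly one expression and reject anything left over.
fn parse(src: &str) -> Result<Sexp, String> {
    let tokens = tokenize(src);
    let (expr, next) = parse_at(&tokens, 0)?;
    if next == tokens.len() {
        Ok(expr)
    } else {
        Err("trailing tokens after expression".to_string())
    }
}
```

The parser itself is small; the open question is whether it would be robust enough for everything the existing string generators produce.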

sampsyo commented 1 year ago

As we discussed synchronously today, I've mucked things up so that our crate builds. That meant temporarily:

removing the files under encoded_ops
replacing the calls into their SMT-generating functions with todo!()

It works! I haven't tried running any tests, because I don't feel qualified to know which tests should be runnable in this semi-broken state. (I could probably figure that out, but I'm going to move on to trying to rewrite the encoded ops stuff instead of spending time on that.)

sampsyo commented 1 year ago

In the name of expediency—that is, to get something actually working as quickly as possible, so we can iterate on it—I had to change tactics a bit. The original plan had been to port the generation code from Python to Rust, so we could generate S-expressions directly (instead of generating Rust that generates S-expressions). But what I didn't quite realize is that the "source of truth" is in SMT-LIBv2 code already, and that the Python code acts as a translator for that code.

In light of that, I saw two fast routes to getting something working:

1. Still port the Python to Rust, but that Rust code would work by parsing the SMT-LIBv2 files (just as the current Python does).
2. Keep the status quo (Python generates Rust that generates S-expressions), but change the generator to produce Rust code that produces easy-smt SExpr values instead of strings.

I opted for option 2 because getting an S-expression parser going on the Rust side seemed like a big drag. We may want to revisit this later.
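To make "code that generates code that generates S-expressions" concrete, here is a sketch of the emitting step. The real converter is the Python script; this is written in Rust only for illustration. `Sexp` and `emit_rust` are hypothetical, and the emitted `ctx.atom(...)` / `ctx.list(...)` spellings are an assumption about what the target construction calls look like, not a copy of the generated code in this PR.

```rust
// Hypothetical parsed form of one SMT-LIBv2 expression from the source files.
#[derive(Debug, Clone, PartialEq)]
enum Sexp {
    Atom(String),
    List(Vec<Sexp>),
}

// Emit Rust source text that would rebuild `expr` at run time.
// `ctx.atom` and `ctx.list` are assumed names for the construction calls.
fn emit_rust(expr: &Sexp) -> String {
    match expr {
        // {:?} on a String produces a quoted, escaped Rust string literal.
        Sexp::Atom(a) => format!("ctx.atom({:?})", a),
        Sexp::List(items) => {
            let args: Vec<String> = items.iter().map(emit_rust).collect();
            format!("ctx.list(vec![{}])", args.join(", "))
        }
    }
}
```

For the input `(bvnot x)`, this emits `ctx.list(vec![ctx.atom("bvnot"), ctx.atom("x")])`, which is the kind of low-level output one would then paste into a Rust function body.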

Logistically, my goal was to keep the "source of truth" in the original templatized SMT-LIBv2 files—not in the generated Rust code. With that philosophy, if we need to fix a bug in the encoding, we never fix the generated Rust; we fix the original SMT S-expressions (or the Python converter) and then just regenerate the Rust. The goal, then, is to have no (or at least minimal) human massaging after conversion. To that end, I did these things:

Checked in the input SMT files and the Python conversion script into git (see the encodings directory).
Hacked the Python script to produce easy-smt-based generator code instead of the string-based stuff.
Organized the SMT files so we have a 1:1 mapping between those files and the Rust functions that wrap them. That means that I split up a64cls32 from cls32 so it has its own file, and I added cl*1 files (even though they are super simple and you could just write them directly in Rust instead).
I tried to delimit the generated code in the Rust files in such a way that it is hopefully easy to update. You just need to run python3 convert.py cls16.smt2 | pbcopy, for example, and paste the output into the corresponding Rust function body.
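If the copy-and-paste step ever gets tedious, delimited regions also lend themselves to automated replacement. A minimal sketch, assuming marker comments named `// BEGIN GENERATED` and `// END GENERATED` (hypothetical; the delimiters actually used in this PR may look different) and a made-up helper `splice_generated`:

```rust
// Hypothetical marker comments surrounding the generated region.
const BEGIN: &str = "// BEGIN GENERATED";
const END: &str = "// END GENERATED";

// Replace everything between the first BEGIN marker and the next END marker
// with `generated`. Returns None if either marker is missing.
fn splice_generated(source: &str, generated: &str) -> Option<String> {
    let begin = source.find(BEGIN)? + BEGIN.len();
    let end = begin + source[begin..].find(END)?;
    Some(format!(
        "{}\n{}\n{}",
        &source[..begin],
        generated.trim_end(),
        &source[end..]
    ))
}
```

Note that this sketch drops any indentation in front of the END marker, so a real version would probably want to preserve it or run rustfmt afterwards.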

Here are some loose ends to address next:

I wasn't quite able to grok where the SMT-LIBv2 code lives for two oddball functions: rbit32 and a64clz32. Perhaps @mpardesh has these sitting around?
We should probably add annotations to quell Clippy for generated code (it's not meant for human consumption anyway)?
There were several cases where the current Rust code didn't seem to correspond exactly to the original SMT-LIBv2 sources I was working from, so I had to make some guesses. It is extremely likely that I broke something when doing that, so we need to check whether any of these functions still actually work.
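On the Clippy loose end, one low-effort option is a lint allowance on each generated function. The sketch below uses a hypothetical function name and body standing in for converter output; `#[allow(clippy::all)]` covers Clippy's default-enabled lint groups, so opt-in groups such as pedantic would need their own allowance if the project turns them on.

```rust
// Hypothetical generated function; the body stands in for converter output
// and is deliberately written in a style Clippy would normally complain about.
#[allow(clippy::all)]
fn generated_rev1(x: &str) -> String {
    // GENERATED CODE: regenerate from the SMT-LIBv2 source instead of editing.
    let ret = format!("{}", x);
    return ret;
}
```

Putting the attribute on the function (rather than the whole module) keeps Clippy active for any hand-written wrapper code in the same file.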

One thing I'm not sure about is: What is the best way to check whether these encodings are working at all? Even if it's not an automatable test, being able to check whether these encodings work in isolation seems really handy… especially if it can happen separately from the rest of the verification machinery.

Less-urgent things that would be good to do but not in this PR include:

The generated SExpr-construction code could be improved. It doesn't use as many helpers as it could, instead opting to use mostly the low-level utilities.
We could write some tests. (I started writing one test for rev1, the simplest possible thing, and then abandoned it.)
We could make things much more convenient without rocking the boat by generating entire Rust files from scratch (including function definitions and extract/concat calls), so that no human copy-and-paste is involved at all.
We could consider ripping this Python approach out entirely and replacing it with direct parsing of the SMT-LIBv2 sources in Rust.
Or we could, at some point, instead shift the "source of truth" to the generated Rust. We would throw away the SMT-LIBv2 sources and consider the Rust code to be canonical. Maybe this makes it way more annoying to tweak; I'm not sure.

mpardesh commented 1 year ago

Woohoo that was super fast!!!

Here are some quick, minor comments:

I think rev32 should be in the Slack thread
Ah yes, I think I couldn't find a64clz32 but I'll look more and add it to the thread
Sorry, should've mentioned I didn't make files for the 1 bit functions because they were just a line or two of manual Rust
To test: you can check for equivalence between the raw SMT and the C test file (after adjusting the input sizes in C). Beyond that, I just used the regular tests in veri_engine/examples.

sampsyo commented 1 year ago

Awesome; thanks! As for testing specifically, I suppose I want to try setting up something (semi-)automated, since I am not very good at getting the details right when manually setting things up for one-off tests… still not 100% sure what that would look like. Maybe running the solver from an isolated Rust test?
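One possible shape for such an isolated check, sketched under assumptions: build a query asking whether an encoding and a reference expression can ever differ, hand it to a solver, and expect "unsat". The names `equivalence_query` and `run_z3` are hypothetical, the free variable is assumed to be called `x`, and `run_z3` assumes a `z3` binary on the PATH that accepts `-smt2 -in` to read a script from stdin.

```rust
use std::io::Write;
use std::process::{Command, Stdio};

// Build an SMT-LIB script that is satisfiable only if `encoding` and
// `reference` disagree for some `width`-bit input `x`.
fn equivalence_query(width: u32, encoding: &str, reference: &str) -> String {
    format!(
        "(declare-const x (_ BitVec {w}))\n(assert (not (= {e} {r})))\n(check-sat)\n",
        w = width,
        e = encoding,
        r = reference
    )
}

// Pipe the script to z3 and return its trimmed stdout ("sat", "unsat", ...).
// Assumes `z3` is installed and on the PATH.
fn run_z3(script: &str) -> std::io::Result<String> {
    let mut child = Command::new("z3")
        .arg("-smt2")
        .arg("-in")
        .stdin(Stdio::piped())
        .stdout(Stdio::piped())
        .spawn()?;
    // The temporary stdin handle is dropped at the end of this statement,
    // which closes the pipe so z3 sees end-of-input.
    child
        .stdin
        .take()
        .expect("stdin was piped")
        .write_all(script.as_bytes())?;
    let output = child.wait_with_output()?;
    Ok(String::from_utf8_lossy(&output.stdout).trim().to_string())
}
```

A test could then assert that `run_z3(&equivalence_query(8, "(bvadd x x)", "(bvshl x #x01)"))` yields "unsat", with the encoding string swapped for the rendered output of one of the generated functions.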