YosysHQ / oss-cad-suite-build

Multi-platform nightly builds of open source digital design and verification tools
ISC License

Update Bitwuzla/seems like Bitwuzla is not origin/main version? #87

Open gussmith23 opened 10 months ago

gussmith23 commented 10 months ago

I'm confused about which version of Bitwuzla is in oss-cad-suite. From digging in the scripts, it seems like bitwuzla is pulled from origin/main. But when I build origin/main bitwuzla myself and compare it against oss-cad-suite's, they report different versions:

$ ~/Downloads/oss-cad-suite/bin/bitwuzla --version
1.0-prerelease
$ bitwuzla --version                         
0.2.0-dev

(the second bitwuzla is built from source, pulled from main just today)

I'm confused about what's happening here -- any ideas?

jix commented 10 months ago

It's building a fixed revision defined here:

https://github.com/YosysHQ/oss-cad-suite-build/blob/1a8a68fdbb7289baffb5e7a8c8ec53cf2073a549/default/rules/solvers.py#L37-L43

gussmith23 commented 10 months ago

Ah, I missed that!

What are the chances that can be updated?

mmicko commented 10 months ago

The main issue here is that they switched to Meson for building (and it must be a newer version than the one distributed with Ubuntu), so it would require some work to get it building even for linux-x64, and I'm not sure about setting up cross-compilation with Meson. But I'll put it on my TODO list.

gussmith23 commented 10 months ago

Understood. Appreciate it. I can also take a crack at it over the holidays if you give me pointers. Would make my life easier to have an updated version in the suite!

jix commented 10 months ago

I noticed that bitwuzla changed how it parses command line options, and smtbmc needs some tweaks to be compatible (basically, the options that smtbmc passes to boolector and bitwuzla are not supported by a recent bitwuzla, but they are also not required anymore). It's probably best if we add version detection to smtbmc so it can work with either version.

I also noticed that, at least for what I've tried so far, a bitwuzla locally built from main has a significant performance regression compared to the bitwuzla we currently ship, as well as compared to boolector (either the oss-cad-suite version or one built locally from master).

mpreiner commented 8 months ago

Thanks @gussmith23 for pointing me to this issue.

Some things to clarify on Bitwuzla and its versions: the current main branch of bitwuzla (and the versions starting with 0.x) is a complete from-scratch rewrite of the solver and does not use any code from the old code base, which is also why some things like the build system and option parsing changed.

We never had an official release of bitwuzla with the old code base (which was an extended fork of Boolector), and it was always at version 1.0-prerelease. Now with the new code base we started versioning with 0.x releases until we have a stable version 1.0. @jix it'd be great if you could share the benchmarks on which bitwuzla 0.x is slower than 1.0-prerelease.

jix commented 8 months ago

I can't share the exact benchmark I was using when testing this, but I think there's a good chance I can reproduce this with something similar that I can share, so I'll look into that.

jix commented 8 months ago

@mpreiner I managed to reproduce it: benchmark.smt2.gz. It performs bounded model checking on the https://github.com/chipsalliance/Cores-VeeR-EH1 RISC-V core and is thus a fairly typical use case for yosys-smtbmc.

It has multiple incremental queries, and I made the following plot comparing the time from the start until each individual query finishes, for a few SMT solvers. I used a 5 minute total timeout.

[plot: time from start to completion of each incremental query, for several SMT solvers]

I tried --pp-variable-subst=false after taking a quick profile, which showed that Bitwuzla spends a lot of time in functions related to variable substitution. I haven't yet profiled why it takes so long just before the end.

I also noticed that functions related to std::unordered_set and std::unordered_map showed up in the profile. For a relatively large but not particularly hard-to-solve instance like this, I'd expect to see some hash tables in the profile, but given that std::unordered_set and std::unordered_map implementations are known for rather poor performance due to API constraints prescribed by the C++ standard, replacing them with faster implementations would most likely improve performance.

I quickly tried this and replaced NodeManager's d_unique_nodes with flat_hash_map and flat_hash_set from https://github.com/greg7mdp/parallel-hashmap (based on Google's Abseil library, but with deterministic iteration order and without all the other things in Abseil), and that gave a small improvement. It is not included in the plot, as I did it afterwards, and there are probably more places that would benefit from a faster hash table implementation. Also, while it is often a drop-in replacement, some use cases do require the additional guarantees the C++ standard makes, so I wouldn't recommend blindly replacing it like I did.
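For illustration, here is a minimal self-contained sketch of what such a swap looks like; the Node type, hash functor, and variable name are placeholders for this example, not Bitwuzla's actual declarations.

#include <cstddef>
#include <cstdint>
#include <functional>

#include <parallel_hashmap/phmap.h>  // from greg7mdp/parallel-hashmap

// Placeholder standing in for Bitwuzla's internal node representation.
struct Node
{
  std::uint64_t id;
  bool operator==(const Node& other) const { return id == other.id; }
};

struct NodeHash
{
  std::size_t operator()(const Node& n) const
  {
    return std::hash<std::uint64_t>{}(n.id);
  }
};

int main()
{
  // Same insert()/find()/erase() interface as std::unordered_set, but an
  // open-addressing layout that is usually faster and more cache friendly.
  phmap::flat_hash_set<Node, NodeHash> unique_nodes;
  auto [it, inserted] = unique_nodes.insert(Node{42});
  (void)it;
  (void)inserted;

  // Caveat matching the remark above: the flat_* variants do not keep
  // pointers/references to elements stable across rehashing, which
  // std::unordered_set does guarantee. Code relying on that guarantee
  // needs phmap::node_hash_set (or a redesign) instead.
  return 0;
}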

mpreiner commented 8 months ago

Thanks @jix! I'll have a look; the main culprit here is preprocessing, which works differently than what we had in the old version of Bitwuzla.

georgerennie commented 3 weeks ago

I had a quick play and the performance definitely seems to be getting lost in std::unordered_map/std::unordered_set, mainly in the preprocessing.

Also notably, on my machine up to about a third of solving time gets spent in SolvingContext::ensure_model inserting terms into a std::unordered_set. This is a real shame, as that function will only ever do anything if there are quantifiers in the formula, which we (typically) never have for our hardware model checking instances.

Replacing the set with a contiguous std::vector<bool>, as in the diff below, makes the time spent in ensure_model negligible and reduces the overall runtime from 78s to 44s.

diff --git a/src/solving_context.cpp b/src/solving_context.cpp
index 58587604..501feec8 100644
--- a/src/solving_context.cpp
+++ b/src/solving_context.cpp
@@ -300,7 +300,7 @@ SolvingContext::compute_formula_statistics(util::HistogramStatistic& stat)
 void
 SolvingContext::ensure_model()
 {
-  std::unordered_set<Node> cache;
+  std::vector<bool> cache;
   std::vector<Node> visit, terms;
   bool need_check = false;
   for (const Node& assertion : original_assertions())
@@ -310,7 +310,13 @@ SolvingContext::ensure_model()
     {
       Node cur = visit.back();
       visit.pop_back();
-      if (cache.insert(cur).second)
+
+      if (cur.id() >= cache.size())
+      {
+        cache.resize(cur.id() + 1);
+      }
+
+      if (!cache[cur.id()])
       {
         if (cur.is_const())
         {
@@ -322,6 +328,7 @@ SolvingContext::ensure_model()
         }
         visit.insert(visit.end(), cur.begin(), cur.end());
       }
+      cache[cur.id()] = true;
     } while (!visit.empty());
   }

This particular patch may well not be a good idea in practice (if there are lots of node ids no longer in use due to rewriting etc., the dense vector could require considerably more memory than the hash set), but something along these lines may be sensible.

With this applied, 38% of the time on my machine is still spent in methods of std::_Hashtable, and 50% of the time in preprocessing, a lot of that in the hash table methods.

mpreiner commented 3 weeks ago

@jix Sorry for dropping the ball. I totally forgot about this. @georgerennie thanks for bumping this thread. The issue here is not the data structure itself, but that ensure_model() is called way too often. It should only be called if the formula contains quantifiers, which it does not in the above case. I have a fix (which I still need to test a bit more) that brings the runtime down to 57s (instead of 105s) on my machine. That is still slower than the old Bitwuzla version (39s), but we are getting there.
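Roughly, the idea is an early exit along these lines; this is only an illustrative, self-contained sketch with placeholder names (class, flag, and assertion representation), not the actual patch on main.

#include <vector>

// Placeholder assertion type; for the sketch we only care about whether an
// assertion contains a quantifier.
struct Assertion
{
  bool contains_quantifier = false;
};

class SolvingContextSketch
{
 public:
  void ensure_model()
  {
    // Skip the model-completion traversal entirely for quantifier-free
    // inputs (the common case for hardware model checking), since that is
    // the only case in which this function has any work to do.
    bool have_quantifiers = false;
    for (const Assertion& a : d_assertions)
    {
      if (a.contains_quantifier)
      {
        have_quantifiers = true;
        break;
      }
    }
    if (!have_quantifiers)
    {
      return;
    }
    // ... traversal over the assertions, as in the patch quoted earlier
    // in the thread ...
  }

  std::vector<Assertion> d_assertions;
};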

georgerennie commented 3 weeks ago

Yeah, for our use cases I think it's perfectly fine to only do those checks when there are quantifiers, but maybe there are problems that do have quantifiers that would hit these same performance issues from the search. Thanks for looking at this!

mpreiner commented 3 weeks ago

Even with quantifiers we can do better than this; I'll think about it. In the meantime, I have pushed the fix to the main branch. In the above example, disabling the arithmetic normalization pass (--no-pp-normalize) also saves some time.

mattyoung101 commented 3 weeks ago

Hi, author of the linked PR here; I just thought I'd throw my results into the ring. I'm using a Ryzen 5950X, Arch Linux, 32 GB RAM. Results:

I may have invoked Bitwuzla stable incorrectly, or built it incorrectly somehow (I used the AUR package), as my 90 minute result doesn't align with the results I'm seeing in the rest of the thread. Nonetheless, 90 mins -> 42 seconds is really nice.

Forgive me as I'm not very knowledgeable about SAT or SMT solvers, but I'm just wondering why Yices is so much faster (especially before the fix @mpreiner checked in)? My understanding is that it's an older tool. Is this particular case just a challenging edge case for Bitwuzla? Is there anything Yosys could do to generate more optimal SMT2 documents?

mpreiner commented 3 weeks ago

Yices2 is a well-maintained, state-of-the-art SMT solver; it's been around for quite a while now, but that doesn't mean it's "old and obsolete". Since Bitwuzla is a complete from-scratch rewrite, there are still a few performance regressions that we need to iron out. This is especially true for incremental problems. The 90 min result for Bitwuzla 0.5.0 doesn't look right, though; it was around 100s on my machine.

georgerennie commented 3 weeks ago

Given that a lot of time is spent in variable substitution, I wonder if Yosys' encoding could be better on that front.

rg "\(= \|UNROLL#\d+\| #b[01]+\)" -c gives 21964 matches in that file. Looking at them, these tend to be constants declared with define-fun, with the equality then asserted at the top level, so they will trigger variable substitution.

Also, this file does the unrolling in yosys-smtbmc, but there is also support for leaving that to the solver by using UF, where each variable is defined as a function from a state sort to its value and assertions are made over concrete instantiations of the state sort. I wonder whether that would make much difference here (I'm not sure if rewriting can occur within function definitions separately from their instantiations, or only afterwards).

mpreiner commented 3 weeks ago

I think we can improve the variable substitution pass; this is something I want to look at next. We should be able to reduce the time spent in this preprocessing pass. The preprocessing pipeline is not yet fully optimized for incremental queries.