anoma / geb

A Categorical View of Computation
https://anoma.github.io/geb/
GNU General Public License v3.0

Add method to compile geb to boolean circuits. #105

Closed AHartNtkn closed 1 year ago

AHartNtkn commented 1 year ago

Adds a category of boolean circuits (BITC). The objects of this category are just natural numbers, each representing the type of bit vectors of that length. There are nine methods for constructing morphisms.

There are other ways to formulate this category, but I think this particular formulation is quite convenient.
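
To give a feel for it, here is a rough, self-contained interpreter sketch. This is illustrative only: the constructor names and conventions below are stand-ins I'm using for exposition and may not match the actual nine methods in the code.

```lisp
;; Illustrative sketch only -- not the actual geb BITC API.
;; Objects are natural numbers (bit-vector widths); a morphism n -> m
;; maps a list of n bits to a list of m bits.
(defun run-bitc (morph bits)
  "Interpret a toy BITC morphism on a list of bits (0s and 1s)."
  (ecase (first morph)
    (:ident    bits)                                    ; n -> n
    (:drop     '())                                     ; n -> 0
    (:one      '(1))                                    ; 0 -> 1, constant true
    (:zero     '(0))                                    ; 0 -> 1, constant false
    (:fork     (append bits bits))                      ; n -> n + n, duplicate
    (:swap     (destructuring-bind (n m) (rest morph)   ; n + m -> m + n
                 (append (subseq bits n (+ n m)) (subseq bits 0 n))))
    (:parallel (destructuring-bind (f g n) (rest morph) ; f on the first n bits,
                 (append (run-bitc f (subseq bits 0 n)) ; g on the rest
                         (run-bitc g (subseq bits n)))))
    (:compose  (destructuring-bind (g f) (rest morph)   ; g after f
                 (run-bitc g (run-bitc f bits))))
    (:branch   (destructuring-bind (f g) (rest morph)   ; 1 + n -> m, case on
                 (if (= 1 (first bits))                 ; the leading bit
                     (run-bitc f (rest bits))
                     (run-bitc g (rest bits)))))))

;; Example: swap a 1-bit block past a 2-bit block.
;; (run-bitc '(:swap 1 2) '(1 0 1)) => (0 1 1)
```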

I've implemented a to-bitc function in geb.trans which translates geb objects and morphisms into bitc objects and morphisms. Additionally, I've implemented a to-vampir function in bitc.trans which translates a BITC morphism into a VampIR morphism.

I'm not sure what else is needed, but the core implementation is done for now. In the future, this should be extended to a category whose objects represent vectors over some finite set other than the booleans. The reason I didn't do that here is that coproducts are only binary and there aren't finite sets beyond so0 and so1, so bit vectors are quite natural and anything else would require post-hoc optimization; future versions of geb may want more, though. Also, I'd like to know what, if any, performance benefits this gives over the univariate polynomial formulation; I haven't tested that.

rokopt commented 1 year ago

Wow, at first glance this looks brilliant -- a clear model of exactly what VampIR circuits can do is something I've always wanted in Geb (but I don't know VampIR well enough to write such a model).

Without having looked at the code yet, I have one question relating to #89 -- have you thought about what an extension to the BITC which includes VampIR's higher-order functions might look like? Is that even the right place for me to be thinking about where to plug in that support?

rokopt commented 1 year ago

Oh, and a similar question to my previous one: would it make sense, at some point in the future after this PR is in (as with my previous question, not adding more work to this PR), to extend the BITC with explicit use of VampIR's constraints? Or is there a different layer at which that should be done, or perhaps does it make sense for layers above VampIR, such as Geb, simply to express all constraints implicitly through computation?

rokopt commented 1 year ago

And to test my understanding: is it the case that the BITC is effectively abstracting the underlying multivariate VampIR compilation so that layers above (such as Geb's compilation of SubstMorphs to the BITC) only have to know about compiling to 2^n bits and don't have to know anything about whether the compilation from BITC to VampIR is going to a univariate or to a multivariate polynomial (and if multivariate, how many variables it's using)?

AHartNtkn commented 1 year ago

On higher-order functions in VampIR: VampIR does not arithmetize higher-order functions. A fully normalized VampIR program will have no higher-order functions; it's just a system of polynomial equations. If you want the execution of the higher-order functions to be verified, it would have to be above this level, and it wouldn't necessarily be reflected in VampIR's higher-order functions. lukaszcz's comment is correct.
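
For intuition only (this is plain Lisp, not VampIR): a higher-order helper disappears once it's fully applied, and what's left is the first-order arithmetic that actually gets arithmetized.

```lisp
;; Not VampIR -- just an intuition pump in Lisp.  The higher-order helper
;; TWICE vanishes once it is fully applied; what remains is first-order
;; arithmetic, and that is the level at which the polynomial equations live.
(defun twice (f)
  (lambda (x) (funcall f (funcall f x))))

;; (funcall (twice (lambda (x) (* x x))) 3) => 81
;; After normalization, the checked statement is just the first-order
;; equation y = ((3 * 3) * (3 * 3)); no higher-order structure remains.
```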

On constraints: Yes, I think it makes sense to add constraints here (or to geb). It may require thinking about non-functional categories/allegories/categories of relations. I'm not sure if this is necessary; the only constraint VampIR has is equality checking between field elements. Other kinds of constraints (e.g. computing if one tree is smaller than another) would require their own computation anyway, so I'm not sure it's completely necessary, but it's worth thinking about.

On "test my understanding": The bits aren't necessarily a power of 2, but yes. The GEB code does not need to encode anything about how many variables the final polynomial has.

rokopt commented 1 year ago

Thanks for the replies. I have some follow-up questions:

On higher-order functions in VampIR: VampIR does not arithmetize higher-order functions. A fully normalized VampIR program will have no higher-order functions; it's just a system of polynomial equations. If you want the execution of the higher-order functions to be verified, it would have to be above this level, and it wouldn't necessarily be reflected in VampIR's higher-order functions. lukaszcz's comment is correct.

Are you referring to this comment?

On constraints: Yes, I think it makes sense to add constraints here (or to geb). It may require thinking about non-functional categories/allegories/categories of relations. I'm not sure if this is necessary; the only constraint VampIR has is equality checking between field elements. Other kinds of constraints (e.g. computing if one tree is smaller than another) would require their own computation anyway, so I'm not sure it's completely necessary, but it's worth thinking about.

In light of "the only constraint VampIR has is equality checking between field elements": does that mean that any VampIR constraint can be precisely modeled as a category-theoretic equalizer?

On "test my understanding": The bits aren't necessarily a power of 2, but yes. The GEB code does not need to encode anything about how many variables the final polynomial has.

Oops, right, sorry, I shouldn't have said 2^n bits, but rather Fin(2^n), i.e. n bits.

rokopt commented 1 year ago

I forgot one more potential-BITC-extension-related question: once we have an initial port of #101 to Lisp to address #61, which I think will at first reuse the same code that we're currently using in Geb to compile +, *, -, /, %, and < to VampIR, would any of the following make sense from both semantic and performance perspectives?

1) Extend the BITC category to include those explicitly-numeric operations, and add to it compilation procedures to potentially-multivariate polynomials for each of them
2) Update the Geb code which compiles its internal numeric operations to compile instead to the existing BITC, meaning that we implement addition, multiplication, and so on ourselves in terms of the existing BITC, using standard algorithms for fixed-width numeric operations, as you've done for the existing BITC operations to implement Geb's substmorph compilation, thus automatically obtaining compilation to potentially-multivariate polynomials for the built-in natural numbers
3) Write some other category modeling numeric operations and compile it to BITC
4) Write some other category modeling numeric operations and compile it directly to VampIR operations
5) Some other change
6) No change; just leave the code as it is in the initial natural-number implementation, compiling numeric operations using the code that we already have which implements substobj in terms of numeric operations

rokopt commented 1 year ago

To me the net code changes look great; would you please squash (and then, if there's a separation that would improve clarity, break up again) the commits in preparation for merging, to get rid of the merge commits and bug fixes to new code (if there are any bug fixes to pre-existing code, then those should be separate commits, but I haven't spotted any)?

rokopt commented 1 year ago

To me the net code changes look great; would you please squash (and then, if there's a separation that would improve clarity, break up again) the commits in preparation for merging, to get rid of the merge commits and bug fixes to new code (if there are any bug fixes to pre-existing code, then those should be separate commits, but I haven't spotted any)?

Actually, @AHartNtkn , you don't need to worry about this; @mariari offered to take care of it (thanks!).

AHartNtkn commented 1 year ago

Are you referring to this comment?

Yes

In light of "the only [...]

I don't know what you have in mind when you say "modeled as a category-theoretic equalizer". An equalizer in GEB would just be a finite set, and wouldn't produce any constraints when compiled into VampIR. Additionally, equalities between constants would presumably produce somewhat trivial equalizers. 1 + 1 = 2 would be, what, an equalizer between the constant functions from the initial object into 1+1 and 2? Which would just be the initial object again; it wouldn't produce that actual check. You'll have to explain to me what you're imagining.

once we have an initial port of #101 to Lisp to address #61 [...]

I don't really understand what that implementation of natural numbers is actually doing. Is it just creating a type that represents integers mod n, plus the existing arithmetic on it? If I have that right, it does make sense to modify BITC into a new category to support these. This category would have lists of numbers as objects (representing vectors of finite sets, with the number representing the size of the set in that slot). Most of the existing operations would be the same, but there would be constants for any particular number, and branch would have to be adaptable to splitting on larger finite sets.

The arithmetic operations that GEB uses could then be incorporated as bespoke morphisms added manually to the category. + would go from [n, n] to [n], < would go from [n, m] to [2], etc. Performance-wise, it will be just as efficient as current GEB. The benefits come from not having to use them to perform non-arithmetic data structure manipulations.

If we implemented these operations in BITC (which we could do through an encoding of bitstrings as a recursive type), that would likely be similar in efficiency to using custom operations specifically when range checks are necessary, that is, for % and <. It would be WAY, WAY less efficient for operations that don't require range checks, namely +, *, /, and -. Then again, if you're modding everything anyway, that would likely negate any performance benefits.

If you want to do a lot of mixed mods, then it might actually be better to encode everything. But for ZK applications, you'd want to fix a single mod; then most things become much more efficient. If you do that, then I suppose the vectors would represent vectors of sets of size p, where p is the size of the field of your arithmetic circuit. Allowing modding by anything else would require range checks everywhere, so you might as well encode them.

This is a bit of a design question; it depends on what you expect to happen. I suppose there could be both: efficient, special +, *, /, and - morphisms for p, and less efficient ones for every other mod. That would be my recommendation: bespoke +, *, /, and - for the field size, and everything else implemented as an encoded morphism.
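
To make the shape of that extension concrete, here's a rough, self-contained sketch (the names and details are mine for illustration, not the geb API): objects are lists of moduli, the field modulus p gets bespoke atomic arithmetic, and everything else would be built from structural or encoded morphisms, which I've elided.

```lisp
;; Illustrative sketch only -- not the geb API.  Objects are lists of
;; moduli, e.g. (7 7 2) is "two mod-7 slots and one boolean slot".
;; Arithmetic for the circuit's intrinsic modulus *p* gets bespoke,
;; atomic morphisms; other mods would be encoded morphisms built from
;; the structural operations, which are elided here.
(defparameter *p* 7 "Stand-in for the intrinsic field size of the circuit.")

(defun run-finvec (morph xs)
  "Interpret a toy morphism on a list of numbers."
  (ecase (first morph)
    (:ident   xs)
    (:const   (list (second morph)))                        ; () -> (n)
    (:add-p   (list (mod (+ (first xs) (second xs)) *p*)))  ; (p p) -> (p), native
    (:mul-p   (list (mod (* (first xs) (second xs)) *p*)))  ; (p p) -> (p), native
    (:less    (list (if (< (first xs) (second xs)) 1 0)))   ; (n m) -> (2)
    (:compose (destructuring-bind (g f) (rest morph)        ; g after f
                (run-finvec g (run-finvec f xs))))))

;; (run-finvec '(:add-p) '(3 5)) => (1)   ; (3 + 5) mod 7
;; (run-finvec '(:less)  '(3 5)) => (1)   ; 3 < 5
```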

mariari commented 1 year ago

I can't seem to force push to the branch, so I've made a new branch with the changes; please review that for the actual code.

mariari commented 1 year ago

Judging by the big O of to-vampir, I think using lists is fine. At first I wasn't sure about subseq, but since the size changes quite often, lists are fine for this.

mariari commented 1 year ago

See #108

rokopt commented 1 year ago

In light of "the only [...]

I don't know what you have in mind when you say "modeled as a category-theoretic equalizer". An equalizer in GEB would just be a finite set, and wouldn't produce any constraints when compiled into VampIR. Additionally, equalities between constants would presumably produce somewhat trivial equalizers. 1 + 1 = 2 would be, what, an equalizer between the constant functions from the initial object into 1+1 and 2? Which would just be the initial object again; it wouldn't produce that actual check. You'll have to explain to me what you're imagining.

Sorry, that was a very under-specified question. Here's what I mean.

You've given us the bitc category to model within Geb what a boolean circuit can express (and we've discussed some possible extensions in earlier comments). So what the STLC-to-Subst-to-poly part of Geb will end up producing when given an STLC program is a morphism of bitc, which has a domain consisting of natural numbers m bits wide and a codomain consisting of natural numbers n bits wide, for some natural numbers m and n.

When we add constraints to a circuit (in particular, for example, if we were to add some representation of constraints to bitc or some extension of it), my understanding is that the effect is that proof verification will fail for some subset of the possible inputs to the program (where the input in this case has the form of some m-bit-wide natural number).

If that much is true, then this is my question:

For any possible set of constraints that we could add to the circuit, is it the case that we could find some natural number k and some pair of morphisms f, g of bitc from m-bit vectors to k-bit vectors such that the subset of inputs for which the constraints would all be satisfied would be precisely the equalizer of f and g?
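
In symbols, what I have in mind (my paraphrase, reading the equalizer set-theoretically, with S the subset of 2^m on which all constraints hold) is:

```latex
S \;=\; \operatorname{Eq}(f, g) \;=\; \{\, x \in 2^m \mid f(x) = g(x) \,\}
\qquad \text{for some } k \text{ and } f, g : 2^m \to 2^k \text{ in BITC.}
```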

rokopt commented 1 year ago

I don't really understand what that implementation of natural numbers is actually doing. Is it just creating a type that represents integers mod n, plus the existing arithmetic on it?

Yes, that's right.

rokopt commented 1 year ago

If you want to do a lot of mixed-mods, then it might actually be better to encode everything. But for ZK applications, you'd want to fix a single mod, then most things become much more efficient.

This is interesting -- I was just about to ask how the finite-field aspect of circuits works. Does this mean that boolean circuits can implement a client-chosen fixed global modulus for the entire circuit, and reasonably efficiently? (Whereas using different moduli in different places would require the client to write manual mod operations anywhere they wanted a modulus that wasn't the globally specified one?)

AHartNtkn commented 1 year ago

For any possible set of constraints...

Yes, I do believe that would be the case.

[...] boolean circuits can implement a client-chosen fixed global modulus [...]

No, that's not what I'm saying. Rather, arithmetic circuits come equipped with an intrinsic modulus chosen by the client, and that one modulus can be handled efficiently; any other can be done about as efficiently as possible using an encoded representation, which isn't very efficient. BITC, as it stands, can't implement any mod efficiently (relative to the intrinsic mod of the circuit). If we wanted to compile to what arithmetic circuits can do efficiently, that would require additional morphisms which are not encoded, but those could only exist for one mod. For other mods, we might as well just encode them instead of having them available as atomic morphisms; any additional efficiency would be rather meager, as we would need to decompose everything into bits anyway for range checks.

rokopt commented 1 year ago

For any possible set of constraints...

Yes, I do believe that would be the case.

[...] boolean circuits can implement a client-chosen fixed global modulus [...]

No, that's not what I'm saying. Rather, arithmetic circuits come equipped with an intrinsic modulus chosen by the client, and that one modulus can be handled efficiently; any other can be done about as efficiently as possible using an encoded representation, which isn't very efficient. BITC, as it stands, can't implement any mod efficiently (relative to the intrinsic mod of the circuit). If we wanted to compile to what arithmetic circuits can do efficiently, that would require additional morphisms which are not encoded, but those could only exist for one mod. For other mods, we might as well just encode them instead of having them available as atomic morphisms; any additional efficiency would be rather meager, as we would need to decompose everything into bits anyway for range checks.

Aha, I see. So I think I gather that providing a modulus and parameterizing bitc on it would be another possible future extension to the category. Thank you!