chriseth opened this issue 7 years ago
If you use simple activation functions (e.g., ReLU or leaky ReLU) then the forward pass is nothing but simple additions and multiplications. Some activation functions (e.g., tanh) might be harder to verify with zkSNARKs, but I don't know enough about zkSNARKs to know what's possible and what isn't. I have personally never used that "bent identity" activation function. ReLU is good enough in most cases in my experience, and that's simply max(0, x) so I assume that's something zkSNARKs can verify.
Keep in mind that many NN architectures contain well over 1 billion parameters, so that's many many operations. There are NN profilers (e.g., this one) that will estimate # of parameters and # of operations. In your case, you don't really care about the topology (e.g., fully connected vs convnet), but more about the total number of ops, if I understand correctly.
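To make the "only additions and multiplications" point concrete, here is a minimal Python sketch; the layer sizes, weights and op-counting convention are invented for illustration and are not the profiler linked above:

```python
import numpy as np

# Minimal sketch of the point above: with ReLU, a forward pass is nothing but
# multiply-adds and a max. Layer sizes and weights here are made up, and the
# op count is a rough convention, not the profiler mentioned above.

def relu(x):
    return np.maximum(0.0, x)

layers = [(784, 128), (128, 10)]                       # (inputs, outputs) per layer
weights = [np.random.randn(o, i) for i, o in layers]   # hypothetical weights
biases = [np.random.randn(o) for _, o in layers]

def forward(x):
    for W, b in zip(weights, biases):
        x = relu(W @ x + b)        # matrix multiply, add bias, then max(0, .)
    return x

y = forward(np.random.randn(784))

# rough parameter / multiplication count, in the spirit of a profiler
n_params = sum(W.size + b.size for W, b in zip(weights, biases))
n_mults = sum(W.size for W, _ in zip(weights, biases))
print(n_params, n_mults)
```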
Overall it seems technically possible. My main concern is that the canonical use case isn't clear to me. Who are the actors? There's the person who trains the NN (the trainer), and the person who wants to use it to make a prediction, correct? The trainer wants the trained NN to be usable by the public, without disclosing the parameters to the public? In a centralized setup, the trainer could expose a private API that takes the input tensor, runs the NN computation and returns the output. In a decentralized setup, the trainer could make the trained NN available publicly (e.g., in IPFS), but that means the parameters are now public and I can "cheat" the NN by finding the input that will maximize the output. Who would I be "cheating"? The trainer? Myself? I feel like I might be missing an actor.
I think this is an interesting question, I'd be interested in hearing about concrete use cases you have in mind.
Only rational functions can be verified by zkSNARKs efficiently. A single multiplication takes a single gate, and apart from multiplication and addition you have nothing else, not even comparison or bit access. You can simulate comparison and bit access by "guessing" the bit decomposition of the input, reconstructing the value from the bits and requiring equality (note that this is different from a real comparison, where you obtain a boolean you can keep working with; here the boolean is forced to be true). The max(0, x) function will probably take around 200 gates. It will probably still be most important to keep the number of neurons down, since 1 billion is too much, I guess, though I think that zcash uses a circuit in the range of 1 million gates (but I might be wrong).
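For intuition about where a gate count like ~200 comes from, here is a rough Python sketch of the "guess the bit decomposition" idea for max(0, x). The 16-bit width, the offset encoding and the plain-integer arithmetic are my own illustrative choices; a real circuit would work over a large prime field with a wider bit width, which is roughly where the extra gates come from:

```python
N = 16  # bit width of the signed input (illustrative assumption)

def relu_witness(x):
    """Prover side: supply the bit decomposition of x as extra witness values."""
    # encode the signed value as x + 2^(N-1) so the top bit tells the sign
    shifted = x + (1 << (N - 1))
    return [(shifted >> i) & 1 for i in range(N)]

def relu_constraints_hold(x, bits):
    """Conceptual verifier: each numbered check is (roughly) one gate or constraint."""
    # 1) booleanity: b * (b - 1) == 0, one multiplication gate per bit
    if any(b * (b - 1) != 0 for b in bits):
        return False
    # 2) reconstruction: sum(b_i * 2^i) == x + 2^(N-1)  (a linear constraint)
    if sum(b << i for i, b in enumerate(bits)) != x + (1 << (N - 1)):
        return False
    # 3) output: y = sign_bit * x, one more multiplication gate
    y = bits[N - 1] * x
    # comparing against Python's max is only to check the construction here
    return y == max(0, x)

for x in (-5, 0, 42):
    assert relu_constraints_hold(x, relu_witness(x))
```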
Concerning use-cases: that's also not yet fully clear to me. The idea is that you get a trustless evaluation of neural networks for smart contracts to use and rely upon. Say we have a smart contract that gives you money if you can come up with a new and "interesting" picture. The neural network checks that it is "interesting" and computes a fingerprint of the picture to check against pictures that have already been submitted to the smart contract. The way to cheat here is to create a picture that is just an already existing picture plus noise that cannot be detected by the neural network but results in a new fingerprint. Of course the question here would be how to prevent the trainer from cheating, but perhaps it is just the trainer's own money inside the smart contract.
I see, so the ReLU activation function is actually not easily verifiable with zkSNARKs, since it would require ~200 gates. Since you mentioned that the bent identity function would only require 1 gate, I benchmarked it against ReLU. On the MNIST dataset with a convnet architecture, both ReLU and bent identity lead to similar convergence rates and testing accuracy. See the details of the architecture here.
ReLU
Iter 197120, Minibatch Loss= 33.650322, Training Accuracy= 0.98438
Iter 198400, Minibatch Loss= 255.560089, Training Accuracy= 0.96094
Iter 199680, Minibatch Loss= 0.000000, Training Accuracy= 1.00000
Optimization Finished!
Testing Accuracy: 0.972656
Bent identity
Iter 197120, Minibatch Loss= 265.025818, Training Accuracy= 0.99219
Iter 198400, Minibatch Loss= 397.303589, Training Accuracy= 0.98438
Iter 199680, Minibatch Loss= 868.918640, Training Accuracy= 0.96875
Optimization Finished!
Testing Accuracy: 0.988281
Replacing ReLU with bent identity might not work in all cases, but it seems to work just fine here. The two functions have similar behavior around 0, which might explain the results, but I'm just guessing.
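For reference, the bent identity is easy to drop in as a custom activation. A minimal sketch, assuming a TensorFlow setup like the benchmark above (the actual architecture and hyperparameters are not reproduced here):

```python
import tensorflow as tf

def bent_identity(x):
    # y = (sqrt(x^2 + 1) - 1) / 2 + x, smooth and roughly linear for large x
    return (tf.sqrt(tf.square(x) + 1.0) - 1.0) / 2.0 + x

# e.g. swap it in wherever tf.nn.relu was used:
# h = bent_identity(tf.matmul(x, W) + b)   # instead of tf.nn.relu(...)
```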
I think the right approach is to assume that a subset of NN functions can be verified with zkSNARKs in an efficient manner and to first clarify the use cases. I like the example you gave above; maybe it can be extended to any task where discrimination is easy but generation is hard.

For instance, I might want to build a decentralized crowdsourcing platform for text-to-speech. A user can upload a sentence 's' to be spoken, and another user can speak that sentence and upload the recorded audio 'a' to the platform. The speaker only receives a bounty if the smart contract accepts the audio as real. To do this, the smart contract uses a public function d(a | s) that returns either real or fake. That function is a NN that was trained to discriminate between real human speech and generated TTS output. As far as I understand, your intuition is that if the function d is public and all parameters are available, someone could reverse-engineer an optimal synthetic input 'a' so that the function outputs 'real' for a given sentence. Maybe then this could be automated and I could claim all bounties without ever having to do any work.
Is that the class of problems you are hoping to solve with this? As a side note, I think you might want to abstract away from neural networks specifically, since there's no reason someone should be restricted to them to build their discriminative model. A broader research question might be "zkSNARKs for trustless machine learning". Starting with NNs would still make sense since they are well known and often state-of-the-art.
If I understand correctly, this verifies that the forward pass / prediction was done correctly. However, if the weights are not out in the open, aren't we introducing trust in the actor publishing the weights / training the network?
Have you given this any thought @chriseth?
It will include a hash of the weights as a public input, so you can be sure that multiple applications use the same weights. You can also publish the weights and then verify the hash, but that does not have to be done on the blockchain.
Of course you have to put trust in someone training the network correctly (if there is such a thing as a correctly trained network). You can do some random samples to check the network, though.
It might be possible to use zkSNARKs for verifying computations of neural networks quite efficiently. The reason is that a single multiplication gate might suffice to model a neuron if the activation function of the neuron is a rational function. This means that the number of gates will be quite small. We still have the problem that the number of wires is extremely large. The scheme by Groth (On the Size of Pairing-based Non-interactive Arguments) trades complexity in the number of multiplication gates for the number of wires, so that might be feasible (and the Groth scheme can be implemented with the planned precompiles).
If the weights of a neural network are public, it is quite easy to fool it (you analyze the network much like you do in backpropagation). On the other hand, there are also use-cases where the weights are private. In this use-case, there would be a fixed SNARK for a universal neural network of a fixed size and topology (all of them sharing the trusted setup). The weights would be part of the private input. If you do not do anything else, the prover could use arbitrary weights and thus prove anything about the input. Because of that, the universal neural network also has a component that computes a hash of the weights, and that hash is part of the public input. That way, the public input selects a neural network by the hash of its weights. The prover can now evaluate the neural network and create a zkSNARK showing that it computes a certain result.
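A toy sketch of that split between public and private inputs; the hash, the one-layer "network" and the commented-out prove/verify calls are placeholders for illustration, not a real zkSNARK library:

```python
import hashlib
import numpy as np

def weights_hash(W):
    # stand-in for the hash computed inside the circuit
    return hashlib.sha256(W.tobytes()).hexdigest()

def forward(W, x):
    # toy single-layer "network" with a rational activation (identity here)
    return W @ x

# --- prover ---------------------------------------------------------------
W = np.array([[1.0, 2.0], [3.0, 4.0]])   # private: the trained weights
x = np.array([0.5, -1.0])                # input (public or private)
y = forward(W, x)                        # claimed output

public_inputs = {"weights_hash": weights_hash(W), "output": y}
# proof = snark_prove(proving_key, public=public_inputs, private={"W": W, "x": x})

# --- verifier / smart contract ---------------------------------------------
# Sees only public_inputs and the proof: it learns that *some* weights hashing
# to weights_hash produce `output`, without ever learning the weights.
# assert snark_verify(verification_key, public_inputs, proof)
```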
There are not too many rational activation functions, and in the general circuit model of zkSNARKs a polynomial would not be a good fit, as e.g. y = x^3 does not have a large slope at x = 0. Here, we have to think of zkSNARKs less in the circuit model and more in the "set of polynomial equations" model: the function y^3 = x is already quite close to the arctangent or some other step function and can be realized with only two constraints or "gates". Another activation function is the "bent identity": y = (sqrt(x^2 + 1) - 1) / 2 + x - this one can be realized with just a single gate.
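One possible way to see the single-gate claim (my own derivation, so treat it as a sketch): with s = 2(y - x) + 1, the bent identity satisfies s^2 = x^2 + 1, which factors into a single product of linear terms, (s - x)(s + x) = 1, i.e. (2y - 3x + 1)(2y - x + 1) = 1, one multiplication of two linear combinations of the wires x and y. A quick numerical check:

```python
import math

def bent_identity(x):
    return (math.sqrt(x * x + 1.0) - 1.0) / 2.0 + x

for x in (-3.0, -0.5, 0.0, 0.5, 3.0):
    y = bent_identity(x)
    # single multiplication constraint satisfied by (x, y)
    assert abs((2*y - 3*x + 1) * (2*y - x + 1) - 1.0) < 1e-9
```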