Sources of entropy and PRNG

helins commented 3 years ago

Many uses cases require for a process to be somewhat stochastic without requiring a particularly good source of entropy in the cryptographic sense of the word.

This issue serves to investigate possible sources (aka seeds) and simple schemes for providing a rand function typically found in programming languages. Constraint is that it has to remain fully deterministic on chain, just like a typical PRNG function is indeed deterministic (same output is produced for the same input seed). It follows that the seed must come from some information on chain.

Most likely, the most straightforward and "random" source of entropy given such constraints is the hash of the encoding of the current network state, which could be described as such:

(defn rand

  ([]
   (rand (long (hash (encoding *state*)))))

  ([seed]
   (abs (/ seed
           9223372036854775807))))  ;; Max long (signed int64)

Given that current stack is theoretically capable of handling +/- 100k transaction per second, seed is effectively random "enough" even without envisioning further performance improvements:

High volume of transactions, load varies and isn't strictly constant
Transactions in a block are ordered by a peer, hence somewhat random unless peer is controlled
Blocks of transactions are ordered by the consensus algorithm, blocks are emitted concurrently, very hard to predict any outcome upfront
Even if previous factors were predictable enough, network delays add uncertainty

Those factors imply that predicting the seed at a precise moment is practically impossible assuming the network is being utilized. Worst case scenario would be absolutely no transaction going on, meaning that queries (which do not imply any state change nor consensus) would effectively return a constant value.

Limitations. It must be understood that seed varies only when state is changed. Hence [(rand) (rand)] would return 2 identical values.

Pending. Must be outlined the exact computational cost of (encoding *state*). Encoding of a cell is cached. A transaction that induced a state change prior to calling rand as defined here would require recomputing the state encoding due to a cache miss. Under normal operations, that would blow up juice consumption. (Please correct this statement or provide more details).

mikera commented 3 years ago

My current preference is to do something that avoids expensive operations. Computing a hash of the whole state is prohibitively expensive if performed multiple times within a block, since it will involve encoding and hashing a lot of intermediate tree nodes.

Probably we can use something like XORShift64 to get a "good enough" PRNG mixing in some data like the current juice, transaction hash and initial state hash to get a PRNG cheaply enough.

helins commented 3 years ago

Is *state* encoding only recomputed when it must be persisted in Etch?

mikera commented 3 years ago

Ideally yes. We want to avoid getting the encoding (or worse, the hash...) during normal operations. It might even be a DoS vector if we're not careful. This is actually an argument for prohibiting access to the full *state* directly from CVM code

helins commented 3 years ago

Juice accounting prevents DoS attacks, doesn't?

And peers are not mandated to persist *state* in-between transactions, blocks, or at any specific moment. So state hash cannot be a reliable seed since peers do not have a common notion of "last known state hash", can it?

helins commented 3 years ago

In any case it stands to reason that the std lib should make a distinction between seed (opiniated function producing a seed given some on-chain parameters) and rand (generic CVM function computing pseudo-random numbers given a seed, like Xorshift)

mikera commented 3 years ago

Juice accounting prevents DoS if and only if we can calculate juice cost to be roughly proportionate to the actual computation time (at least in the same order of magnitude / complexity). Hashing the state has a high and difficult to calculate cost so tricky to do with juice accounting alone (you would incur a lot of the cost even just calculating the order of magnitude....).

An opinionated seed and PRNG rand is still a good idea, just need to figure out an efficient way to do it that doesn't compromise security too much. It would be a problem if someone could sample the PRNG and predict easily future values within CVM code, for example (imagine a caller selectively deciding whether or not to place a bet on a roulette smart contract...)

helins commented 3 years ago

It is acceptable for a set of CVM parameters to provide a weak source of entropy, as long as it is extensively documented. Users should be able to complement that source with their own sources, adding complexity if their use cases do require more guarantees:

PRN computed on user's machine (but considered too weak on its own and/or not trustworthy by other users)
Trusted oracle streaming random numbers

@superstructor Given you gave that kind of domain some thoughts, would you have any recommandations?

Convex-Dev / convex

Sources of entropy and PRNG #329