dpiegdon / verilog-buildingblocks

Library of generic verilog buildingblocks

bit_ready clock synchronization #4

Closed: DurandA closed this issue 3 years ago

DurandA commented 4 years ago

Hi again @dpiegdon!

In your randomized_lfsr, I noticed that you perform a new step on the rising edge of bit_ready, which is generated by binary_debias: https://github.com/dpiegdon/verilog-buildingblocks/blob/a5c0dc5d70020c17ef64906e39e28a8aee27b0ed/binary_debias.v#L28

I did not run the bench nor analyze the signal on my FPGA, but I think this should give something along the lines of: [waveform diagram]

Is there any particular reason not to synchronize with the system clock directly (bit_ready has half the frequency of the clock)?

P.S. I'm building some cool stuff with oscillators using Migen. I still need to clean the repo before making it public.

dpiegdon commented 4 years ago

hi, your diagram looks about right.

the reason for the clk/2 is a statistical bias of the metastable signal, as generated by the metastable_oscillator_depth2. if you look at that signal using an oscilloscope, you will likely see that it tends to be a bit closer to 1 or to 0 on average (depending on the routing and on the implementation itself). but such a bias would degrade the entropy of the output stream.

here a super simple algorithm can be used to remove that bias: if you have an input bitstream i[0] .. i[n*2], create an output bitstream o[0]..o[n] with o[x] := i[x*2] ^ !i[x*2+1]. i.e. always take two bits, negate one of them (flipping its bias toward the other side), and xor them together. since the two bits then carry the same bias with opposite sign, the bias cancels out.
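a simplified sketch of the idea in verilog (hypothetical, not literally the code in binary_debias.v; module and signal names are made up):

```verilog
// hypothetical sketch only, not the actual binary_debias.v:
// sample two consecutive input bits, invert the second, xor the pair,
// so one debiased output bit becomes ready every second clock cycle.
module debias_sketch(input wire clk, input wire metastable_in,
                     output reg out = 1'b0, output reg out_ready = 1'b0);
	reg phase = 1'b0;      // which half of the current bit pair we are in
	reg first_bit = 1'b0;  // i[2x], the first sample of the pair

	always @(posedge clk) begin
		phase <= ~phase;
		if (!phase) begin
			first_bit <= metastable_in;         // remember i[2x]
			out_ready <= 1'b0;
		end else begin
			out <= first_bit ^ ~metastable_in;  // o[x] = i[2x] ^ !i[2x+1]
			out_ready <= 1'b1;
		end
	end
endmodule
```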

one could do this another way: for an input stream of i[0]..i[n+1], we could generate an output of o[0]..o[n] with o[x] := i[x] ^ !i[x+1]. that effectively uses clk/1 instead of clk/2. do you see why I chose the other method?

dpiegdon commented 4 years ago

btw, if you see any error here, let me know. also: what kind of tool did you use to make the above diagram? looks fancy.

DurandA commented 4 years ago

Thanks for taking the time to detail your reasoning, it makes more sense to me now.

> one could do this another way: for an input stream of i[0]..i[n+1], we could generate an output of o[0]..o[n] with o[x] := i[x] ^ !i[x+1]. that effectively uses clk/1 instead of clk/2. do you see why I chose the other method?

This would change the bias towards more/less chance to flip the next bit.

On the other hand, I wonder if such a binary debias is good enough for high-quality random number generation. If there is a bias, there is a lack of entropy in the individual bits, and this method effectively consumes a larger input to compensate for that. I guess I should do my homework and read some papers on the subject. For cryptographic applications, I would expect something like feeding the output through a digest. Was your design inspired by a particular document?

For the diagram I used WaveDrom which is awesome.

Thanks again!

dpiegdon commented 4 years ago

> This would change the bias towards more/less chance to flip the next bit.

I am not exactly sure what you mean by this. However, here is the reason why I think the clk/2 method is better: If you look at each bit by itself, both methods are equally good. But if you have a stream of bits, and know that each output bit actually depends on two input bits, and say you know bit0 and bit2, you can make some assumptions about bit1 given the bias. Ideally you want statistical independence of each bit. using the overlapping method thus leads to some statistical dependence.
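to spell the dependence out (same notation as above): with the overlapping method, o[x] ^ o[x+1] = (i[x] ^ !i[x+1]) ^ (i[x+1] ^ !i[x+2]) = i[x] ^ i[x+2], because the shared input bit cancels. so the xor of two neighbouring output bits is determined entirely by two biased input samples, which makes neighbouring output bits statistically dependent whenever the input is biased; with the non-overlapping clk/2 method, each input bit feeds exactly one output bit.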

dpiegdon commented 4 years ago

> On the other hand, I wonder if such a binary debias is good enough for high-quality random number generation. If there is a bias, there is a lack of entropy in the individual bits, and this method effectively consumes a larger input to compensate for that.

i think that is not necessarily true. a statistical bias does not necessarily come hand-in-hand with less entropy. here we only have the problem that we are in a discrete space. we cannot resolve the bias by mapping a debias function onto each sample (because we only have 0 and 1), so we have to use more data to debias it.

if you think about it, a random bit in this case can be represented by a probability function over the output voltage of the metastable circuit. if the output is below a certain threshold, it is 0; if it is above, it is 1 (the logic level of the signal). for this argument, assume the threshold is perfect (i.e. there is no undefined area in between that would lead to metastable states) and centered (which it does not have to be, which incidentally would lead to another bias, which also would be canceled out i think). for ease of argument, also assume the output is normalized, so it always lies in [0, 1] and 0.5 is the threshold.

if this output has a bias, then the integral [0 .. 0.5] is NOT equal to the integral [0.5 .. 1].

if you apply the binary debias algorithm, you fold two bits together by taking their probability functions, flipping one of them around, adding them together and then dividing by two. if you do that, you always get a new symmetric probability function. thus the bias is perfectly cancelled. of course, that only holds under the given assumptions, and only if the bits are statistically independent of each other and the probability function does not change across samples.
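written out (just formalizing the sentence above): if both samples have the same probability function f(v) on [0, 1], the folded function is g(v) = (f(v) + f(1-v)) / 2. then g(v) = g(1-v), so it is symmetric around the 0.5 threshold and the integral [0 .. 0.5] equals the integral [0.5 .. 1].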

theoretically you could fix the issue in the analog space by instead selecting a threshold T such that the integral [0 .. T] equals the integral [T .. 1] equals 0.5. then you'd get a random stream with perfect entropy (assuming the input has perfect entropy besides the bias). but of course we cannot do this here, because the threshold is fixed by the hardware.

i hope this makes clear what i was trying to do. i needed a bit of time and formalism to figure that out, but your argument got me curious and i wanted to see where it would take me.

also, i agree: if crypto-grade entropy is needed, the output of the randomized_lfsr might not be sufficient. i have done what i can to maximize and verify the entropy, but in the end, this is just a hobby project. selecting a cryptographic hash algorithm instead of an LFSR might improve the quality further.

dpiegdon commented 4 years ago

as for reading advice... for such low-level detailed stuff, i'd recommend the famous applied cryptography (bruce schneier). also wikipedia on entropy and RNGs, and about a year ago i read some of the papers that are linked in the README of the ringoscillator repository.

dpiegdon commented 4 years ago

after years, i just picked up my copy of applied crypto and am reading up on real random-sequence generators (17.14), biases and correlations. he does not propose the algorithm i used, but some other similar methods. now i am even more curious. i am going to look into this a bit more.

dpiegdon commented 4 years ago

you were right! the binary debias is not a perfect solution. the crux is in this sentence:

> if you apply the binary debias algorithm, you fold two bits together by taking their probability functions, flipping one of them around, adding them together and then dividing by two.

this is not really what is happening here. this folding of probability functions could only be done before sampling, in the analog domain; once the signal has been sampled, there is no way to do it any more. i think there actually is no way to remove the bias in such a predictable way. it would have been easy, but i skipped calculating the probabilities of the final bit state when i implemented it. given a probability p that a sampled bit is 1, if one uses binary debias with two input bits, the resulting probability for one output state is p_debias = 2p - 2p^2, and the probability for the other state is 1 - (2p - 2p^2). that is only 0.5 if p is 0.5, though the function converges toward 0.5.
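as a quick numeric check (my numbers, using the formula above): with p = 0.6, the folded probability is 2*0.6 - 2*0.36 = 0.48, so a 60/40 bias shrinks to a 52/48 bias but does not vanish. writing p = 0.5 + e, the deviation after one folding step is 2*e^2, so it shrinks quickly but only reaches 0 for p = 0.5.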

so what can we do? both the applied crypto and other sources suggest the von-neumann method [1]. the first link gives a good description:

> Take two independent 1-bit samples. If the bits are the same, discard them and take two more. If they are different, use 1 of the bits as the output. This method has the nice property that it's theoretically perfect - regardless of how biased the original source is, the output is exactly as likely to be a 1 as a 0. But it has the unnice property that it might never produce any bits. So, there's method [...]

so here the number of input bits only gives a rough estimate of the number of output bits.
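a rough sketch of how such an extractor could look (hypothetical, nothing from this repo yet; names are made up):

```verilog
// hypothetical sketch of a von-neumann extractor:
// look at non-overlapping pairs of input bits; a "01" or "10" pair emits its
// first bit, a "00" or "11" pair is discarded, so out_ready pulses at an
// input-dependent, non-constant rate.
module von_neumann_sketch(input wire clk, input wire in,
                          output reg out = 1'b0, output reg out_ready = 1'b0);
	reg phase = 1'b0;      // which half of the current pair we are in
	reg first_bit = 1'b0;  // first sample of the pair

	always @(posedge clk) begin
		out_ready <= 1'b0;
		phase <= ~phase;
		if (!phase) begin
			first_bit <= in;             // remember the first sample
		end else if (first_bit != in) begin
			out <= first_bit;            // differing pair: emit one of the bits
			out_ready <= 1'b1;
		end
		// equal pair: discard it, emit nothing
	end
endmodule
```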

further improvements would be:

I'll think about implementing these things. for now i'll adjust the documentation of these modules.

[1] von Neumann Algorithm for perfect debiasing http://www.helsbreth.org/random/removing_bias.html https://mcnp.lanl.gov/pdf_files/nbs_vonneumann.pdf

dpiegdon commented 4 years ago

Thanks for making me think more about this! :-)

dpiegdon commented 4 years ago

(another great read is the "handbook of applied cryptography" (not the same as "applied cryptography"), which is available online at http://cacr.uwaterloo.ca/hac/ . chapter 5 (http://cacr.uwaterloo.ca/hac/about/chap5.pdf) is all about Pseudorandom Bits and Sequences.)

DurandA commented 4 years ago

Hi @dpiegdon, thank you so much for sharing your research. I apologize for taking so much time to answer. I also wanted to share something of interest but was swamped by other projects.

I am working on physical unclonable functions (PUFs) using ring oscillators on the ECP5. This started as a hobby project this summer and progress has been slow since I am new to FPGAs and had to familiarize myself with the open source toolchain, Migen and LiteX. I have currently implemented a conventional ring-oscillator PUF, a Transient Effect Ring Oscillator (TERO) PUF and something similar to the Dopingless Transistor Based Power Optimized Hybrid Oscillator Arbiter PUF, but on an FPGA (largely untested, and I am not sure if it will work).

I am now working on manual placement of the oscillators on ECP5 slices. This is some guesswork as I don't know how things are placed internally beyond the floorplan. I am not even sure that the ECP5 is suitable for this, as the steadiness is too high and the uniqueness too low on the placements I tried. When the project is somewhat usable, I will publish it on GitHub and let you know. I also tried to make a random generator module in Migen, as you did, but have not completed it yet.

I bought Applied Cryptography and just started reading it. I enjoy it so far, thanks for the recommendation. My only complaint is that the updated edition lacks some modern crypto like elliptic-curve cryptography.

DurandA commented 3 years ago

Hi @dpiegdon. I think you will be interested in reading this Stack Exchange answer about the use of an LFSR with ring oscillators. TL;DR: debiasing is not required, but some strong decimation at the LFSR stage should be performed.
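Roughly what I understand by decimation here (a hypothetical sketch of my own, not from your repo or the linked answer): keep clocking the LFSR with fresh raw bits, but only hand out one of its output bits every N cycles.

```verilog
// hypothetical decimation sketch: take one LFSR output bit out of every
// DECIMATION clock cycles, so many raw samples are absorbed per output bit.
module decimator_sketch #(parameter DECIMATION = 64)
                         (input wire clk, input wire lfsr_bit,
                          output reg out = 1'b0, output reg out_ready = 1'b0);
	reg [$clog2(DECIMATION)-1:0] count = 0;

	always @(posedge clk) begin
		out_ready <= 1'b0;
		if (count == DECIMATION-1) begin
			count <= 0;
			out <= lfsr_bit;       // emit one bit, discard the rest
			out_ready <= 1'b1;
		end else begin
			count <= count + 1'b1;
		end
	end
endmodule
```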

Also be careful if you are using Applied Cryptography as a reference for LFSRs, as the placement of the taps there is wrong. https://github.com/dpiegdon/verilog-buildingblocks/blob/c17ec0af7791f4c1850a817f421fedcba0f42eca/lfsr.v#L28 Since your LFSR is right-shifting, I believe you used the polynomial x^16 + x^5 + x^3 + x^2 + 1, which is maximal-length and probably what you intended.

I hope that you will find the links interesting. :smile: Feel free to close this issue.

dpiegdon commented 3 years ago

Hi @DurandA, thanks for the links! I believe I chose the taps according to the wikipedia page on the fibonacci LFSR, the only difference being that the bit-numbering is in the opposite order. But you are right, an LFSR, even with a good random source, is not adequate for cryptographic use. But it can still be very helpful when plenty of entropy is needed for most other uses.
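for illustration, roughly what i mean (a hypothetical sketch, not the actual lfsr.v): the wikipedia taps 16/14/13/11, read with the bit-numbering reversed, end up as xors on bits 0, 2, 3 and 5 of a right-shifting register.

```verilog
// hypothetical sketch, not the actual lfsr.v: a 16-bit right-shifting
// fibonacci LFSR. with reversed bit-numbering the taps land on bits
// 0, 2, 3 and 5, i.e. the maximal-length polynomial x^16 + x^5 + x^3 + x^2 + 1.
module lfsr16_sketch(input wire clk, output wire out);
	reg [15:0] state = 16'hACE1;   // any nonzero seed works
	wire feedback = state[0] ^ state[2] ^ state[3] ^ state[5];

	always @(posedge clk)
		state <= {feedback, state[15:1]};   // shift right, feed back into the MSB

	assign out = state[0];
endmodule
```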