dj-on-github / SP800_90b_tests

A Python implementation of the non_iid tests in SP800-90B.

The "entropy per bit" value is misleading (for non-bit-oriented entropy sources) #13

Open joshuaehill opened 6 years ago

joshuaehill commented 6 years ago

The "Min Entropy per bit" value that the tool outputs is likely to be misinterpreted and abused.

The tool supplies an average value for the min entropy per bit (the per-symbol min entropy divided by the bits per symbol). It is likely that folks will make the assumption that entropy is uniformly distributed throughout the symbol (a wildly incorrect assumption for many sources!) and attempt to sub-divide the symbols and credit the proportional entropy for the sub-portion of the symbol that is being used.
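To make the arithmetic concrete, here is a minimal sketch of how such a per-bit average arises. It is not the tool's code: it uses a plain plug-in most-common-value estimate (no confidence bound, unlike SP800-90B), and the function names and the 8-bit toy source are hypothetical.

```python
import math
from collections import Counter

def plug_in_min_entropy(values):
    """Plug-in min entropy in bits: log2(N / count of the most common value).

    Simplification: SP800-90B's most-common-value estimate also applies an
    upper confidence bound; that is deliberately left out here.
    """
    counts = Counter(values)
    return math.log2(len(values) / max(counts.values()))

def per_bit_average(values, bits_per_symbol):
    """Per-symbol min entropy divided by the symbol width."""
    return plug_in_min_entropy(values) / bits_per_symbol

# Hypothetical 8-bit source: top 6 bits fixed, low 2 bits cycle uniformly.
samples = [0xA0 | (i % 4) for i in range(1000)]

per_symbol = plug_in_min_entropy(samples)  # 2.0 bits per 8-bit symbol
per_bit = per_bit_average(samples, 8)      # 0.25 bits per bit, on average
print(per_symbol, per_bit)
```

The 0.25 figure is only an average over the whole symbol; in this toy source all of the entropy sits in the two low-order bits.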

dj-on-github commented 6 years ago

It reports both. I'm not sure why this matters - the entropy per bit is useful for getting normalized results for comparisons. From a certification perspective, you want to show that you are meeting the input requirements of the extractor and all the vetted extractors take multi-symbol inputs. So (number_of_input_symbols_to_ext * entropy_per_symbol) == (number_of_input_bits_to_ext * entropy_per_bit).

joshuaehill commented 6 years ago

That last equality statement is not true for many sources if you truncate the samples. It is true if complete samples are used.

For example, one common scheme is to sample a fast running counter (e.g., a TSC value), where the sampling occurs as a consequence of some event whose exact timing is difficult for an attacker to guess. If you look at how the min entropy is distributed in the samples from such a system, the low-order bits are often more difficult for an attacker to predict than the higher-order bits (the high order bits are often basically wholly known to any suitably informed attacker). Thus the low order bits tend to have more min entropy than the high order bits.

Providing a min entropy assessment in terms of a per-bit average suggests that one can freely subdivide a sample and credit each bit of the sample as containing the stated average. If one includes the entire sample, then (by definition) you get the total sample entropy, and the equality you state is clearly true. If you instead subdivide the sample, it's hard to comment about the entropy of the part that remains, and for systems where min entropy isn't uniformly distributed, it's very likely that the number of bits multiplied by the per-bit average min entropy won't be the correct value that should be credited.
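A small illustration of the truncation problem, as a sketch under stated assumptions: the toy 8-bit source (high 6 bits constant, low 2 bits uniform) and the helper name are hypothetical, and the estimator is a simple plug-in most-common-value estimate rather than the full SP800-90B procedure.

```python
import math
from collections import Counter

def plug_in_min_entropy(values):
    """Plug-in min entropy in bits: log2(N / count of the most common value)."""
    counts = Counter(values)
    return math.log2(len(values) / max(counts.values()))

BITS_PER_SYMBOL = 8

# Hypothetical counter-like source: high-order bits are fully predictable,
# only the two low-order bits vary (uniformly, in this toy data).
samples = [0xA0 | (i % 4) for i in range(1000)]

full_entropy = plug_in_min_entropy(samples)       # 2.0 bits per symbol
per_bit_avg = full_entropy / BITS_PER_SYMBOL      # 0.25 bits per bit

# Truncate each sample to 4 bits in two different ways.
high_nibbles = [s >> 4 for s in samples]
low_nibbles = [s & 0x0F for s in samples]

naive_credit = 4 * per_bit_avg                    # 1.0 bit, whichever half is kept
actual_high = plug_in_min_entropy(high_nibbles)   # 0.0: the high nibble is constant
actual_low = plug_in_min_entropy(low_nibbles)     # 2.0: all the entropy is here

print(f"naive credit for 4 bits: {naive_credit}")
print(f"actual, high nibble:     {actual_high}")
print(f"actual, low nibble:      {actual_low}")
```

With complete samples the two products agree (1000 symbols * 2.0 equals 8000 bits * 0.25). Once the sample is subdivided, the naive credit of 1.0 bit overstates the high nibble and understates the low nibble.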

I have witnessed this occurring "in the wild" on several instances, and the results are sometimes unfortunate.

dj-on-github commented 6 years ago

Ah. We wouldn’t touch that sort of source with a bargepole. How would you use that for certification?

It’s also why the scope of data that the online test covers is important, so that non-stationarity doesn’t lead to over-assumption about the extractor input. I still want to see the per-bit entropy so I can compare analyses at different bit widths.

The output formatting is in the works - to try and match NIST’s format and also provide a useful CSV format similar to djent’s.

So the per bit entropy numbers will be a function of what you ask for.

joshuaehill commented 6 years ago

You may want to wait before putting a bunch of time into making the output look like NIST's 2016 Python implementation, as NIST plans on releasing a completely different C++ tool "real soon now". The last I heard (about a month ago), they had all the development done, and were performing testing.

dj-on-github commented 6 years ago

OK. I’ll focus on the CSV. I’m travelling today, so I’ll be working on it sporadically.

dj-on-github commented 6 years ago

CSV is in. Multi-file isn't.

joshuaehill commented 6 years ago

NIST released their updated reference implementation today.

dj-on-github commented 6 years ago

Something to compare against. Yay.

yuyinw commented 3 years ago

When I use CPU jitter to collect 3840 bytes (30720 bits) and run this Python tool on the data, it outputs Minimum Min Entropy = 0.6581506573264674. So is the final result 30720 * 0.6581506573264674 = 20218 bits?