crypt.5 .hash's reporting of effective hash size is wrong

besser82 / libxcrypt

Extended crypt library for descrypt, md5crypt, bcrypt, and others

GNU Lesser General Public License v2.1

194 stars 53 forks source link

crypt.5 .hash's reporting of effective hash size is wrong #29

Closed solardiz closed 2 weeks ago

solardiz commented 6 years ago

As discussed in https://github.com/besser82/libxcrypt/pull/27#discussion_r213801730, the .hash macro has been changed somewhere between my revision in Owl (and its later life in ALT Linux?) and libxcrypt's to change the reporting for traditional DES-based hashes from:

       Effective key size
              up to 56 bits

       Hash size
              64 bits

to:

       Hash size
              64 bits (effectively 56)

(There are other differences as well, but those look OK.)

The above change is wrong because a limit on key size doesn't reduce the effective hash size as claimed.

First, a cryptographic 56-bit key to 56-bit hash function would likely produce far more collisions (over its full 56-bit key space) than a cryptographic 56-bit to 64-bit hash function is expected to. If this isn't immediately obvious, please consider extreme cases such as a 1-bit to 1-bit vs. a 1-bit to 64-bit hash function (in the former case, 2 out of 4 possible functions would produce collisions half of the time, whereas in the latter an extremely low percentage of the possible functions would produce collisions), or an 8-bit to 8-bit (expect only ~162 different outputs over the 256 input keys space) vs. an 8-bit to 64-bit hash function (expect no collisions).

Second, even with the key input limited to 56 bits, almost the full 2^64 variety of 64-bit hash outputs are likely reachable via varying key+salt inputs.

The same issue is probably also present for BSDI-style DES-based hashes, where the password input length is not limited to 8 characters, but internally the key size is effectively limited to 56 bits, creating a "narrow pipe" within the hash function (except for the salt input, which is also treated separately). This "narrow pipe" is similarly not the same as an effectively reduced hash output size.

Perhaps this change to .hash should be reverted or revised, and some uses of .hash may need to be revised accordingly. I understand that the lines:

       Effective key size
              limited by the hash size only

as would be reported by my original .hash for most hash types didn't look pretty. Maybe we can have the macro omit this reporting for hashes where this is the case, only leaving the reporting for the few special-case hashes?

zackw commented 6 years ago

Check my logic here: Traditional DES recognizes only 2⁵⁶ distinct input passwords. (This is captured by the "maximum password length" entry in the table.) Those passwords are translated directly to DES cipher keys. The input block is fixed. It follows that, for any one salt, traditional DES crypt has only 2⁵⁶ possible outputs, even though its output is 64 bits long. If there is an output collision, that implies a weakness in the DES block cipher primitive (two keys produce the same output for the same input). The practical consequence of this is that a complete rainbow table for traditional DES requires 2⁵⁶ × 2¹² entries.

BSDI enlarges the salt, but uses an iterated Merkle-Dåmgard-ish construction to map unlimited-length passphrases to the 2⁵⁶ possible DES keys, which means that for each passphrase longer than 8 characters, there is a passphrase of no more than 8 characters that collides with it (although the short side of the collision has a good chance of being untypeable). A complete BSDI rainbow table would have 2⁵⁶ × 2²⁴ × 2²³ entries (the additional factor comes from the variable round count).

bigcrypt does somewhat the opposite: the salt stays 12 bits long, but the output length is enlarged proportional to the length of the input, using a construction reminiscent of CBC mode (each block of hash provides the salt for the next eight characters of password). This means brute-force guessing is parallelizable (each block can be cracked independently) and it also means a complete rainbow table for bigcrypt needs only 2⁵⁶ × 2¹² entries even though the output could contain many more than 56 significant bits. You would simply look up each block independently.

zackw commented 6 years ago

Gah, accidentally hit the close button while trying to edit.

Now, the question in my mind is, what are we trying to document here? Terminology aside, I think it is simply that, for the DES variants and only the DES variants, the output is longer than required for the number of possible outputs, and therefore brute-force attacks are easier / rainbow tables are smaller than you might think from inspecting a leaked shadow file.

With me so far?

solardiz commented 6 years ago

While "what are we trying to document here?" is an important question, whatever we document better be correct - and right now it is not, hence this issue.

I disagree that "for the DES variants [...] the output is longer than required for the number of possible outputs". They would have been somewhat worse if their output size were 56 rather than 64 bits. First, given a fixed salt, a 56-to-56 hash would produce only ~63.2% (or 1-1/e) of the 2^56 different values, whereas a 56-to-64 hash would produce ~99.8% of them. Second, with salts the input space is 56+12 bits, which is larger than 64.

Now, I agree the salts helping utilize more of the output space, and occasional hash value collisions, are both mostly irrelevant to many realistic password cracking attacks, such as with tables. (BTW, I'd rather not go into discussing rainbow tables here - they're complicated "compressing" data structures (where the input number of "entries" is typically orders of magnitude higher than what's recorded), and they're also probabilistic. For the purpose of this discussion and to have number of "entries" more precisely defined, I'll assume you meant simpler lookup tables.)

I agree that for attacks the input key space matters more than the output hash size does. (And distribution of actual passwords over that space matters even more.)

However, we should and we do have separate fields for these, and we need to put the correct information in them. We shouldn't confuse input key size or its pipe width with output hash size. That's an artificial over-simplification, and it's incorrect.

solardiz commented 6 years ago

Regarding "brute-force attacks are easier / rainbow tables are smaller than you might think from inspecting a leaked shadow file", only someone unfamiliar with password cracking would try to draw conclusions about those things from hash sizes. The only time I recall anyone seriously mention something like this was in Ulrich's SHA-crypt.txt, where he wrote: "[...] the produced output for both DES and MD5 has a short length which makes it possible to construct rainbow tables." and "By choosing the SHA-256 and SHA-512 algorithms the produced output is 32 or 64 bytes respectively in size. This fulfills the requirement for a large output set which makes rainbow tables less useful to impossible, at least for the next years." That was nonsense. As you correctly point out, only the input key+salt space (actually, only the sub-space one chose to include in attacks, which would focus on realistically common passwords), and not the output hash space (as long as it's not way too small), affects the size of tables. (And a too-large hash can simply be truncated to reduce the storage requirements for cracking, e.g. QCrack truncated descrypt hashes to 8 bits prior to the advent of rainbow tables, falling back to recomputation on the occasional 8-bit matches.)

zackw commented 6 years ago

As a quick fix (since I think @besser82 may want to get 4.2.0 out the door soon), I have reintroduced the "effective key size" entry for DES only. We now have, for instance,

   yescrypt
       yescrypt is a  scalable  password  hashing  scheme  designed  by  Solar
       Designer, which is based on Colin Percival's scrypt.

       prefix "$y$"

       Encoded passphrase format
              \$y\$[./A-Za-z0-9]+\$[./A-Za-z0-9]{,86}\$[./A-Za-z0-9]{43}

       Maximum password length
              unlimited

       Hash size
              256 bits

       Salt size
              up to 512 bits

       CPU time cost parameter
              1 to 11 (logarithmic)

versus

   Traditional DES-based
       The original hashing method from  Unix  V7,  based  on  the  DES  block
       cipher.   Because  DES  is  cheap on modern hardware, because there are
       only 4096 possible salts and 2**56  possible  hashes,  and  because  it
       truncates  passphrases  to 8 characters, it is feasible to discover any
       passphrase hashed with this method.  It should  only  be  used  if  you
       absolutely  have  to generate hashes that will work on an old operating
       system that supports nothing else.

       prefix "" (empty string)

       Encoded passphrase format
              [./0-9A-Za-z]{13}

       Maximum password length
              8 characters (ignores 8th bit)

       Hash size
              64 bits

       Effective key size
              56 bits

       Salt size
              12 bits

       CPU time cost parameter
              25

Down the road, I have realized that my actual subconscious problem with the terms generated by the .hash macro is that many of them aren't ever defined. What exactly do we mean "hash size N bits", "effective key size M bits", etc? Rather than argue over what the right terms are, I think we need to add a few paragraphs at the top of AVAILABLE HASHING METHODS that define them.

And reading your commentary has revealed to me that I don't understand the password-cracking state of the art well enough to write those definitions. I'm a systems programmer at heart, not a cryptographer. Care to have a whack at it?

solardiz commented 6 years ago

Your changes look good to me, thanks!

As to defining the terms, I think most are self-explanatory whereas "Effective key size" would take too much space to define (more precisely than these words imply) whereas it's only relevant to a couple of old hashing methods, or maybe even to only one - see below. So I'd rather not do that.

As an option, we can leave the only mention of "Effective key size" as:

       Effective key size
              up to 56 bits

for BSDI-style hashes. It's not necessary to include this for "Traditional DES-based", where it's implied by:

       Maximum password length
              8 characters (ignores 8th bit)

Also, we lost the "up to" along with making those changes to the macro and its uses. Perhaps we should restore that since 56 bits is only reached for passwords of length 8+.

zackw commented 7 months ago

Tagging as both "help wanted" and "need more information". I still think we should define the terms we use, but it's still not clear to me how to describe DES, and I still don't understand most of what @solardiz was trying to communicate. I want to find a new contributor who is expert in password hashing and get them to read over the manpage and the discussion history here with fresh eyes, and I want to find a skilled sysadmin who is not expert in password hashing, get them to read the manpage and tell us what parts they understand and what terms they want clarified.

solardiz commented 7 months ago

I think my current corrections in #185 are sufficient. Sure someone (e.g., a sysadmin) could have questions about the exact meaning of "effective key size", but that is not something we can reasonably fully explain in a man page. So I suggest simply merging that PR and then closing this issue.