Closed: solardiz closed this issue 1 month ago
Check my logic here: Traditional DES crypt recognizes only 2^56 distinct input passwords. (This is captured by the "maximum password length" entry in the table.) Those passwords are translated directly to DES cipher keys. The input block is fixed. It follows that, for any one salt, traditional DES crypt has only 2^56 possible outputs, even though its output is 64 bits long. If there is an output collision, that implies a weakness in the DES block cipher primitive (two keys producing the same output for the same input). The practical consequence of this is that a complete rainbow table for traditional DES requires 2^56 × 2^12 entries.
BSDI enlarges the salt, but uses an iterated Merkle-Damgård-ish construction to map unlimited-length passphrases to the 2^56 possible DES keys, which means that for each passphrase longer than 8 characters, there is a passphrase of no more than 8 characters that collides with it (although the short side of the collision has a good chance of being untypeable). A complete BSDI rainbow table would have 2^56 × 2^24 × 2^23 entries (the additional factor comes from the variable round count).
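The collision guarantee here is just the pigeonhole principle applied to the narrow internal state. A toy sketch with a 1-byte state makes it concrete (BSDI's real state is a 56-bit DES key and the mixing is DES-based; the function and names below are purely illustrative):

```python
import hashlib

def toy_narrow_pipe(passphrase):
    # Toy iterated construction with a 1-byte internal state.  BSDI's
    # real state is a 56-bit DES key, but the pigeonhole argument below
    # works the same way at any fixed state size.
    state = 0
    for ch in passphrase.encode():
        state = hashlib.sha256(bytes([state, ch])).digest()[0]
    return state

# With only 256 possible states, hashing any 300 distinct passphrases is
# guaranteed to produce at least one collision.
outputs = {toy_narrow_pipe("passphrase-%d" % i) for i in range(300)}
assert len(outputs) < 300
```

Scaled up to a 56-bit state, the same argument says any passphrase longer than 8 characters must share its hash with some 8-characters-or-fewer key, even though finding that short collider is computationally hard.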
bigcrypt does somewhat the opposite: the salt stays 12 bits long, but the output length is enlarged in proportion to the length of the input, using a construction reminiscent of CBC mode (each block of hash provides the salt for the next eight characters of password). This means brute-force guessing is parallelizable (each block can be cracked independently) and it also means a complete rainbow table for bigcrypt needs only 2^56 × 2^12 entries even though the output could contain many more than 56 significant bits. You would simply look up each block independently.
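A toy model of that chaining (not the real DES-based bigcrypt; the stand-in hash and all names are illustrative) shows why the blocks are independent: the salt each block needs is derived from the previous block's hash, which is visible in the output, and identical leading blocks yield identical leading output.

```python
import hashlib

def toy_bigcrypt(password, salt):
    # Toy model of bigcrypt's chaining, NOT the real algorithm: each
    # 8-character chunk is hashed on its own, and the salt for chunk i+1
    # is derived from chunk i's hash, which appears in the output.
    out = []
    for i in range(0, len(password), 8):
        chunk = password[i:i + 8]
        digest = hashlib.sha256((salt + chunk).encode()).hexdigest()[:13]
        out.append(digest)
        salt = digest[:2]  # next chunk's salt is public, given the hash
    return "".join(out)

# Identical first blocks produce identical leading output, so an attacker
# can attack each block separately:
a = toy_bigcrypt("passwordA", "xy")
b = toy_bigcrypt("passwordB", "xy")
assert a[:13] == b[:13] and a != b
```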
Gah, accidentally hit the close button while trying to edit.
Now, the question in my mind is, what are we trying to document here? Terminology aside, I think it is simply that, for the DES variants and only the DES variants, the output is longer than required for the number of possible outputs, and therefore brute-force attacks are easier / rainbow tables are smaller than you might think from inspecting a leaked shadow file.
With me so far?
While "what are we trying to document here?" is an important question, whatever we document better be correct - and right now it is not, hence this issue.
I disagree that "for the DES variants [...] the output is longer than required for the number of possible outputs". They would have been somewhat worse if their output size were 56 rather than 64 bits. First, given a fixed salt, a 56-to-56 hash would produce only ~63.2% (or 1-1/e) of the 2^56 different values, whereas a 56-to-64 hash would produce ~99.8% of them. Second, with salts the input space is 56+12 bits, which is larger than 64.
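The ~63.2% and ~99.8% figures follow from the standard occupancy approximation for random functions; a quick check (plain Python, hypothetical function name):

```python
import math

def expected_distinct(in_bits, out_bits):
    # Occupancy approximation for a random function mapping n = 2^in_bits
    # inputs into m = 2^out_bits outputs: about m * (1 - exp(-n/m)) of
    # the possible outputs are hit at least once.
    n, m = 2.0 ** in_bits, 2.0 ** out_bits
    return m * (1.0 - math.exp(-n / m))

print(expected_distinct(56, 56) / 2.0 ** 56)  # ~0.632, i.e. 1 - 1/e
print(expected_distinct(56, 64) / 2.0 ** 56)  # ~0.998
```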
Now, I agree that the salts' help in utilizing more of the output space, and occasional hash value collisions, are both mostly irrelevant to many realistic password cracking attacks, such as attacks with tables. (BTW, I'd rather not go into discussing rainbow tables here - they're complicated "compressing" data structures (where the input number of "entries" is typically orders of magnitude higher than what's recorded), and they're also probabilistic. For the purpose of this discussion, and to have the number of "entries" more precisely defined, I'll assume you meant simpler lookup tables.)
I agree that for attacks the input key space matters more than the output hash size does. (And distribution of actual passwords over that space matters even more.)
However, we should and we do have separate fields for these, and we need to put the correct information in them. We shouldn't confuse input key size or its pipe width with output hash size. That's an artificial over-simplification, and it's incorrect.
Regarding "brute-force attacks are easier / rainbow tables are smaller than you might think from inspecting a leaked shadow file", only someone unfamiliar with password cracking would try to draw conclusions about those things from hash sizes. The only time I recall anyone seriously mentioning something like this was in Ulrich's SHA-crypt.txt, where he wrote: "[...] the produced output for both DES and MD5 has a short length which makes it possible to construct rainbow tables." and "By choosing the SHA-256 and SHA-512 algorithms the produced output is 32 or 64 bytes respectively in size. This fulfills the requirement for a large output set which makes rainbow tables less useful to impossible, at least for the next years." That was nonsense. As you correctly point out, only the input key+salt space (actually, only the sub-space one chose to include in attacks, which would focus on realistically common passwords), and not the output hash space (as long as it's not way too small), affects the size of tables. (And a too-large hash can simply be truncated to reduce the storage requirements for cracking, e.g. QCrack truncated descrypt hashes to 8 bits prior to the advent of rainbow tables, falling back to recomputation on the occasional 8-bit matches.)
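The truncation trick mentioned above can be sketched as follows (illustrative Python with a stand-in hash function, not QCrack's actual code): candidates are indexed by a truncated hash, and the full hash is recomputed only on the occasional truncated match.

```python
import hashlib
from collections import defaultdict

def hash_fn(password, salt):
    # Stand-in for descrypt; any deterministic salted hash works here.
    return hashlib.sha256((salt + password).encode()).digest()

def build_index(candidates, salt, trunc=1):
    # Store only `trunc` bytes of each hash as the lookup key, trading a
    # few false positives for much smaller storage.
    index = defaultdict(list)
    for pw in candidates:
        index[hash_fn(pw, salt)[:trunc]].append(pw)
    return index

def lookup(index, target_hash, salt, trunc=1):
    # Only candidates whose truncated hash matches get fully recomputed.
    for pw in index.get(target_hash[:trunc], []):
        if hash_fn(pw, salt) == target_hash:
            return pw
    return None

candidates = ["letmein", "hunter2", "secret"]
index = build_index(candidates, "xy")
print(lookup(index, hash_fn("hunter2", "xy"), "xy"))  # hunter2
```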
As a quick fix (since I think @besser82 may want to get 4.2.0 out the door soon), I have reintroduced the "effective key size" entry for DES only. We now have, for instance,
yescrypt
    yescrypt is a scalable password hashing scheme designed by Solar
    Designer, which is based on Colin Percival's scrypt.

    prefix "$y$"

    Encoded passphrase format
        \$y\$[./A-Za-z0-9]+\$[./A-Za-z0-9]{,86}\$[./A-Za-z0-9]{43}

    Maximum password length
        unlimited

    Hash size
        256 bits

    Salt size
        up to 512 bits

    CPU time cost parameter
        1 to 11 (logarithmic)
versus
Traditional DES-based
    The original hashing method from Unix V7, based on the DES block
    cipher.  Because DES is cheap on modern hardware, because there are
    only 4096 possible salts and 2**56 possible hashes, and because it
    truncates passphrases to 8 characters, it is feasible to discover
    any passphrase hashed with this method.  It should only be used if
    you absolutely have to generate hashes that will work on an old
    operating system that supports nothing else.

    prefix "" (empty string)

    Encoded passphrase format
        [./0-9A-Za-z]{13}

    Maximum password length
        8 characters (ignores 8th bit)

    Hash size
        64 bits

    Effective key size
        56 bits

    Salt size
        12 bits

    CPU time cost parameter
        25
Down the road, I have realized that my actual subconscious problem with the terms generated by the .hash macro is that many of them aren't ever defined. What exactly do we mean by "hash size N bits", "effective key size M bits", etc.? Rather than argue over what the right terms are, I think we need to add a few paragraphs at the top of AVAILABLE HASHING METHODS that define them.
And reading your commentary has revealed to me that I don't understand the password-cracking state of the art well enough to write those definitions. I'm a systems programmer at heart, not a cryptographer. Care to have a whack at it?
Your changes look good to me, thanks!
As to defining the terms, I think most are self-explanatory, whereas "Effective key size" would take too much space to define (more precisely than these words already imply), and it's only relevant to a couple of old hashing methods, or maybe even to only one - see below. So I'd rather not do that.
As an option, we can leave the only mention of "Effective key size" as:
Effective key size
    up to 56 bits
for BSDI-style hashes. It's not necessary to include this for "Traditional DES-based", where it's implied by:
Maximum password length
    8 characters (ignores 8th bit)
Also, we lost the "up to" when making those changes to the macro and its uses. Perhaps we should restore it, since 56 bits is only reached for passwords of length 8+.
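The "up to" matters because the key material grows with password length. A tiny illustrative helper (hypothetical name, not libxcrypt code) captures the relationship:

```python
def descrypt_key_bits(password_length):
    # descrypt takes 7 bits from each of at most 8 password characters
    # (the 8th bit of each character is ignored), so the effective key
    # size is 7 bits per character, capped at 56.
    return 7 * min(password_length, 8)

print(descrypt_key_bits(4))   # 28
print(descrypt_key_bits(8))   # 56
print(descrypt_key_bits(20))  # 56 - the "up to" ceiling
```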
Tagging as both "help wanted" and "need more information". I still think we should define the terms we use, but it's still not clear to me how to describe DES, and I still don't understand most of what @solardiz was trying to communicate. I want to find a new contributor who is expert in password hashing and get them to read over the manpage and the discussion history here with fresh eyes, and I want to find a skilled sysadmin who is not expert in password hashing, get them to read the manpage and tell us what parts they understand and what terms they want clarified.
I think my current corrections in #185 are sufficient. Sure someone (e.g., a sysadmin) could have questions about the exact meaning of "effective key size", but that is not something we can reasonably fully explain in a man page. So I suggest simply merging that PR and then closing this issue.
As discussed in https://github.com/besser82/libxcrypt/pull/27#discussion_r213801730, the .hash macro has been changed somewhere between my revision in Owl (and its later life in ALT Linux?) and libxcrypt's, altering the reporting for traditional DES-based hashes from: [...] to: [...]
(There are other differences as well, but those look OK.)
The above change is wrong because a limit on key size doesn't reduce the effective hash size as claimed.
First, a cryptographic 56-bit key to 56-bit hash function would likely produce far more collisions (over its full 56-bit key space) than a cryptographic 56-bit to 64-bit hash function is expected to. If this isn't immediately obvious, please consider extreme cases such as a 1-bit to 1-bit vs. a 1-bit to 64-bit hash function (in the former case, 2 out of 4 possible functions would produce collisions half of the time, whereas in the latter an extremely low percentage of the possible functions would produce collisions), or an 8-bit to 8-bit (expect only ~162 different outputs over the 256-key input space) vs. an 8-bit to 64-bit hash function (expect no collisions).
Second, even with the key input limited to 56 bits, almost the full 2^64 variety of 64-bit hash outputs are likely reachable via varying key+salt inputs.
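Both points can be checked numerically with the exact random-function occupancy formula (a sketch of the math, not libxcrypt code):

```python
import math

# Exact expected number of distinct outputs when a random function maps
# n inputs onto m possible outputs: m * (1 - (1 - 1/m)**n).
m = n = 2 ** 8
print(m * (1 - (1 - 1 / m) ** n))  # ~162 of 256, as in the 8-bit example

# Second point: 56-bit keys plus 12-bit salts give 2^68 inputs, so nearly
# the entire 2^64 output space is reachable (fraction hit at least once):
print(1.0 - math.exp(-(2.0 ** 68) / (2.0 ** 64)))  # ~0.9999999
```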
The same issue is probably also present for BSDI-style DES-based hashes, where the password input length is not limited to 8 characters, but internally the key size is effectively limited to 56 bits, creating a "narrow pipe" within the hash function (except for the salt input, which is also treated separately). This "narrow pipe" is similarly not the same as an effectively reduced hash output size.
Perhaps this change to .hash should be reverted or revised, and some uses of .hash may need to be revised accordingly. I understand that the lines [...] as would be reported by my original .hash for most hash types didn't look pretty. Maybe we can have the macro omit this reporting for hashes where this is the case, only leaving the reporting for the few special-case hashes?