Suggest standard implementation for electronic share set generation

BenWestgate commented 1 year ago

Summarizes Issue #57:

While codex32 is designed to be computed by hand, should it become popular, it's likely most backups will be made with electronics. This offers opportunities to increase error detection and presents liabilities from trusting the randomness of a device. This proposal seeks to maximize benefits of electronics while reducing their downside by being deterministic and easily auditable so the device's entropy is not a single point of failure.

Generating Shares For a fresh master seed Inputs:

App entropy an extended private masterkey generated by the wallet. It must be displayed unconditionally so the wallet does not know if it will be audited.
User entropy a string provided by the user that allows improving security in case app entropy generator has a backdoor. This could be a one-time text entered by the user only for this purpose, or a passphrase containing enough entropy that is used later for other purposes.
k the threshold to recover the codex32 secret.
n the total number of shares in the backup.

Optional inputs:

Identifier sets a specific 4 character bech32 identifier for the backup, overriding the deterministic one.
Seed length sets a specific master seed byte length, overriding the default 16-bytes.
Fixed share indices forces up to n of the indices to be the ones provided.
Existing codex32 strings forces up to k of the shares to be the ones provided. Overrides seed length, identifier and k.

Outputs:

Fresh master seed, the BIP32 master seed of the backup.
New shares, n newly generated shares with shuffled indices. (excludes existing)

For an existing master seed

Inputs and outputs are the same except: Inputs:
- Existing codex32 strings: one must be the codex32 secret or else a valid threshold length set of shares must be given.
- App entropy is the private masterkey of the existing master seed.
- User entropy becomes optional unless re-sharing a master seed with an existing codex32 share backup. Then it must be a "unique string". The app may help ensure uniqueness with a monotonic counter or date which must be unconditionally displayed.

PROPOSAL

Share indices

In order to prevent the order of indices used from leaking secret information, the available indices (after excluding 's' and any existing shares passed) are deterministically shuffled as follows:

def shuffle_indices(index_seed, indices):
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms
    """
    Shuffle indices deterministically using provided key with ChaCha20.

    :param index_seed: The ChaCha20 key for deterministic shuffling.
    :param indices: Characters to be shuffled as a string or list.
    :return: List of shuffled characters sorted by assigned values.
    """
    algorithm = algorithms.ChaCha20(index_seed, bytes(16))
    keystream = Cipher(algorithm, mode=None).encryptor()
    counter = 0  # Counter to track current position in the keystream.
    value = b""  # Storage for the assigned random byte.
    block = b""  # Holds the latest keystream block.
    assigned_values = {}  # Dictionary to store chars and their values.
    for char in indices:
        # Ensure new random value is generated if there is a collision.
        while value in assigned_values.values() or not value:
            if not counter % 64:  # Get new 64-byte block per 64 count.
                block = keystream.update(bytes(64))  # ChaCha20 block.
            value = block[counter % 64:counter % 64 + 1]  # Rand byte.
            counter += 1
        assigned_values[char] = value
    return sorted(assigned_values.keys(), key=lambda x: assigned_values[x])

Padding bits

In order to prevent padding bits from leaking 2-4 bits per generated share, these must also be set deterministically. While all zeros works, it's wasteful as some error detection is possible with an ECC code. (potentially bit error correction for high k values.) This ECC padding is guaranteed to detect up to 1 bit error or 2 sequential bit errors:

def ecc_padding(data):
    """
    Calculate and return a byte with concatenated parity bits.

    :param data: Bytes of seed_len, hrp, k, ident, index and payload.
    :return: Byte with concatenated parity in most significant bits.
    """
    # Count mod 2 the number of set (1) bits in the byte data
    parity = bin(int.from_bytes(data)).count('1') % 2 << 7
    # Count mod 2 the number of set (1) even bits in the byte data
    parity += bin(int.from_bytes(data))[::2].count('1') % 2 << 6
    # Count mod 2 the number of set (1) odd bits in the byte data
    parity += bin(int.from_bytes(data))[3::2].count('1') % 2 << 5
    # Count mod 2 the number of set (1) third bits in the byte data
    parity += bin(int.from_bytes(data))[::3].count('1') % 2 << 4

    return parity.to_bytes()

This may help if a share is damaged beyond the correction ability of the codex32 checksum by reducing correction candidates by 1/4 to 1/16. It also checksums string length that BCH checksums can't offer error detection guarantees about.

Identifier

There are two good ideas for setting the default ID so that 1) users are not burdened with creating and entering it, 2) 20-bits of the secret aren't leaked, and 3) it's easy to identify which wallet or descriptor the shares belong to or vice versa.

BIP32 fingerprint in bech32
- Allows matching the first 5 hex characters from descriptor key origin information to the codex32 share identifier to confirm they correspond.
- Allows error correcting these 4 characters by converting a known corresponding fingerprint. Further helping emergency recovery.
- As a cryptographic checksum, it detects malicious tampering of shares or invalid combinations as the derived seed's fingerprint won't match.
Encrypted BIP32 fingerprint in bech32
- Allows selectively revealing the correspondence between shares and descriptors only to those with the unique_string.
- Allows creating unique identifiers for re-sharing that still offer the benefits of fingerprint above if unique_string is remembered or stored.
- Includes k and len(payload) in KDF's salt allowing detection of errors in these with known corresponding BIP32 fingerprint and unique_string.

def encrypt_fingerprint(master_seed, k, unique_string=""):
    """
    Encrypt the MS32 fingerprint using a unique string and header data.

    :param master_seed: The master seed used for BIP32 fingerprint.
    :param k: The threshold parameter as a string.
    :param unique_string: Optional unique string encryption password.
    :return: Encrypted fingerprint as a bech32 string.
    """
    password = bytes(unique_string, "utf")
    salt = len(master_seed).to_bytes(1, "big") + bytes(k, "utf")
    enc_key = convertbits(hashlib.scrypt(
        password, salt=salt, n=2 ** 20, r=8, p=1, maxmem=1025 ** 3, dklen=3),
        8, 5, pad=False)
    new_id = [x ^ y for x, y in zip(ms32_fingerprint(master_seed), enc_key)]
    return "".join([CHARSET[d] for d in new_id])

I prefer using the encrypted fingerprint everywhere for more privacy and its ability to detect errors in k and string length which may help emergency recovery. It's also slower to compute due to KDF so harder to grind data into.

Random share payloads

k shares are generated (less any provided existing codex32 strings). Generating random payloads is the largest potential leak. My proposal has two steps:

User entropy and an extended private masterkey (xprv) from app entropy (or an existing master seed) are fed to a computationally expensive KDF to produce a 128 byte derived_key.
Truncated HMAC-SHA512 is used with key=derived_key and a unique descriptive info field as the msg= to derive the "Index seed" for shuffling as well as each new share payload until a threshold of codex32 strings exist.
Additional derived shares are derived as per the BIP but at shuffled indexes.
If necessary, the master_seed is recovered from k shares as per the BIP and the identifier is updated based on this fresh master seed's fingerprint.

def generate_shares(master_key="", user_entropy="", n=31, k="2", ident="NOID",
                    seed_length=16, existing_codex32_strings=None):
    """
    Generate new codex32 shares from provided or derived entropy.

    :param master_key: BIP32 extended private master key from bitcoind.
    :param user_entropy: User-provided entropy for improved security.
    :param n: Total number of codex32 shares to generate (default: 31).
    :param k: Threshold parameter (default: 2).
    :param ident: Identifier (4 bech32 characters) or 'NOID' (default).
    :param seed_length: Length of seed (16 to 64 bytes, default: 16).
    :param existing_codex32_strings: List of existing codex32 strings.
    :return: Tuple: master_seed (bytes), list of new codex32 shares.
    """
    master_seed = b""
    if existing_codex32_strings is None:
        existing_codex32_strings = []
    new_shares = []
    num_strings = len(existing_codex32_strings)
    if (not validate_codex32_string_list(existing_codex32_strings, False)
            and existing_codex32_strings):
        return None
    available_indices = list(CHARSET)
    for string in existing_codex32_strings:
        k, ident, share_index, payload = decode("ms", string)
        available_indices.remove(share_index)
        if share_index == "s":
            master_seed = payload
        seed_length = len(payload)

    if num_strings == int(k) and not master_seed:
        master_seed = recover_master_seed(existing_codex32_strings)
    if master_seed:
        master_key = BIP32Node.from_rootseed(master_seed, xtype="standard")
    elif master_key:
        master_key = BIP32Node.from_xkey(master_key)
    else:
        return None
    key_identifier = hash_160(master_key.eckey.get_public_key_bytes())
    entropy_header = (seed_length.to_bytes(length=1, byteorder="big")
                      + bytes("ms" + k + ident + "s", "utf") + key_identifier)
    salt = entropy_header + bytes(CHARSET[n] + user_entropy, "utf")
    # This is equivalent to hmac-sha512(b"Bitcoin seed", master_seed).
    password = master_key.eckey.get_secret_bytes() + master_key.chaincode
    # If scrypt absent visit OWASP Password Storage or use pbkdf2_hmac(
    # 'sha512', password, salt, iterations=210_000 * 64, dklen=128)
    derived_key = hashlib.scrypt(
        password, salt=salt, n=2 ** 20, r=8, p=1, maxmem=1025 ** 3, dklen=128)
    index_seed = hmac.digest(derived_key, b"Index seed", "sha512")[:32]
    available_indices.remove("s")
    available_indices = shuffle_indices(index_seed, available_indices)
    tmp_id = "temp" if ident == "NOID" else ident

    # Generate new shares, if necessary, to reach a threshold.
    for i in range(num_strings, int(k)):
        share_index = available_indices.pop()
        info = bytes("Share payload with index: " + share_index, "utf")
        payload = hmac.digest(derived_key, info, "sha512")[:seed_length]
        new_shares.append(encode("ms", k, tmp_id, share_index, payload))
    existing_codex32_strings.extend(new_shares)

    # Derive new shares using ms32_interpolate.
    for i in range(int(k), n):
        fresh_share_index = available_indices.pop()
        new_share = derive_share(existing_codex32_strings, fresh_share_index)
        new_shares.append(new_share)

    # Relabel the new shares with default ID.
    master_seed = recover_master_seed(existing_codex32_strings)
    if ident == "NOID":
        ident = "".join([CHARSET[d] for d in ms32_fingerprint(master_seed)])
        new_shares = relabel_codex32_strings("ms", new_shares, k, ident)

    return master_seed, new_shares

Notes on Key Derivation Function Selection:

It is critical that the KDF used for the derived key and the identifier encryption be computationally expensive for the hardware implementing this standard. Time costs between 2 and 10 seconds with a significant memory cost (relative to available memory) will offer the best protection for low entropy user input against grinding attacks, in acceptable user time.

Scrypt with parameters n=2 20, r=8, p=1* is used above. This requires 1 GB and a couple seconds on a modern laptop. If 1GB of memory is unavailable `pbkdf2_hmac('sha512', password, salt, iterations=210_000 64, dklen=128)` offers similar protection with slower runtime. Using Argon2id with a minimum configuration of 1024 MiB of memory, an iteration count of 4, and 4 degrees of parallelism can offer even better protection than Scrypt and may be implemented by wallets.

I do not prescribe any particular KDF but if using a different KDF or parameters than scrypt with n=2 20, r=8, p=1**, compliant wallet implementations MUST:

Make the KDF function used and its cost parameters be publicly and permanently available for auditors.
Use at least 2048 PBKDF2 iterations of HMAC-SHA512 OR its equivalent.
- See OWASP Password Storage Cheat Sheet for the latest equivalencies.
Not use any fixed cost KDF, hash based message authentication code or simple hash function in place of scrypt.

They also must generally:

Display the app_entropy, if any, unconditionally.
Allow the user to provide at least 128-bits of user_entropy (128+ character minimum, 1024 is a reasonable maximum.) AFTER displaying app entropy.

I welcome everyone's feedback on this proposal. #59

BenWestgate commented 2 months ago

Padding bits

In order to prevent padding bits from leaking 2-4 bits per generated share, these must also be set deterministically. ...some error detection is possible. ... This ECC padding is guaranteed to detect up to 1 bit error or 2 sequential bit errors:
def ecc_padding(data):

In simulations this parity interleaving doesn't perform optimally for 2 random bit errors which seems more likely than 2 bit burst errors since the bech32 charset was designed to differ in 1 bit between confused characters.

Simulation of unique bit flip errors in the payload only:

128-bit payloads:

ecc_padding() Probability of undetected error: 1 error: 0% 2 errors: 43.97% 3 errors: 0% 4 errors: 43.83% 5 errors: 0% 10 errors: 43.74%

CRC-2 (x^2+x+1) Probability of undetected error: 1 error: 0% 2 error: 32.76% 3 error: 22.66% 4 error: 25.54% 5 error: 24.83% 10 errors: 24.99%

For 256-bit payloads: CRC-4-ITU (x^4+x+1) Probability of undetected error: 1 error: 0% 2 errors: 6.36% 3 errors: 6.27% 4 errors: 6.30% 5 errors: 6.22% 10 errors: 6.21%

ecc_padding() Probability of undetected error: 1 error: 0% 2 errors: 27.42% 3 errors: 0% 4 errors: 25.26% 5 errors: 0% 10 errors: 25.02%

For 512-bit long payloads: CRC-3-GSM (x^3+x+1) 1 error: 0% 2 errors: 14.18% 3 errors: 12.25% 4 error: 12.55% 5 error: 12.48% 10 errors: 12.50%

ecc_padding() Probability of undetected error: 1 error: 0% 2 errors: 38.73% 3 errors: 0% 4 errors: 37.62% 5 errors: 0% 10 errors: 37.50%

Unless you know of a reason why bit errors are more likely to arrive in odd amounts (after 1), CRC codes are far superior to last year's idea.

This would be applied to the codex32 secret's padding.

It could be applied to k - 1 shares but the bits need to be obfuscated by xor with a pubkey hash or it's trivial to tell they are the first k - 1 shares by verifying the CRC bits, denting n privacy. This allows using the CRC bits to detect errors and correct erasures on the "special" shares when BIP32 fingerprint is known, while hiding they are the first k - 1 shares from a snoop without the full fingerprint.

apoelstra commented 2 months ago

Unless you know of a reason why bit errors are more likely to arrive in odd amounts (after 1), CRC codes are far superior to last year's idea.

This would be applied to the codex32 secret's padding.

It could be applied to k - 1 shares but the bits need to be obfuscated by xor with a pubkey hash or it's trivial to tell they are the first k - 1 shares by verifying the CRC bits, denting n privacy.

This all sounds good to me. It feels a little overengineered for what it, but I've said that before :). And it would help list decoders zero in on the correct decoding in cases where there are more than 4 errors. So the more correction ability the better.

BenWestgate commented 2 months ago

OK, I built it. Prepending the fingerprint to the data before calculating the CRC doesn't seem to hurt the error detection, even when the ID is its first 20 bits. And does blind the CRC from an observer without the other 12 bits.

I also wrapped the hrp into the calculation because why not.

def bin_polymod(chk_len, data):
    """Internal function that computes the CRC checksum."""
    # Define the CRC polynomial (x^chk_len + x + 1)
    polynomial = (0b1 << chk_len) + 0b11
    crc = 0
    for bit in data:
        crc = (crc << 1) | int(bit) # Shift in the next bit
        # If the leftmost bit (MSB) is 1, XOR with the polynomial
        if crc & (0b1 << chk_len):
            crc ^= polynomial
    # Return the last chk_len bits as the CRC
    return crc & (2 ** chk_len - 1)

def bin_hrp_fp_expand(hrp, fp):
    """Expand HRP and fingerprint into values for crc computation."""
    return convertbits(fp + bytes(hrp,'utf'), 8, 1)

def bin_create_crc(hrp, data, fp=b''):
    """Compute the CRC checksum values given data and fingerprint."""
    chk_len = 5 - len(data) % 5
    values = bin_hrp_fp_expand(hrp, fp) + data
    polymod = bin_polymod(chk_len, values + [0] * chk_len)
    return convertbits([polymod], chk_len, 1)

BenWestgate commented 2 months ago

OK, I built it. Prepending the fingerprint to the data before calculating the CRC doesn't seem to hurt the error detection, even when the ID is its first 20 bits. And does blind the CRC from an observer without the other 12 bits.

Huge breakthrough realization:

If share indexes are cryptographically secure shuffled, they contain nearly 5-bits entropy per share for low t or n. We only need 128-bits for security. This leaves us an ECC budget of 7-bits per 135-bits of index + payload + padding.

So let's generate the share index and padding from the payload bytes with a CRC-7 (no idea which is best yet, using x^7 + x^3 + 1 now. Any suggestions what I should optimize for is appreciated. This one detects 99.99% of random 2-bit errors, and misses more at 1/128 chance)

at t=2, an attacker with k-1 shares has learned 0.09 bits ~= 5 - log2(30) about the next share's payload. at t=3 an attacker with k-1 shares has learned 0.14 bits ~= 5 - log2(29) about the next share's payload. at t=9 an attacker with k-1 shares attacker learned 0.48 bits ~= 5 - log2(23) about the next share's payload.

This doesn't meaningfully impact security, the payload is over provisioned and contains 130 unknown bits tolerating some loss.

This is NOT secure to apply to an existing 16-byte seed, since its codex32 secret has a known index and 0-bits of entropy in that position. The attacker knows the index is 's', getting a 5-bit head start on the unknown 130 bits, reducing overall security to 125-bits, insufficient for BIP32 requirements.

For derived shares, this checksum will usually not verify (1/128 chance). But for n < 20 the generated shares (k or k-1 if starting with an existing master seed) can be re-rolled with HMAC-SHA512(entropy, index)[:seed_len]until they do. The checksum is extremely fast to verify so it is practical to generate large n codex32 backups where every share has a verifiable CRC-7 on a PC and 5-10 verifiable shares should be doable on some HWWs.

Relevant polynomials: 128 bit seeds: https://users.ece.cmu.edu/~koopman/crc/crc7.html 256 bit seeds: https://users.ece.cmu.edu/~koopman/crc/crc9.html 512 bit seeds: https://users.ece.cmu.edu/~koopman/crc/crc8.html

apoelstra commented 2 months ago

Verrry interesting. At first blush your analysis looks correct (it starts to breakdown for high t and n, as you say, but not catastrophically ... if you have 16 outstanding shares you lose less than a bit here).

BenWestgate commented 2 months ago

(it starts to breakdown for high t and n, as you say, but not catastrophically ... if you have 16 outstanding shares you lose less than a bit here).

If attacker has 16 shares he already recovered.

At Worst, k - 1 = 8 indexes eliminated tells him 0.48 bits about the final share payload. But payload is 130-bits so there's still 129.5 bits security.

If you make indexes public, it reduces security to 125-bits at T -1. Insufficient for BIP32 on 16-byte seeds.

The "header" in this design should be called the first 5 characters, not 6 as share index is secret data derived from its payload. It's safe for 17+ byte seeds to make the index public.

You could KDF each payloads' bytes to choose its share index but that destroys error correction and ability to find valid derived shares.

The best sub-sharing idea is encoding a share's 45 bech32 symbols as the codex32 secret payload for an outer codex32 backup. So that can encrypt the now secret index for sub-shares.

BenWestgate commented 2 months ago

Besides more error correction for codex32 shares, this spares 5 bits for compact QRs. They now can support all 9 thresholds and scanning one displays the threshold and first ID character immediately.

There's more room but best Idea I've had is 3 more bits of the ID just for t=3 which could sometimes fill a second character of the ID after 2 scans. Could get it to 4 bits (~90%) if dropped t>3 from compact, and maybe 5 if I used numeric encoding for threshold 2 and alphanumeric for threshold 3.

BenWestgate commented 1 month ago

Closing this issue because it's a duplicate of #57 and I dislike like most of the code snippets.

Supporting compact CodexQRs actually required things to be much simpler. Currently writing a generate_shares() function that doesn't make me cry to look at.

Also axed the idea of a 25x25 Compact CodexQR for 256-bit seeds. Not enough space in v2, too many compromises.

BlockstreamResearch / codex32

Suggest standard implementation for electronic share set generation #64