Recommendations for Auditable Electronic Codex32 Implementations

BlockstreamResearch / codex32

A paper computer for Shamir's Secret Sharing over the Bech32 alphabet.

80 stars 23 forks source link

Recommendations for Auditable Electronic Codex32 Implementations #57

Open BenWestgate opened 1 year ago

BenWestgate commented 1 year ago

While intended for paper computers, some benefits of codex32 can be maximized through electronic implementations. To enhance this aspect, I propose recommendations for creating auditable codex32 share sets when using electronics.

Inspired by the concept of auditable bitcoin wallets, I have developed an implementation of For a fresh master seed, which ensures full determinism in generating the codex32 share set.

I believe that deterministically choosing every aspect of the codex32 share set enhances protection against potential malicious tampering of shares by detecting any deviation from the determinism.

In issue #54, we discussed that the hash160(public_masterkey) from descriptors serves as a useful deterministic data to use as the default identifier. However, we need to address another aspect that could potentially leak information – the share indices and payloads of the k - 1 independent shares.

Currently, the BIP recommends selecting the next available letter from the bech32 alphabet, in alphabetical order, as share indices:

Take the next available letter from the bech32 alphabet, in alphabetical order, as a, c, d, ..., to be the share index

This approach may inadvertently reduce privacy, as finding share f, g, or h would reveal a backup of at least 5, 6, or 7 shares, respectively. To improve this, I propose an update to the BIP where, during hand creation, dice are rolled to select each new share index. This method is akin to how indexes are created in software like libgfshare-bin.

For electronic implementations, share indices should be determined by the codex32 secret alone. Below is an example of how I chose the share indices in Python:

import random
import hashlib
import hmac
salt=threshold+identifier+str(len(master_seed))
random.seed(a=hmac.digest(master_seed, salt, 'sha512'))
share_index = random.sample(indices_free, n)

The master_seed, threshold, identifier, and seed_length are the only variables in a codex32 secret, so I HMAC these together to seed the selection of share indices. Although a hash of the codex32 secret string would also work.

I think this is the right track but I'm looking to standardize recommendations for electronic implementations. If there's a better way please say so.

Below is an example of how I generated the share payloads in Python:

share_entropy = hashlib.scrypt(password=master_seed, salt=salt, n=2**20, r=8, p=1, maxmem=1025**3, dklen=(int(k)-1)*(seed_length+1))

While this a start, it may not be suitable as a standard recommendation due to scrypt parameters that won't run on hardware wallets. To address this, I am open to exploring alternative variable length secure hash functions. Your input on this matter would be appreciated.

Next, this stretched entropy is divided evenly into k - 1 share payloads. Additionally, an extra byte is allocated for filling the padding bits on the 26th or 52th characters.

Subsequently, the remaining n + 1 - k shares are derived according to the BIP:

    for x in range(int(k) - 1):
        data = list(share_entropy[x*(seed_length+1):x*(seed_length+1)+seed_length+1])
        share_list += [encode("ms", seed_length, k, identifier, share_index[x], data)]
    for x in range(int(k) - 1, n):
        derived_share_list += [derive_new_share(share_list,share_index[x])]

Lastly, if a new set of shares is created for an existing master_seed with the same threshold, I propose incrementing the last character of the identifier by 1, which will cause a completely new set of indicies and share payloads to be chosen.

I welcome any feedback or suggestions for improving the approach. Let's establish standardized and auditable recommendations for electronic codex32 implementations. Thank you for your collaboration.

apoelstra commented 1 year ago

To improve this, I propose an update to the BIP where, during hand creation, dice are rolled to select each new share index.

This would actually complicate things a fair bit -- generating shares in order allows us to have a fixed set of tables that users can use to derive additional shares. If you were generating shares in random order you would need to do something akin to the recovery process -- and for each share index we'd need a "recovery wheel" for that index.

I wouldn't mind releasing a set of "recovery wheels" and suggesting people do this as an additional privacy measure (we've wanted to do this anyway as a "recover the remaining shares from an incomplete set" process), but I think it's too onerous to suggest doing as the default. Especially as the k/n parameters are not something that we think of as secret data. k is even encoded in the share header.

But I agree that this generation mode be the default for electronic implementations.

Below is an example of how I generated the share payloads in Python:

Because you're starting with the master seed, I don't think there's any value in hardening this. If the seed itself is of low-quality randomness then the user is screwed anyway. So I would actually suggest going in the other direction, toward chacha20 rather than scrypt, in the hope of coming up with something computationally simple enough that it could be (theoretically) audited by hand.

Similarly, with the share indices I don't even think we need cryptographic randomness -- the goal is simply to break the implication "share G implies that shares B,C,D,E,F exist". Though of course, if you are using the master seed as entropy, you want to run that through a cryptographic RNG to make sure that no information leaks into the indices themselves. I also don't think the share indices need to be auditable, though there's no reason for them not to be so we might as well.

BTW, I've been holding off on PR'ing to the BIP repo because it seems like you (and us) are still iterating on this, but I definitely do intend to incorporate your suggestions once we've nailed everything down. Thanks a ton for digging into this!

apoelstra commented 1 year ago

@BenWestgate also, if you are on IRC, #volvelle-wizards is our discussion channel for this stuff.

BenWestgate commented 1 year ago

Thank you for your response I will make changes to my implementation to make it closer to a standard recommendation.

I will get rid of the KDF and use ChaCha20 for filling the secret payload padding, the share payloads and shuffling the indicies so auditing electronic codex32 share sets by hand is possible.

Share indices MUST be derived by a cryptographically secure randomness if not fixed, and we agree they should shouldn't be fixed by default in electronic implementations for the small "n" privacy gain.

By my math if the user made 31 shares and numbered them 1 to 31 it can leak a surprising amount of data: 87 bits. So even with encrypted payloads, the seed would be effectively leaked by a malicious non-deterministic implementation with enough indices known. Although maybe 2nd level SSS encryption should encrypt the plaintext share's threshold, identifier, and indices for reasons like this. #16

Without encryption the max threshold is 9, so the maximum data leak we're concerned with is from 8 indices log2(31*30*29*28*27*26*25*24) = 39 -bits leaked

Leaving only 89-bits of security for the master_seed, which is probably guessable by TLAs in our lifetime if not already.

Regarding the algorithm for the share index selection, using python random's random.seed() and then random.sample() is probably not hand auditable due to the Mersenne Twister.

Instead I could get at least 87-bits from CHACHA20, convert to bech32, and take the first unique character to be the next share index.

But how to know how much data to produce from the hash so I don't run out? Is there a better way to deterministically "shuffle" a list by hand computations?

Use Chacha20 with a secret key (master_seed) to generate a keystream.

Take the first byte of the keystream (or any desired number of bits, depending on the number of elements in the list) and use it as an index to select the first symbol in the shuffled list, unless the value is "s" or a previously selected character. 3.Use the next byte of the keystream (or any desired number of bits) to use as an index to select the next symbol for the shuffled list

Increment the counter for Chacha20 by 1 if out of bytes.

Repeat steps 2 to 4 until all symbols are selected, creating the shuffled list.

Perhaps the chacha20 hash mod 31, mod 30, ... etc What is the right balance between length of secure hash output needed and math operations to derive each index indices that's most efficient for hand auditing and also doesn't introduce modulo bias?

The share payloads are much easier in this regard as they allow repeating characters and use the full 5 bits per character so the chacha20 output stream can be directly used.

I joined the IRC channel as benwestgate.

BenWestgate commented 1 year ago

Because you're starting with the master seed, I don't think there's any value in hardening this. If the seed itself is of low-quality randomness then the user is screwed anyway. So I would actually suggest going in the other direction, toward chacha20 rather than scrypt, in the hope of coming up with something computationally simple enough that it could be (theoretically) audited by hand.

from Cryptodome.Cipher import ChaCha20

padding _length = 32 - len(master_seed)

key = master_seed + bytes(padding_length)

nonce = bytes(hrp+k+identifier+"s", 'utf')

cipher = ChaCha20.new(key=key, nonce=nonce)

cipher_stream = cipher.encrypt(bytes(8))

Does this look good as the RNG stream for setting the padding bits in the last symbol of the codex32 secret payload, the share indicies and k-1 share payloads?

The nonce can be incremented by 1 whenever a 512-bit block is exhausted.

Do you prefer zero padding the decoded master_seed bytes as shown above or directly utf-8 encoding the 32 bech32 characters after "s", ie key = bytes(codex32_secret[9:41], 'utf-8')?

Unsure which is easier by hand.

Sad to see double_sha256 go as the padding as 256-bit master_seed's were direct truncation of WIF serialization. But that novelty is a liability for auditing by hand.

Likewise, it feels wrong to set the secret payload padding bits with a RNG stream that depends on the header, causing the payload for identical master_seed's to vary if the header changes.

Yet it seems smart to conserve labor for hand auditors by avoiding computing a fixed nonce ChaCha20 block only to set 2-4 padding bits then discard. These bits should be the first data extracted from the stream as a fast first check for malicious tampering. The next should be the share payloads, as these are also recoverable error detecting and auditable data, the last should be the indices as the order the software provided them is meant to be lost during recovery and so they cannot detect tampering.

apoelstra commented 1 year ago

Share indices MUST be derived by a cryptographically secure randomness if not fixed, and we agree they should shouldn't be fixed by default in electronic implementations for the small "n" privacy gain.

By my math if the user made 31 shares and numbered them 1 to 31 it can leak a surprising amount of data: 87 bits.

Heh, yeah, this seems reasonable. It seems pretty implausible that the user would number their shares 1 to 31 like that but why have a leak when we don't need to. If users want to do this by hand they can use dice rolls, which are cryptographically secure.

Regarding the algorithm for the share index selection, using python random's random.seed() and then random.sample() is probably not hand auditable due to the Mersenne Twister.

Hmm, true. And I see now the value of auditability, since in theory each share index could be leaking 5 bits (to an attacker who knows the order that the shares was generated anyway).

But how to know how much data to produce from the hash so I don't run out? Is there a better way to deterministically "shuffle" a list by hand computations?

Knuth's "Algorithm P" can do this. (The question is about finding a uniform subset, but conveniently this algorithm finds a subset in a uniform order. If you want a sorted order, which is just as good from a security POV, you can use "Algorithm S" or just sort manually.) Howevery, algorithm P assumes you can uniformly sample from the range [i, 31] for each i, which is hard to do without a theoretically indefinite stream of randomness. Though if you use rejection sampling, the probabilities are at least much easier to compute.

Let me think about this.

Does this look good as the RNG stream for setting the padding bits in the last symbol of the codex32 secret payload, the share indicies and k-1 share payloads?

The RNG initialization looks good to me, although

I guess identifier comes out of a different RNG?.
I also don't know what encrypt() does; I assume it xors the input with the random stream.
It's a cool idea to set the nonce uniquely per seed; I'd have probably just left it at all-zeroes...see below comments.

Also, I wouldn't bother setting the padding bits randomly. You can just leave them as 0. They aren't used and do not feed into BIP32 derivations.

The nonce can be incremented by 1 whenever a 512-bit block is exhausted.

Yep, though this is a tiny bit subtle. If you were to produce 256 blocks then you'd be incrementing the identifier part of the nonce, and would produce the same output as you would've if you'd simply started with that identifier.

But since we're only producing 31 shares, each of which only takes half a block, this is fine.

Do you prefer zero padding the decoded master_seed bytes as shown above or directly utf-8 encoding the 32 bech32 characters after "s", ie key = bytes(codex32_secret[9:41], 'utf-8')?

Unsure which is easier by hand.

Directly using the bytes is definitely easier, and I think is also more elegant.

Sad to see double_sha256 go as the padding as 256-bit master_seed's were direct truncation of WIF serialization. But that novelty is a liability for auditing by hand.

Yeah, that is a cool property. Though TBH I don't think users frequently see WIF serialization of xprivs so it's something of a curiosity.

Likewise, it feels wrong to set the secret payload padding bits with a RNG stream that depends on the header, causing the payload for identical master_seed's to vary if the header changes.

Yeah, hmm. There's two ways you could think of this:

Changing the threshold or identifier is a "reset the world" change and we should put this stuff into the RNG. This prevents users from re-sharing the same secret in different contexts while producing the same shares, which is good because the different share sets would interact algebraically in possibly surprising ways.
Changing the threshold or identifier is a "relabelling" which we ought to support. I actually considered producing a worksheet for this because in the early days I found myself relabelling stuff a lot (to get encrypted shares with distinct but human-readable headers).

The threshold value should definitely go into the RNG. Under no circumstances should you "relabel" the threshold, unless you are doing something dangerous and way off-script in which case the spec doesn't matter anyway. After thinking about it for a bit, I also think we should do this with the identifier. It creates a nice workflow where "if you want to re-split the secret, you have to change the identifier" and conversely "if the identifiers are the same (and the they come from the same seed), they're compatible shares".

Re-sharing secrets is a safe thing to do as long as the initial shares are independently random, and there are lots of reasons that users might want to do it.

BenWestgate commented 1 year ago

I guess identifier comes out of a different RNG?.

https://github.com/BlockstreamResearch/codex32/issues/54#issuecomment-1650573709

identifier is either selected by the user or defaults to the hash160(master_pubkey), making it possible to tell which share set recovers spending a watch only wallet, as it is prepended to child xpubs in descriptors. Of various options this one seems most useful for these 4 characters.

I also don't know what encrypt() does; I assume it xors the input with the random stream.

That's exactly what it does, the stream is being XOR'd with zeros in order to print the cipher stream directly.

It's a cool idea to set the nonce uniquely per seed; I'd have probably just left it at all-zeroes...see below comments.

It is an essential feature.

Having the nonce be a function of the header allows for creating new backups for the same master_seed that don't reuse the original randomness. For example if a share or two was compromised, the old set could be destroyed and replaced with a share set using a new header (new randomness) to restore security without sweeping funds. (Whether it's advisable to "rotate" shares given an attacker may reach a threshold before you've Completely Destroyed the partially-compromised share set is another question.)

If the nonce was fixed to bytes(8), you'd get the same independent shares for the same master_seed every time, even if you changed the threshold or identifier. This even helps with access control, if you want one group to be able to recover at threshold 3 and another group to recover at threshold 2, make two share sets with each threshold and one set will NOT leak payload data of the other set.

apoelstra commented 1 year ago

It is an essential feature.

This is a great argument. Agreed on all counts.

BenWestgate commented 1 year ago

Also, I wouldn't bother setting the padding bits randomly. You can just leave them as 0. They aren't used and do not feed into BIP32 derivations.

I dislike wasted space, especially in handwriting. But I agree padding should not be set randomly. Let's see how to make those bits do some work:

These bits are as secret as the secret, it's a waste to set them with a non-linear secure hash function. double_sha256(seed) etc with no efficient erasure correction ability.
These bits are as secret as the secret, it's a waste to set them to some public data like the hash160(master_pubkey) as you can't see them until you can already compute the pubkey, this is just another form of 1 with extra wasted work.
But they could be parity bits or linear binary error coding. These algorithms are pretty fast to do on the master_seed bytes by hand.

Do you have any suggestions?

Naive idea is upper half and lower half parity, which has nice property of keeping the same 64-bit chunk length for 128/256-bit seeds. But falls apart at 512-bit seeds which have 3-bits of padding (170 bit chunks?) This is not the most efficient error coding. Proper ECC should correct contiguous bit erasures anywhere in the master_seed that are the length of the checksum and is what should be used. This offers a little extra help in the absolute worst case scenario, where you have threshold shares but some are invalid beyond correction by the codex32 BCH code but are less than a symbol away from getting the right correction, potentially making brute forcing your valid codex32 secret 4 to 16 times cheaper.

What is the best ECC to put here? Will it wastefully repeat the same info as the codex32 secret checksum or can it supplement it?

Agreed, regarding re-sharing the same secret:

The BIP already says only shares with a matching identifier can be combined. Allowing "relabeling" is not very useful because it changes the checksum, even if share payloads do not change. And then you'd create a situation that the BIP says is not supported: two shares of different identifiers can be combined, but only if you ignore their checksum and identifier and just combine payloads. It's neat to just increment the LSB of the identifier to re-share an existing master_seed, then it still retains 3 symbols of the master_pubkey fingerprint, while allowing 32 re-shares. And setting a custom identifier to reshare should also be supported as long as it's unique from all previously used identifiers at that threshold.

On this topic of identifier uniqueness for re-sharing, it's not clear a user would know all previously used identifiers besides the one they have just combined with to do a reshare. Especially if they are not destroying old sets, they may combine with an older share set, and if we naively increment that, we will re-use the same nonce! So I think the default for "re-shares" needs to generate some independent randomness when users don't know all outstanding identifiers for this seed and threshold. Then we have a birthday problem and just pray for no collisions... The alternative is to not support useful cases:

Can you think of any reason to have 2 outstanding share sets of the same threshold but different identifiers? I did: prevent one set from combining with another set to defend against custodian collusion.

I might make a 2-of-3 with my dad and executor and a 2-of-3 with my mom and executor, each with unique identifier of same master_seed, and now recovering the master_seed requires me or my executor and 1 parent, but both my parents cannot combine shares to recover without me or my executor. Making them less attractive targets for a physical attack than if this was a 2-of-4 between my parents, myself and my executor. Useful.

How much randomness do we have to inject into the identifier if we don't know all the old identifiers to be pretty safe we don't reuse one? Can we still keep some bech32 encoding of the public key fingerprint, to identify the underlying wallet master_seed and connection between unique share sets of it or must inject all the entropy we can to be safe?

apoelstra commented 1 year ago

I dislike wasted space, especially in handwriting. But I agree padding should not be set randomly. Let's see how to make those bits do some work:

This is an interesting direction. We can definitely stick a checksum in here and we can definitely make it "independent" of the existing checksum in the sense that it will meaningfully add error correction. But it will be a bit-level error correcting code which has some unintuitive properties when applied to data that's normally encoded as 5-bit characters.

In particular I think a 2-bit code will only be able to detect one error (I guess, two, if they're consecutive), so it would mean we could detect certain character errors but not others. This is not zero value but it's fairly limited. I think we should keep thinking whether there's a better use of 2 bits before committing to this.

Allowing "relabeling" is not very useful because it changes the checksum, even if share payloads do not change.

It does, but you can compute what the checksum will change to, far more quickly than recomputing the whole thing. You can even use a computer to do most of the work without exposing any secret data to the computer.

On this topic of identifier uniqueness for re-sharing, it's not clear a user would know all previously used identifiers besides the one they have just combined with to do a reshare. Especially if they are not destroying old sets, they may combine with an older share set, and if we naively increment that, we will re-use the same nonce! So I think the default for "re-shares" needs to generate some independent randomness when users don't know all outstanding identifiers for this seed and threshold. Then we have a birthday problem and just pray for no collisions... The alternative is to not support useful cases:

Agreed. Unfortunately I think we just need to live with the birthday problem. In practice I think it'd be okay, both because the chance of collisions is fairly low (you need close to 1024 re-shares before the probabilites get high, and like, what are you doing with your secrets then..) and because the failure mode is that you'll generate exactly the same share set as before. Which is not much worse than having the original set still out there.

Can you think of any reason to have 2 outstanding share sets of the same threshold but different identifiers? I did: prevent one set from combining with another set to defend against custodian collusion.

Yeah, this is sorta what I'm imagining. That, or having "backup shares" that might have the same/overlapping custodian sets, but nonetheless shouldn't be mixable. Unfortunately these are the exact cases where collisions would undermine your security model.

How much randomness do we have to inject into the identifier if we don't know all the old identifiers to be pretty safe we don't reuse one? Can we still keep some bech32 encoding of the public key fingerprint, to identify the underlying wallet master_seed and connection between unique share sets of it or must inject all the entropy we can to be safe?

The birthday paradox shows up at around sqrt(N). So if you used the entire 20-bit identifier you'd expect collisions around the time you had 10 outstanding bits (1024). The curve is pretty steep so you have very low probabilities for quite a while, so it's fine to use 1000 as a rough estimate. If you used only half the identifier, 10 bits, you'd expect collisions after around 30 ... and have nontrivial probability of collisions after only a few.

My feeling is that you'd need to use the entire identifier for this.

BenWestgate commented 1 year ago

I think using the entire identifier is fine. This is a sub-case of the support custom identifier case, where we MUST randomize it when the user and software can't recall all previously generated share set identifiers for this master_seed.

Otherwise implementations can default to suggesting to increment it, preserving most of the hash160(master_pubkey) fingerprint or as always the user can pick anything unique: "faml" "moma" "dada" "lawr" "acct" "aunt" etc. Wallets should store the identifiers they have created to prevent the user from repetition or at least warn in event it is deliberate to repair or extend an existing share set.

Since this "I forgot" case requires new randomness that cannot be derived from master_seed the following procedure from auditable bitcoin wallets is needed so that the new random identifier can't leak seed data or deliberately repeat an identifier to break security:

identifier = convertbits(hashlib.scrypt(password=user_data, salt=app_entropy, n=2**20, r=8, p=1, maxmem=1025**3, dklen=3), 8, 5)[:4]

These parameters will need to be reduced to support HWWs and hand auditing (see below)

Where the user provides randomness and the machine generates 20-bits and unconditionally displays it and they are mixed with a one-way compression function.

Process of creating the new identifier should be functionally equivalent to the following steps:

Wallet generates a pseudo-random number (20 bits or more) that we call app entropy.
Wallet unconditionally displays app entropy to the user (in bech32, hex etc).
User is then asked for some data to be entered that allows to improve security in case app entropy generator has a backdoor (see Dual EC DRBG). This could be a one-time text entered by the user only for this purpose, or a passphrase containing enough entropy that is used later for other purposes.
Wallet uses computationally expensive KDF to deterministically mix app entropy and user-entered text to produce the final identifier.

Important considerations:

App entropy must be displayed unconditionally so that wallet cannot know if user is going to audit it or not.
User-entered data should be visible and reproducible by the user to be able to arrive to the same data in an auditing software. Good: text or pattern of button presses. Bad: accelerometer data, mouse movements (hard to record and reproduce).
User-entered data should contain enough entropy and should be unique. If the wallet has a backdoored RNG which does not generate unique numbers, we're relying on user-entered data to avoid nonce reuse and if the wallet is trying to leak the master_seed we're relying on user entropy for secure key mixing.
Unless necessary, wallet should not rely solely on user input to generate identifier because they may inadvertently give a non-unique input. In such case, honest wallet with correct RNG implementation would with strong probability generate a unique nonce.
Process of mixing app entropy and user input must use computationally expensive KDF to compensate for non-uniqueness OR low-entropy in user input.

On second thought, it may not need to be an expensive KDF, but it may need to be a one way compression function. HMAC-SHA256 or our ChaCha20 hash. With the user input as key and app_entropy as nonce. I believe simple XOR won't provide protection from master_seed leakage when users enter less than 20-bits of entropy, which is likely when limited to 4 bech32 characters.

The worst case is the attacker predicts the user's input they can recover 20-bits of leaked master_seed. My KDF above costs $0.0002 to check all possible 20-bits with known user data. So there's no way to secure this other than that the user should be able to type more than 4 characters to maximize their chances of a unique string w/ 20-bits entropy.

from Cryptodome.Cipher import ChaCha20
from Cryptodome.Random import get_random_bytes
from segwit_addr import convertbits

# user data is limited to 32 bytes
padding _length = 32 - len(user_data)
key = user_data + bytes(padding_length)

nonce = get_random_bytes(8)
cipher = ChaCha20.new(key=key, nonce=nonce)

identifier = convertbits(cipher.encrypt(bytes(3)),8,5)[:4]

For auditing by hand, provide 8 de-biased dice rolls to select the custom identifier avoids doing a chacha20 hash just for this.

apoelstra commented 1 year ago

Heh, your comment inspired me to actually read dual-EC-DBRG. It's quite elegant, other than the obvious backdoor, and also because it's obscenely slow. My read of it is that backdoor allows an attacker who has seen 30 (aligned) bytes of output can learn the internal state of the RNG, with a little bit (16 bits) of grinding. Once they know the internal state they can predict every future random number. However they cannot learn previous states of the RNG, and in particular there is no way for them to learn the seed, assuming the discrete log problem is hard.

So for this particular usecase, dual-ec-drbg would actually be perfectly fine :P.

But regarding scrypt parameters, let's back up a bit. I spoke on the phone with @roconnor-blockstream and we briefly touched on this. One observation he made is that any considerations applied to the identifier also applies to the shares (we should assume our attacker has at least one share; arguably we should assume they have k-1 shares). So I think we should do the same thing in both cases, and in this case the 20-bit ID is the least of our problem next to the key-length-many-bits shares.

Our attack scenario is: an attacker has access to the 20-bit ID, a share, multiple shares, whatever. And they are trying to access the actual master secret. The scenarios are:

If the data was generated maliciously then the seed could just be leaked outright.
If the data was generated correctly but with a fast RNG, then an attacker can grind out the secret far faster (by a large constant factor) by re-generating the data than by deriving addresses.
If the data is generated by a slow RNG/KDF, then an attacker with the data has no advantage over one without.

Having auditability forces us into the middle choice. If we don't worry about auditability then we'll be in either the "all good, no attacker advantage to having share data" case or the "all bad, the seed was totally leaked" case. And we can bias toward "all good" with the usual techniques. Open source software, independent audits, etc.

Given this, my feeling is that if somebody cares enough to audit their random data by hand, they care enough to just generate all their data by hand (which would actually be much faster than auditing, no matter what RNG we chose). And if they don't care enough, they'd be better off if we used scrypt everywhere rather than chacha everywhere.

BenWestgate commented 1 year ago

Auditing a wallet always requires a computer to verify the public keys. So hand auditable share sets are not very important considering as you say the share set could be generated by hand faster for the paranoid.

That said, it makes sense for a given codex32 secret to always produce the same shares and share index order on electronics, but it is fine to require electronics to compute the shares with an expensive KDF.

I initially used Scrypt because it was built in hashlib, but the current state-of-the-art for memory hard KDFs is Argon2id. Its main advantage is being able to set the time cost and memory cost independently, while with Scrypt the time cost depends on memory cost.

apoelstra commented 1 year ago

Agreed. I don't have much of an opinion between scrypt and argon2id. I'd be tempted to use scrypt because it's more widely supported and has been tested pretty heavily by the various cryptocurrencies using it a PoW. I'm also not sure there's a ton of value, for us, in being able to set time and memory separately. (And we could emulate that somewhat by just iterating scrypt.)

BenWestgate commented 1 year ago

If the data was generated correctly but with a fast RNG, then an attacker can grind out the secret far faster (by a large constant factor) by re-generating the [share] data than by deriving addresses.

If the data is generated by a slow RNG/KDF, then an attacker with the data has no advantage over one without.

This means the KDF hardness parameters only need to cost more than deriving an address? The cheapest "address" is the master pubkey hash we suggest to use as default identifier.

Cost to beat is: HMAC-SHA512, produce a pubkey, then ripemd160(sha256(master_pubkey))

That has 1 in 2^20 chance of a false positive. (Or it's faked) In that case, a known address must be derived.

Setting time cost and memory cost an order of magnitude (or more) higher than an address derivation should be safe and able to run on HWWs

apoelstra commented 1 year ago

This means the KDF hardness parameters only need to cost more than deriving an address? The cheapest "address" is the master pubkey hash we suggest to use as default identifier.

Yeah, that's my thinking.

Setting time cost and memory cost an order of magnitude (or more) higher than an address derivation should be safe and able to run on HWWs

BenWestgate commented 1 year ago

I did some research with the chat robot and it claims HWWs are using "a few kB" of memory for address derivation from a BIP32 seed and most popular HWWs could use as much as 64-512kB for KDF memory cost parameters.

This makes memory hard algorithms on HWWs useless against GPUs: 8GB / 0.5MB > # GPU cores, but may still help somewhat against ASICs.

PBKDF2 in BIP39 is especially weak against GPUs (needs 8+ random word passphrases to be secure).

Given this, creating a secure auditable master_seed may want to use different parameters on mobile and desktop wallets (1+ GB) than hardware wallets (512kB). Here's an example from my implementation that costs 1GB and a few seconds:


def fresh_master_seed(seed_length,user_entropy,bitcoin_core_entropy):
    """Derive a fresh master seed with seed length bytes from user entropy and bitcoin core's entropy aka app entropy."""
    if 16 > seed_length > 64:
        return None
    master_seed = hashlib.scrypt(password=bytes(user_entropy+str(seed_length),"utf"), salt=bytes(bitcoin_core_entropy,"utf"), n=2**20, r=8, p=1, maxmem=1025**3, dklen=seed_length)
    return master_seed

A time_cost of 5-10 seconds may be recommended to reduce the disadvantages of weak hardware. Unlike authentication, where user must wait time_cost each login so is limited to 1-2 seconds or less, this is a one time wait at master_seed generation.

As long as the parameters were something unconditionally displayed like App Entropy anyone can verify they get the same codex32 secret by computing the KDF on a trusted device.

For share set generation, which only needs to cost more than an address derivation which I'm told takes "a few seconds or less" for a HWW then targeting something that takes 10 seconds for the worst hardware we want to support and uses 64kB or more on should be 2-10x as time costly and 10-100x as memory costly as address derivation.

Here all wallets should use the same parameters so codex32 secrets always derive identical codex32 share sets. Much like how 2048 rounds is part of bip39.

apoelstra commented 1 year ago

Yeah, "a few KB" for address generation sounds about right. Memory-wise it's extremely cheap and therefore extremely parallelizable. There's also a time-memory tradeoff. I think I could probably do it with <100 bytes of RAM but it wouldn't be very fast.

Given this, creating a secure auditable master_seed may want to use different parameters on mobile and desktop wallets (1+ GB) than hardware wallets (512kB). Here's an example from my implementation that costs 1GB and a few seconds:

Is the assumption here that the user entropy or Core entropy might be weak? I think if together they exceed 128 bits of entropy then there isn't any need for hardening at all. Or am I confused?

For share set generation, which only needs to cost more than an address derivation which I'm told takes "a few seconds or less" for a HWW then targeting something that takes 10 seconds for the worst hardware we want to support and uses 64kB or more on should be 2-10x as time costly and 10-100x as memory costly as address derivation.

Yep, sounds good to me.

BenWestgate commented 1 year ago

Is the assumption here that the user entropy or Core entropy might be weak? I think if together they exceed 128 bits of entropy then there isn't any need for hardening at all. Or am I confused?

It's to protect against RNG backdoors in "secure element" HWWs.

Neither the device or user should be trusted, so the hardening makes the most of what entropy the user is willing to provide to protect themselves. Increasing the cost tremendously vs SHA2 for the backdoor author to steal.

This is the "lite" effort version of the dice debiasing worksheet. KDF offers real protection even without achieving 128-bits user entropy, even half that is act of congress expensive:

Attack cost estimates of memory hard KDFs by entropy

65-bits would be on order of 10 billion USD with 1GB and 1 second argon2id parameters.

50-bits would be around $1 million. It's less profitable to backdoor the RNG if user can mash their keyboard for a minute or toss a dice 10 times and verifiably make this attack cost more than they'll ever store.

BenWestgate commented 1 year ago

In particular I think a 2-bit code will only be able to detect one error (I guess, two, if they're consecutive), so it would mean we could detect certain character errors but not others. This is not zero value but it's fairly limited. I think we should keep thinking whether there's a better use of 2 bits before committing to this.

From BIP-173:

...ordering is chosen to minimize the number of pairs of similar characters that differ in more than 1 bit.

The most commonly substituted characters are all 1-bit errors.

For 256-bit seeds w/ 4-bit ECC as padding, the total error correction becomes 4 substitutions plus 1 substitution if it is a 1-bit error. For 128-bit seeds w/ 2-bit ECC as padding, there's no corrections but it can narrow down a 1-bit error substitution to the upper or lower half of the code, maybe better. 512-bit seeds also gain a 1-bit substitution error correction boost.

If the same 2 or 4 bit code is added to each independent share then do all the derived shares get valid ecc padding as well?

apoelstra commented 1 year ago

It's to protect against RNG backdoors in "secure element" HWWs.

OK, yeah, this sounds good to me. Though of course, verifying this is done correctly will require redundant hardware. But better to have the ability than to not.

If the same 2 or 4 bit code is added to each independent share then do all the derived shares get valid ecc padding as well?

I am 90% sure, yes

For 256-bit seeds w/ 4-bit ECC as padding, the total error correction becomes 4 substitutions plus 1 substitution if it is a 1-bit error.

Similarly, I am 90% sure this is right ... the two codes don't directly add so that you can do some combined error correction algorithm (as far as I'm aware). But I do think that you can do a form of soft error correction where you can find all the possible strings that are 5 errors away from the target string. There will be several but only one, I think, with pretty-good probability, will have the extra parity bits set correctly.

And even if there's more than one, it's easy to manually check them all. In fact, in general you should be able to do this -- "correct" more than four errors by using error correction to reduce the search space by a factor 2^20. Then having a couple extra bits will give you a couple extra bits' worth of reduction.

BenWestgate commented 1 year ago

That was originally why I wanted a default identifier derived from master_seed, the 2^20 search reduction. But now it’s a hash160(master_pubkey), it’s only a few HMAC-SHA512s faster than using a known wpkh address to eliminate wrong corrections.

I’ll test how many candidate master_seed to > hash160(master_pubkey) can be checked per second and update this post.

It's pretty slow...

BenWestgate commented 1 year ago

from electrum import bip32
for candidate_seed in range(2**20):
    bip32.BIP32Node.from_rootseed(candidate_seed.to_bytes(16, 'big'),xtype="standard").calc_fingerprint_of_this_node()

This takes 100 seconds on i5-1135G7 laptop and calculates fingerprints for 2^20 candidate seeds.

For comparison, checking the first wpkh() address of 2**16 candidate seeds:

from electrum import bip32
from electrum import ecc
from electrum.crypto import hash_160

for candidate_seed in range(2**16):
    priv_masterkey = bip32.BIP32Node.from_rootseed(candidate_seed.to_bytes(16,'big'),xtype="standard")
    hash_160(priv_masterkey.subkey_at_private_derivation([2147483732, 2147483648, 2147483648]).subkey_at_public_derivation([0,0])[1].get_public_key_bytes(compressed=True))

Took 46 seconds, so 736 seconds to check 2**20 seeds. The master_pubkey fingerprint identifier is a useful 7.4x speed up vs known wpkh addresses.

BenWestgate commented 1 year ago

Back to the desired hardening of share payload derivation:

import hashlib
for candidate_seed in range(2**16):
    hashlib.scrypt(password=candidate_seed.to_bytes(16,'big'), salt=bytes('2test','utf'), n=2**6, r=8, p=1, maxmem=68*1024, dklen=16)

This generates the 1st share's payload for 2^16 candidate master_seed's with codex32 secret threshold = '2' and identifier = 'test' in 7 seconds.

This is 2.2x slower than checking the fingerprint (3.1 seconds) but still 7x faster than checking the first wpkh() address.

If the data is generated by a slow RNG/KDF, then an attacker with the [share] data has no advantage over one without.

The maxmem cannot be raised any higher as most HWWs have 128kiB of RAM and there is no reason to not support them. So following your advice, lets slow it down by doing 16 rounds. If you believe share payloads must be slower to derive than an address then we can use your idea:

(And we could emulate that somewhat by just iterating scrypt.)

import hashlib

for candidate_seed in range(2**14):
    intermediate_key = hashlib.scrypt(password=candidate_seed.to_bytes(16,'big'), salt=bytes('2test','utf'), n=2**6, r=8, p=1, maxmem=68*1024, dklen=16)
    for _ in range(15):
        intermediate_key = hashlib.scrypt(password=candidate_seed.to_bytes(16, 'big'), salt=intermediate_key, n=2 ** 6, r=8, p=1, maxmem=68 * 1024, dklen=16)
    intermediate_key

This generates the 1st share's payload for 2^14 candidate master_seed's with codex32 secret threshold = '2' and identifier = 'test' in 43 seconds, or ~172 seconds for 2^16 which is much slower than 46 seconds to derive 2^16 wpkh() address payloads.

I have no idea why iterating it 16 times caused a 25x+ slowdown, computers are magical.

Any problems here?

I'm still wondering why the shares need to be hardened since the search space is the 128-bit master_seed. Are we concerned that may some day not be long enough to be secure against brute force without the slowdown of address derivation?

Our attack scenario is: an attacker has access to the 20-bit ID, a share, multiple shares, whatever. And they are trying to access the actual master secret. The scenarios are:

If the data was generated correctly but with a fast RNG, then an attacker can grind out the secret far faster (by a large constant factor) by re-generating the data than by deriving addresses.

I thought it was not even possible to increment a 128-bit counter thru all values without 10 nuclear plants running at full capacity for a year (or something like that).

So lets confirm with a 4th person, the threat model is valid before we roast these poor Trezors. If the share payloads were fast to derive with ChaCha20 they'd become an almost unlimited source of error detection with each independent share acting as a cryptographic hash checksum the length of the secret. Is that more useful than making it faster for guessers to check a number that cannot be counted to?

These scrypt parameters with more iterations will work for hardening the master_seed against RNG backdoors with user supplied entropy on 128kiB RAM HWWs, as well as for deriving a unique identifier securely when re-sharing a master_seed when not all previous identifiers are known.

apoelstra commented 1 year ago

I'm still wondering why the shares need to be hardened since the search space is the 128-bit master_seed. Are we legitimately concerned that may some day not be long enough to be secure against brute force without the slowdown of address derivation?

If we really have 128 bits of good entropy then I agree that hardening has no value. But maybe the seed isn't really that strong, or some of it is leaked, or whatever ... then my thinking is that we ought to try to "not make the situation worse" than bare BIP32, and we can do that by hardening all our randomness roughly as much as an EC mult.

Now, I guess you're suggesting that we make the 128 bits itself hard, even against weak entropy, by doing a serious amount of hardening on the user's input. This will protect us from bad input but not from key leakage.

So maybe the "should we bother hardening the shares/IDs" question comes down to "should we attempt to ameliorate the damage caused by leaked seed data"? My feeling is that the hardening doesn't add much cost so we might as well.

This is contradicting this:

Yeah, sorry, my thinking has changed a bit over the course of this thread.

BenWestgate commented 1 year ago

So maybe the "should we bother hardening the shares/IDs" question comes down to "should we attempt to ameliorate the damage caused by leaked seed data"? My feeling is that the hardening doesn't add much cost so we might as well.

Makes sense. This is like medicine "First, do no harm."

So then do we need to harden the master_pubkey fingerprint default identifier? Otherwise it's a 7.4x speed up with a 1 in a million chance of being a false positive when grinding out a partially leaked master_seed?

Or is that not important because the attacker could get the fingerprint from the watch-only wallet?

apoelstra commented 1 year ago

If we are actually using the BIP32 fingerprint then we already have enough hardening, since the bip32 fingerprint requires an EC mult to calculate it from the seed :).

If we are using some alternate notion of "fingerprint" then yes, we should make sure that it's at least as hard as an ecmult.

BenWestgate commented 1 year ago

If we are actually using the BIP32 fingerprint then we already have enough hardening, since the bip32 fingerprint requires an EC mult to calculate it from the seed :).

Yes, that is the fingerprint we have in mind.

True about the EC mult, but the fingerprint is hash_160(master_pubkey) while a wpkh() address is something like hash_160 of [BIP32_fingerprint/84'/0'/0']xpub.../0/0 so there are 5 additional HMAC-SHA512 child derivations to slow it down, assuming BIP84 standard derivation path used. That slowdown is 7 fold as I calculated earlier. Unless the slowdown thanks to HMAC-SHA512 can be ignored due to ASICs or something. We also don't specify implementations use a particular derivation path. It may only be 1 HMAC-SHA512 if they use m/*.

A cheeky fix would be HMAC-SHA512-ing the master_pubkey fingerprint 5 times to set the identifierto make it identical work to check as a known address. But this unfortunately breaks the ability to check the fingerprint matches the share identifier by converting from hex to bech32 in your head.

My feeling is since descriptors felt it safe to include 4 bytes of hash_160(master_pubkey) even in encrypted wallets, it should be safe to include 20-bits in our shares which are much less likely to be stolen than an online descriptor. An attacker has to consider the identifier may just be random noise as well since it's only a default.

apoelstra commented 1 year ago

My feeling is since descriptors felt it safe to include 4 bytes of hash_160(master_pubkey) even in encrypted wallets, it should be safe to include 20-bits in our shares which are much less likely to be stolen than an online descriptor.

Agreed on all counts.

I would further say that, while BIP49-style keys (which I guess is what 99.9% of all keys are..) do 7 derivations, in principle, people could be using their master keys to generate addresses. Or m/* or something.

A single constant-time EC mult is something like 100x the cost of an HMAC (500ns vs 50us). I think we should try to match the 100x factor, but not bother with the extra 7x factor since we're getting into increasingly tenuous justifications for it..

BenWestgate commented 1 year ago

The electrum bip32 library I used must be unrepresentatively slow at the HMACs, it should have been a few percent slower, but not 7x. The chat robot agrees 80%+ of the time is spent going from private to public.

One scrypt round n=2**6, r=8, p=1 would be twice as slow on my hardware, but I don't trust the numbers. Part because this library gave bad data.

But in general, there's no particular scrypt iterations we can say for sure will be more expensive than the EC mult on all hardware and software that may ever exist to attackers. Without grievous overestimation. But what if we did the exact same work, plus some?

This might sound dumb, but why not use public key hash for the share payloads?

Consider this is the time to beat,

import bitcoin.bip32 as bip32
import hashlib

def compute_bip32_fingerprint(master_seed):
    # Step 1: Calculate the extended private masterkey (m) from the master seed using HMAC-SHA512.
    m = bip32.HMAC_SHA512(b"Bitcoin seed", master_seed)

    # Step 2: Compute the extended public masterkey (M) from the extended private masterkey (m).
    M = bip32.privkey_to_pubkey(m)

    # Step 3: Take the first 4 bytes of the hash160 of the serialized extended public masterkey (M).
    M_serialized = M.serialize()
    fingerprint = hashlib.new('ripemd160', hashlib.sha256(M_serialized).digest()).digest()[:4]

    return fingerprint

then,

def compute_share_payload(master_seed, threshold, identifier, share_index):
    # Step 1: Calculate the extended private masterkey (m) from the master seed using HMAC-SHA512.
    m = bip32.HMAC_SHA512(bytes(threshold+identifier+share_index, 'utf'), master_seed)

    # Step 2: Compute the extended public masterkey (M) from the extended private masterkey (m).
    M = bip32.privkey_to_pubkey(m)

    # Step 3: Take the first 16 bytes of the hash160 of the serialized extended public masterkey (M).
    M_serialized = M.serialize()
    share_payload = hashlib.new('ripemd160', hashlib.sha256(M_serialized).digest()).digest()[:16]

    return share_payload

This is guaranteed to not be cheaper, and it's also not any slower than it needs to be. It's also very easy for wallet developers to implement, they nearly already have (perhaps too close and they forget to remove b"Bitcoin seed"...). But since it's a 128-bit match rather than 20-bits or 4 bytes, it seems there should be one more step inserted somewhere to slow it down. Open to suggestions, if you agree.

It also needs to support 16-64 byte outputs. So something like HMAC-SHA512, scrypt, pbkdf2 could work to stretch and slow it. The master_seed should be mixed back in to preserve the original entropy. The target cost to exceed is an address with 5 extra derivations.

apoelstra commented 1 year ago

This might sound dumb, but why not use public key hash for the share payloads?

I think this is too much of a burden on implementors. Though I guess if they support BIP32 they need to support this anyway. But my feeling is, demanding scrypt is fine because it's a standard function in most any general-purpose crypto library. But support for constant-time multiplication on a specific elliptic curve is a bit more of a demand, especially since you do want it to be efficient.

EC mult is also way easier for implementors to screw up and introduce sidechannels (though again, the same risk exists for BIP32 derivations, so maybe this isn't an additional problem).

But in general, there's no particular scrypt iterations we can say for sure will be more expensive than the EC mult on all hardware and software that may ever exist to attackers.

~Sure, but I don't think we really need to. Let's just assume a 100x differential.~

I apologize, I think I've been barking up the wrong tree here. On my system hmac takes 1.35us and a single ecmult gen takes 25.3us, so there's a ~20x differential, which is already less than I expected. But this is not the right comparison. If an attacker is grinding through different EC mults then their marginal operation will be an EC add (0.35us) or double (0.15us), both of which are an order of magnitude faster than an HMAC. (You might think, if the attacker needs to hash-then-ec-mult, all the ecmults will be with respect to different random values, so he has to do separate ecmults for each. Not so. Algorithms like pippinger will let him reuse computations across all of them. Or maybe you assume the attacker has to do 7 ecmults in series; same story, but with a 7x slowdown.) So against a naive software-only attacker, just using a sha256-hmac already provides more protection than the EC mult.

Now, you might assume that the attacker has a sha2 ASIC (or an scrypt ASIC, or whatever) and can do these HMACs a million times faster, but does not have an EC ASIC. Or you might assume the opposite. And I agree that "just do both operations", which is what "copy the BIP32 derivation logic" gets us, will protect us no matter what.

But I think we're way overthinking this. My new vote is that we should just use sha256-hmac for these derivations, on the basis that it's easy to implement and guaranteed to be available with any crypto library, even one that has no awareness of bitcoin, and will be constant-time even if the implementor wasn't thinking about this. We could use sha512-hmac if you want to imitate BIP32, but I don't think it matters. You could iterate it 100x if you really want to cover the case where public addresses only occur after sereval derivation steps, but I don't think it matters.

BenWestgate commented 1 year ago

we should just use sha256-hmac for these derivations For simple hmac, you'd want HMAC-SHA512 because we need to make 512-bit share payloads for the 512-bit master_seed case. It's also slightly slower.

But I believe PBKDF2_hmac is also in every standard crypto library and it can implement the iteration for us, it also has a dklen parameter that somehow produces 64-bytes even with iterations=1 and hash_name='sha256'.

hashlib.pbkdf2_hmac()

import hashlib
share_payload = hashlib.pbkdf2_hmac('sha256', master_seed, salt=b'ms2testa', 100, dklen=len(master_seed)

Example deriving payload of independent share "a" from a codex32 secret starting with ms12test.

If this is sufficient, I think we've covered every implementation detail except the shuffling algorithm. (and the ECC code padding, since we haven't thought of a better use for 2 to 4 bits yet)

We were still hanging on to hand auditability back then, so now we want something that can be seeded by the pbkdf2_hmac() above but with salt=b"ms1tests" for the codex32 secret (since we don't know which share comes first yet) and always produce the same share index shuffle across many popular crypto libraries.

The robot tells me:

One such algorithm is the Fisher-Yates shuffle (also known as the Knuth shuffle), which is widely used and known for its simplicity and efficiency.

...[Knuth shuffle] is often the default shuffling algorithm used in many standard crypto libraries and programming languages.

I haven't experimented with this to have a code snippet yet. I've only ever used random.sample() and shuf in command-line although perhaps I'm lucky and random.sample() IS this algorithm.

apoelstra commented 1 year ago

If this is sufficient, I think we've covered every implementation detail except the shuffling algorithm. (and the ECC code padding, since we haven't thought of a better use for 2 to 4 bits yet)

Nice :) yeah, I think this is fine.

If I had to guess the "Knuth shuffle" is my "Algorithm P", which comes from Knuth. So I guess we should use that. Like all other algorithms we considered, this involves an indefinite amount of randomness, but if we're not worried about hand-auditability, I think we could just pick any arbitrary way of stretching our kdf output (say, sha2'ing it repeatedly, or sha512'ing repeatedly, whatever we feel is more consistent with the rest of everything).

BenWestgate commented 1 year ago

Nice :) yeah, I think this is fine.

We can note pbkdf2 was chosen for simplicity over scrypt which needed iterating due to HWW memory constraints to cost more than address derivation and over argon2id due to library availability.

From the 13 year old, NIST SP 800-132 (being revised this summer):

A minimum iteration count of 1,000 is recommended.

for candidate_seed in range(2**15):
    hashlib.pbkdf2_hmac('sha256',candidate_seed.to_bytes(16,'big'),b'2testa', 1000, dklen=16)

This checks 2^15 candidate_seeds in 10 seconds. That's still twice as fast as checking an address in electrum's library. 21 seconds if hash_name='sha-512' still a little faster than addresses in this library.

From the competing BIP39:

To create a binary seed from the mnemonic, we use the PBKDF2 function with a mnemonic sentence (in UTF-8 NFKD) used as the password and the string "mnemonic" + passphrase (again in UTF-8 NFKD) used as the salt. The iteration count is set to 2048 and HMAC-SHA512 is used as the pseudo-random function. The length of the derived key is 512 bits (= 64 bytes).

We wouldn't want codex32 shares to be weaker against leaked seed data than the competition.

for candidate_seed in range(2**15):
    hashlib.pbkdf2_hmac('sha512',candidate_seed.to_bytes(16,'big'),b'2testa', 2048, dklen=16)

This runs in 44 seconds, about twice as long as checking 2^15 addresses from candidate seeds. With a coincidence like this, it's worth copying BIP39. Especially if as you say the EC mult work is reusable in an optimized attack making hmac the main slowdown.

If I had to guess...

The Fisher-Yates shuffle algorithm, which is also known as the Knuth shuffle, is often referred to as "Algorithm P".

Both random.sample() and shuf use it by default. But my initial test failed to reproduce the same sort from same seed:

head -c16 /dev/urandom > random.seed
string=qpzry9x8gf2tvdw0s3jn54khce6mua7l
for ((i=0; i<${#string}; i++)); do
    echo ${string:i:1}
done | shuf --random-source=random.seed

produced a different order than:

import random

CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"

with open('random.seed', 'rb') as file:
    index_seed = file.read()
random.seed(index_seed)
random.sample(CHARSET,32)

So we may need to explicitly write the shuffle implementation unless we can find some reproducible shuffles between multiple different libraries and tools. Let me know if you can make two different libraries produce the same shuffle from the same seed.

If we have to write it, the most promising method is:

chacha20_key = hashlib.pbkdf2_hmac('sha512',master_seed, b'2tests', 2048, dklen=32)

Use this as the key stream for "Algorithm P", now the chacha20 nonce / counter can be set to 0 with bytes(8) since the hmac did the header mixing and we have indefinite RNG that's as hardened as the share payloads.

say, sha512'ing it repeatedly

With hmac-sha512 with master_seed as key and message as a counter? So it's a slow stream cipher? Like this? Incrementing counter each time more randomness is needed for "Algorithm P"?

def generate_key_stream(key, counter):

    hash_obj = hmac.new(key, counter, hashlib.sha512)
    return hash_obj.digest()

Using HMAC-SHA512 with the message as counter can be a valid method for generating a key stream, especially for shorter key streams.

However, ChaCha20 has several advantages: It is optimized for generating large amounts of random data, extensively used in protocols like TLS, and has strong security properties due to thorough analysis.

If the goal is to generate a long key stream efficiently, ChaCha20 is a better choice. But if a relatively short key stream is sufficient and you prefer consistency with other HMAC-based operations, using HMAC-SHA512 with the counter as the message is a viable option.

I understand this function is safe, if this isn't, neither is BIP32 with it's indices.

Is it preferred to use hmac for consistency or more extensively-audited ChaCha20 designed specifically for key stream generation?

BenWestgate commented 1 year ago

I've written the cryptographic shuffle, I don't know what the algorithm is called, but it's correct.

I use pbkdf2 to create index_seed the same way as share payloads.

Then hmac-sha512(index_seed,counter//32) and assign 2 bytes to each of the 31 characters. If it finds a repeat, it increments the counter, if that happens more than once, it has to move to the next hmac block at message=1 and handles that case correctly. There's about a 1% chance of colliding 31, 2 byte tags so 99.99% of runs it'd only need the first hmac-sha512.

import hashlib
import hmac

# note: "s" has been removed
CHARSET = "qpzry9x8gf2tvdw03jn54khce6mua7l"

# hmac-sha512 keystream, 16-bits per count
def rand_func(index_seed, counter):
    digest = hmac.new(index_seed, (counter // 32).to_bytes(8,"big"), hashlib.sha512).digest()

    return digest[counter%32*2:counter%32*2+2]

# Function to assign 16-bit values to characters
def assign_2_bytes_to_characters():
    counter = 0
    assigned_values = {}
    for char in CHARSET:
        while True:
            value = rand_func(index_seed, counter)
            counter += 1
            if value not in assigned_values.values():
                assigned_values[char] = value
                break
    return assigned_values

# Example usage:
if __name__ == "__main__":
    master_seed = bytes(16)
    index_seed = hashlib.pbkdf2_hmac('sha512', master_seed, b'ms2tests', 2048, dklen=len(master_seed))

    # Assign 32-bit values to characters
    character_values = assign_2_bytes_to_characters()

    # Sort the characters based on their assigned values
    sorted_characters = sorted(character_values.keys(), key=lambda x: character_values[x])

    # Print the sorted characters and their assigned values
    for char in sorted_characters:
        print(f"{char}: {character_values[char]}")

Code is slow because it hmacs for every 2 byte tag instead of only as necessary.

I let it grind for a while and it correctly handled an event where the counter hit 33 (3 tag collisions)

index_seed = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00K\xc5\x88'
hmac(index_seed,0) = b"\xa0\xd8t\xdbm\xd7\x11\xf0\x8dCB\x0c\x8e\\\xf4\xdcft\x1a\x94\xff\x02T\t\x13#y\xc2r\xcc\xf0\x0f\xa0\xe5\x11\xf0$\xb1I\x1bq\xbe\xe8}m\xd7\x02\xb7\x1aq\xb5\x9a\x87oI\x1b\\v\xeb9\xef\xa4'\x84"
hmac(index_seed,1) = b'B\x1eq\xd5y\xa2a\xa1\xff7\x10\xbd*\xf6\xf0\xa0*f\x9b\xe3\x0bEl\xcf=N6\xf5\x9f\x08\x12\xe5J[\xb9\xda\x92\x04\x89\x10\xec\x98zc<m\xa5i\xe4\x18\xabP_C\'\xce\x1b\xb2?\xc1"m\\\x8d'
{'q': b'\xa0\xd8', 'p': b't\xdb', 'z': b'm\xd7', 'r': b'\x11\xf0', 'y': b'\x8dC', '9': b'B\x0c', 'x': b'\x8e\\', '8': b'\xf4\xdc', 'g': b'ft', 'f': b'\x1a\x94', '2': b'\xff\x02', 't': b'T\t', 'v': b'\x13#', 'd': b'y\xc2', 'w': b'r\xcc', '0': b'\xf0\x0f', '3': b'\xa0\xe5', 'j': b'$\xb1', 'n': b'I\x1b', '5': b'q\xbe', '4': b'\xe8}', 'k': b'\x02\xb7', 'h': b'\x1aq', 'c': b'\xb5\x9a', 'e': b'\x87o', '6': b'\\v', 'm': b'\xeb9', 'u': b'\xef\xa4', 'a': b"'\x84", '7': b'B\x1e', 'l': b'q\xd5'}

And you can see the last two character tags before sorting came from the first 4 bytes of the second hmac. Some of the colliding tags were b'\x11xf0', b'm\xd7', I\x1b.

chacha20 could drop in for hmac-sha512 here, and would have been a bit easier to work with as the libraries for it usually have a call "give me keystream byte N" rather than having to carefully subscript.

And from the previous post for share payloads: hashlib.pbkdf2_hmac('sha512',candidate_seed.to_bytes(16,'big'),b'2testa', 2048, dklen=16) It would be better to do dklen=derived_shares*len(master_seed) and split the derived key rather than pbkdf each derived share's index independently.

Finally, the pbkdf2 for indexes could be skipped, keeping it to one hardening operation per codex32 secret > codex32 share set by deriving 16 extra bytes in dklen= to use as the index_seed when fetching the share payloads. 16-bytes is sufficient to produce every permutation.

apoelstra commented 1 year ago

Interesting approach. You probably need to call the HMAC only once (expected number of RNG calls is 31.007 and you get 32 values from each HMAC). Let me take a bit of time to get the probability you need to call it 3 times, which I suspect will be an extremely low number, and you can rewrite your code to simply always call it twice, and cache the values.

BenWestgate commented 1 year ago

...and cache the values.

Here I re-wrote it to cache the digest if not counter % 32. It's cleaner to read this way also.

import hashlib
import hmac

# note: "s" has been removed
CHARSET = "qpzry9x8gf2tvdw03jn54khce6mua7l"

# Function to assign 16-bit values to characters
def assign_2_bytes_to_characters():
    counter = 0
    digest = b''
    value = b''
    assigned_values = {}
    for char in CHARSET:
        while value in assigned_values.values() or not value:
            if not counter % 32: # hmac-sha512 keystream
                digest = hmac.new(index_seed, (counter // 32).to_bytes(8, "big"), hashlib.sha512).digest()
            value = digest[counter%32*2:counter%32*2+2] # 16-bits per count
            counter += 1
        assigned_values[char] = value
    return assigned_values

# Example usage:
if __name__ == "__main__":
    master_seed = bytes(16)
    # index_seed = hashlib.pbkdf2_hmac('sha512', master_seed, b'ms2tests', 2048, dklen=len(master_seed))
    index_seed = b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00K\xc5\x88' # a seed that needs 2 hmacs counter>32
    # Assign 32-bit values to characters
    character_values = assign_2_bytes_to_characters()

    # Sort the characters based on their assigned values
    sorted_characters = sorted(character_values.keys(), key=lambda x: character_values[x])

    # Print the sorted characters and their assigned values
    for char in sorted_characters:
        print(f"{char}: {character_values[char]}")

Side note, the only reason I chose 16-bit tags to sort was it was 512 // 31, other than that it was completely arbitrary. Unsure if 1-byte tags or 5-bit tags would be faster by consuming less randomness.

you can rewrite your code to simply always call it twice, ...

There's no point in calling HMAC any fixed number of times, if limited randomness is sufficient, then get it directly from pbkdf2_hmac.

derived_key = hashlib.pbkdf2_hmac('sha512', master_seed, b'2tests', 2048, dklen=128+len(master_seed)*(k-1))
for i in range(k-1):
    share_payload[i] = derived_key[i*len(master_seed):(i+1)*len(master_seed)]
share_index_randomness = dervied_key[(k-1)*len(master_seed):] # 128 bytes of randomness

It only saves one line of code to do it the wrong but extremely often correct way. ChaCha20 would save the same line because it auto-increments its internal counter to stream from new 512-bit blocks without implementer having to write or even be aware of it happening.

apoelstra commented 1 year ago

Unsure if 1-byte tags or 5-bit tags would be faster by consuming less randomness.

In terms of expectation it seems like 7 bits is the minimum (needs 247.4 bits from the RNG, on average), but 8 is really close (263.8), more convenient to implement, and also it has a tighter distribution (much lower probability of needing much more extra randomness than the expected value.)

In particular I think, based on explicitly computing out probabilities for needing up to 38 samples then extrapolating:

With the 16-bit sampler we'll need more than 64 samples (1024 bits) with probability less than 2^300
With the 8-bit one we'll need more than 64 samples (512 bits) with probability less than 2^40
With the 7-bit one we'll need more than 74 samples (518 bits) with probability less than 2^20

There's no point in calling HMAC any fixed number of times, if limited randomness is sufficient, then get it directly from pbkdf2_hmac.

Heh, nice. I think though that we ought to reduce our sampler to an 8-bit one, and in that case we arguably can't get away with a fixed amount of randomness. I also prefer to minimize the use of the KDF in the spec, which feels opaque to me in a way that sha2-hmac or chacha don't. Ideally, if somebody had a strong justification and wanted to generate the raw seed data without the KDF, or with reduced rounds, or something, they'd still be able and willing to follow the rest of the spec.

BenWestgate commented 1 year ago

Ideally, if somebody had a strong justification and wanted to generate the raw seed data without the KDF ... or something, they'd still be able and willing to follow the rest of the spec.

This is not currently possible because we used PBKDF2(master_seed, header) to get the randomness that sets share payloads and the stream cipher key for index shuffle. The other option was advantaging someone who has part of your seed (ie 1.3 shares, threshold 2), because the share payloads could be checked faster than an address or fingerprint. Or copying BIP32 but that was called hard to implement correctly.

..., or with reduced rounds, ...

If they reduced the rounds, they'd get a different set of shares and indexes so I'm not seeing the value.

A "Codex32 Secret" is the input to the deterministic share split. That can be made however is secure. But then it is hardened before assigning indexes or share payloads.

Whole header (perhaps excluding '1') or whole string should be used in case the hrp ever gets used for something else or checksum changes, then an identifier and threshold collision won't re-use randomness. The share index MUST be in the header to support sub-sharing in the future. I like password=codex32_secret, salt=master_seed as the kdf parameters. Since the codex32_secret may contain user_entropy, while the salt is random bytes common to all codex32 secrets with that payload.

But we can rely less on the dk_len= parameter which is sketchy and probably reduces security when set greater than the hash_name 's length. Produce a derived key of default length, which is the length of the hash_name, then that key= can be Hmac'd with the message=b'Index seed' with this output fed to chacha20 or the custom iterated hmac construction, and the payloads for the independent shares hmac with message=b'Share data 0', message=b'Share data 1' ... etc.

I have a feeling share data should be produced with a stream cipher too. Not because an unlimited amount is needed, but because the amount needed ranges from tiny 128-bit to 4096-bits (512-bits 8) and it's cleaner to ask for the required bytes from a stream cipher than make implementers remember to iterate their hash enough times for all `threshold-1 len(master_seed)` possibilities.

Either this or getting them directly w/ dk_len= from pbkdf2 is very easy. Although I have no idea how it makes 512 bytes from a 64 byte hash. "concatenating earlier results" is what the robot said, which makes the effective iterations less (or more?). While the dk_len = hash_name length pbkdf2 case is straightforward and easy to implement even without a library for pbkdf2.

7 bits is the minimum (needs 247.4 bits from the RNG, on average), but 8 is really close (263.8)

This suggests another way to optimize the cipher stream for shuffling, if not using ChaCha20, is hmac-sha256, sometimes it'll only need 1 hash, and the hash is more than twice as fast.

Speaking of which, if grabbed directly from pbkdf2's dklen=, how many bytes should index_seed be? The key to the stream cipher, it must be 256-bits if we switch to ChaCha. I think the total number of possible sorts is 2^87, Sha2 probably has some collisions so you may not get every sort with 11-bytes, 16-bytes is a round number close by and as secure as the seed itself.

Or if doing index_seed = hmac(key=kdf_derived_key, message=b'Index Seed') use the entire hmac output.

The key= for the share payload stream cipher should support up to 64-bytes, which rules out ChaCha20 here, because otherwise the shares produced have less entropy than the master_seed when they didn't need to.

Heh, nice. I think though that we ought to reduce our sampler to an 8-bit one, and in that case we arguably can't get away with a fixed amount of randomness. I also prefer to minimize the use of the KDF in the spec, which feels opaque to me in a way that sha2-hmac or chacha don't.

apoelstra commented 1 year ago

This is not currently possible because we used PBKDF2(master_seed, header) to get the randomness that sets share payloads and the stream cipher key for index shuffle. The other option was advantaging someone who has part of your seed (ie 1.3 shares, threshold 2), because the share payloads could be checked faster than an address or fingerprint. Or copying BIP32 but that was called hard to implement correctly.

I no longer believe this is an advantage. I think we could just use an HMAC, or maybe an HMAC iterated a dozen times, and then checking the 20 bits would be (basically) just as slow as computing an entire address.

If we were to use an HMAC here, then the PBKDF would only be used to produce the actual seed data, and this doesn't really need to be deterministic, the user just needs to somehow know that they got enough entropy into it.

I have a feeling share data should be produced with a stream cipher too. Not because an unlimited amount is needed, but because the amount needed ranges from tiny 128-bit to 4096-bits (512-bits 8) and it's cleaner to ask for the required bytes from a stream cipher than make implementers remember to iterate their hash enough times for all `threshold-1 len(master_seed)` possibilities.

I think we should use an HMAC keyed on the share index, since we always need a fixed amount of data per index, and using an HMAC gets us close to the "as expensive as doing a BIP32 derivation" ideal. If we were to use chacha20, it would be faster for somebody to grind through secrets trying to get the share, than it would be to grind through them trying to get an address.

BenWestgate commented 1 year ago

I no longer believe this is an advantage. I think we could just use an HMAC, or maybe an HMAC iterated a dozen times, and then checking the 20 bits would be (basically) just as slow as computing an entire address.

Isn't this extremely close or identical to derived_key = pbkdf2('sha512', header, master_seed, 12)?

Also I see pbkdf2 pre-hashes passwords that are longer than the hash used, so only the header b'ms1cashs' should be used as password (or message if raw HMAC) otherwise 32+ byte codex32_secret's have an extra step. Likewise, it means using 'sha512' to support the 64-byte master_seed option without entropy reducing pre-processing.

If we were to use an HMAC here, then the PBKDF would only be used to produce the actual seed data, and this doesn't really need to be deterministic, the user just needs to somehow know that they got enough entropy into it.

Yeah that would be nice. Are you sure the pbkdf2 snippet above is not identical to dozen iterations of hmac-sha512 you have in mind? Can you share a code snippet for how you're thinking of processing the codex32_secret?

I have a feeling share data should be produced with a stream cipher too. Not because an unlimited amount is needed, but because the amount needed ranges from tiny 128-bit to 4096-bits (512-bits 8) and it's cleaner to ask for the required bytes from a stream cipher than make implementers remember to iterate their hash enough times for all `threshold-1 len(master_seed)` possibilities.

I think we should use an HMAC keyed on the share index, since we always need a fixed amount of data per index, and using an HMAC gets us close to the "as expensive as doing a BIP32 derivation" ideal. If we were to use chacha20, it would be faster for somebody to grind through secrets trying to get the share, than it would be to grind through them trying to get an address.

So HMAC-SHA512(key=derived_key, message=b'ms1cashq')[:len(master_seed) or HMAC-SHA512(key=master_seed, message=b'ms1cashq')[:len(master_seed)?

Truncating hmac-sha512 to the seed length to set dependent payloads.

Even though this is extra hmacs vs making a stream cipher from hmac.

apoelstra commented 1 year ago

Isn't this extremely close or identical to derived_key = pbkdf2('sha512', header, master_seed, 12)?

Oh, so it is :). For some reason I thought that PBKDF2 was much more complicated, but Wikipedia says that it's more-or-less just iterating the rng function.

I think it's fine then to just use PBKDF2 everywhere as you suggest.

So HMAC-SHA512(key=derived_key, message=b'ms1cashq')[:len(master_seed) or HMAC-SHA512(key=master_seed, message=b'ms1cashq')[:len(master_seed)?

I think, the latter. Are we still using derived_key elsewhere? If so, then we could use it here ... but if not, we could save the implementor the trouble of computing it.

Truncating hmac-sha512 to the seed length to set dependent payloads.

Even though this is extra hmacs vs making a stream cipher from hmac.

Yep.

BenWestgate commented 1 year ago

I like the symmetry where the password or message parameter is always the header including the share index.

That way index "s" is used to seed the hmac stream and the other indexes are truncated and used directly as share data.

apoelstra commented 1 year ago

Ah, yeah, I like that too.

BenWestgate commented 1 year ago

Oh, so it is :). For some reason I thought that PBKDF2 was much more complicated, but Wikipedia says that it's more-or-less just iterating the rng function.

There might be slight differences between what is passed each round. Pbkdf2 uses the output of the last hash as the salt for the next hash, password stays constant repeated iteration times.

I think it's fine then to just use PBKDF2 everywhere as you suggest.

So HMAC-SHA512(key=derived_key, message=b'ms1cashq')[:len(master_seed) or HMAC-SHA512(key=master_seed, message=b'ms1cashq')[:len(master_seed)?

I think, the latter. Are we still using derived_key elsewhere? If so, then we could use it here ... but if not, we could save the implementor the trouble of computing it.

The point of making a derived_key is to not repeat hardening on the same or nearly the same input to produce multiple output keys. From master_seed you'd need a secure amount of iterations for 'ms1cashs' for the index_seed, again for k-1 independent shares like ms1casha, ms1cashb.

Instead the iterations could just be tripled for the same creation time and triple the security.

Since the attacker is going to be checking against known indexes or a known share that's only 1-2 pbkdf2 to do not k (one for index shuffle, k-1 for dependent shares).

derived_key = pbkdf2('sha-512', password=b'ms1cashs', salt=master_seed,iterations=k*(iter-1)) this creates a set as fast as separate pbkdf2's for each payload but is harder to attack as all work must be done to check if a payload matches, the attacker can't skip pbkdf2 rounds by checking Only one share's payload or worst case (1 pbkdf2), only the order of the first 8 indexes order (if they somehow learned these).

I do like including the share index in the message or password, it requires the attacker to compute 31 shares to check a payload or to do the shuffle first.

Truncating hmac-sha512 to the seed length to set dependent payloads.

Even though this is extra hmacs vs making a stream cipher from hmac.

Yep.

Sounds good.

apoelstra commented 1 year ago

The point of making a derived_key is to not repeat hardening on the same or nearly the same input to produce multiple output keys. From master_seed you'd need a secure amount of iterations for 'ms1cashs' for the index_seed, again for k-1 independent shares like ms1casha, ms1cashb.

Instead the iterations could just be tripled for the same creation time and triple the security.

Ah, yeah, these are good points. Though as a point of terminology I think we should refer to the input to the pbkdf as entropy or something, and reserve the term "master seed" for the s share, since that's what actually gets used as the "master seed" in BIP32.

So agreed, let's make a derived_key, which I guess we should define as some xprv that users are super unlikely to derive for other reasons?

BenWestgate commented 1 year ago

we should refer to the input to the pbkdf as entropy or something, and reserve the term "master seed" for the s share

When using the more efficient and secure derived_key approach,entropy input to the pbkdf2 is always header of the codex32_secret ("ms1idens") and master_seed the same as "S" from bip32:

From BIP32:

Generate a seed byte sequence S of a chosen length (between 128 and 512 bits; 256 bits is advised) from a (P)RNG. Calculate I = HMAC-SHA512(Key = "Bitcoin seed", Data = S)

As long as you don't change the codex32 header to be "Bitcoin seed" instead of "ms1seeds" and set iterations=1, derived_key will never be their extended private masterkey / xprv.

From HMAC:

K' = H(K) if K is larger than block size K' = K otherwise K' is a block-sized key derived from the secret key, K; either by padding to the right with 0s up to the block size, or by hashing down to less than or equal to the block size first and then padding to the right with zeros.

It is counter-intuitive that BIP32 uses the secret master_seed as 'data' and a public constant as 'key'. My suspicion why is so that it handles any arbitrary length of S, because K gets hashed or padded. It would be wrong if 0xdeadbeef produced the same master xprv as 0xdeadbeef000000. The hardened children also use the "public" data as Key= and the private key data as Data=. Maybe when the hmac hash output is a secret, it's better to reverse them?

If so (hardened child): let I = HMAC-SHA512(Key = cpar, Data = 0x00 || ser256(kpar) || ser32(i)).

Speaking of which, we need to define which pbkdf2('sha512', password=?, salt=?) parameter is master_seed and which is header:

Using HMAC to derive keys from our KDF "master" derived_key may be non-standard:

And then for the index_seed, and share_payload() whether derived_key and header (including index) is the key or the message. Typically HMACs are sent with the data they authenticate, and key is secret, so that would mean data=header key=derived_key.

The correct primitive to use to derive share payloads from key-ring quality material appears to be HKDF: https://en.wikipedia.org/wiki/HKDF

It supports a salt, context data and variable length and HKDF is standardized, avoiding custom stuff again.

The extra context field isn't needed for shares but could protect implementations from developer typos. ie context="Share data 1"

And lets the keystream generator use the derived_key directly without chance of rolling over the ms1seeds "s" character and reusing a share's entropy in a cataclysmic shuffling event.

apoelstra commented 1 year ago

When using the more efficient and secure derived_key approach,entropy input to the pbkdf2 is always header of the codex32_secret ("ms1idens") and master_seed the same as "S" from bip32:

I'm a little lost. This sounds fine to me but then how is master_seed derived? I the s share was the output of the PBKDF.

Speaking of which, we need to define which pbkdf2('sha512', password=?, salt=?) parameter is master_seed and which is header:

I think we should do the "obvious" thing and say password is master_seed and salt is the header. BIP32 makes confusing choices ... your theory that they did this to avoid hashing the secret sounds plausible to me. The chaincodes in BIP32 come from a similar "entropy accounting" reasoning. I don't think this reasoning is useful, though it's certainly not harmful, other than causing some confusion. Pieter has suggested to me (and, I think, in public posts on stackexchange and elsewhere) on a number of occasions that if he were designing BIP32 today, he wouldn't have bothered with this stuff.

The correct primitive to use to derive share payloads from key-ring quality material appears to be HKDF:

Great find! This sounds perfect.

BenWestgate commented 1 year ago

As for pbkdf2 parameters: From RFC 2898:

The salt can be viewed as an index into a large set of keys derived from the password, and need not be kept secret.

This suggests salt should include the identifier.

From RFC 6234 PKCS #5: Password-Based Cryptography Specification: Version 2.1

For instance, one might derive a set of keys with a single application of a key derivation function, rather than derive each key with a separate application of the function. The keys in the set would be obtained as substrings of the output of the key derivation function. This approach might be employed as part of key establishment in a session-oriented protocol.

_This shows my original approach of taking substrings of the pbkdf2 derived_key is a standardized way to use KDF outputs._

If there are concerns about interactions between multiple uses of the same key:

the salt should contain data that explicitly distinguishes between different operations and different key lengths, in addition to a random part that is at least eight octets long, and this data should be checked or regenerated by the party receiving the salt.

I'm unsure if "salt" or "password" get passed as the "Data/Message" to the first HMAC in PBKDF2, but if my explanation why bip32 uses Data=S is correct, it seems better:

Don't pass master_seed as the Key so that it does not get padded or hashed
- Adding q's to the payload could produce the same derived_key
- Gets reused on subsequent re-shares with new headers.

My guess is password = message in PBKDF2 because passwords get reused while salts are supposed to be unique per password hash, see "viewed as an index" above.

And applicable for us, since this has to be deterministic:

Note: If a random number generator or pseudorandom generator is not available, a deterministic alternative for generating the salt (or the random part of it) is to apply a password-based key derivation function to the password and the message M to be processed. For instance, the salt could be computed with a key derivation function as S = KDF (P, M). This approach is not recommended if the message M is known to belong to a small message space (e.g., "Yes" or "No"), however, since then there will only be a small number of possible salts.

If we won't take substrings of PBKDF2's output, so that it's a simpler implementation with hash_name='sha-512', dk_len = 64, then HKDF is the way to expand that keyring material into multiple keys. But again, this is probably just recreating what pbkf2 does when dk_len > hash_len.

From RFC 5869 for HKDF:

_We only need the "expand" stage to turn derived_key into the payloads_

In some applications, the input may already be a good pseudorandom key; in these cases, the "extract" stage is not necessary, and the "expand" part can be used alone.

The second stage "expands" the pseudorandom key to the desired length; the number and lengths of the output keys depend on the specific cryptographic algorithms for which the keys are needed.

def hkdf_expand(input_key_material: bytes, info: bytes, length: int) -> bytes:
    temp = b""
    output_key_material = b""
    i = 0
    while len(output_key_material) < length:
        i += 1
        temp = hmac_sha256(prk, temp + info + bytes([i]))
        output_key_material += temp
    return output_key_material[:length]

3.2. The 'info' Input to HKDF

While the 'info' value is optional in the definition of HKDF, it is often of great importance in applications. Its main objective is to bind the derived key material to application- and context-specific information. For example, 'info' may contain a protocol number, algorithm identifiers, user identities, etc. In particular, it may prevent the derivation of the same keying material for different contexts (when the same input key material (IKM) is used in such different contexts). It may also accommodate additional inputs to the key expansion part, if so desired (e.g., an application may want to bind the key material to its length L, thus making L part of the 'info' field). There is one technical requirement from 'info': it should be independent of the input key material value IKM.

We specify an Info field to produce the "Share data" or "Index seed"

I think this could be easier to follow than slicing a larger pbkdf2 dk_len to produce index_seed and share_data. But both are pretty easy to implement.

Additional feature (I need it for my project):

Another aspect that the deterministic implementation should support is optionally passing pre-existing shares to leave unchanged in the new set. This means parsing the strings for their indexes and deleting them from CHARSET before the shuffle. Just like we already do with the codex32 secret and its index "s". And outputting fewer new independent deterministic shares.

This is generalizing the function from turning 1 codex32 string into a deterministic share set, to turning any codex32_strings_provided <= k into an auditable deterministically generated and shuffled share set.

An example application is if the user has memorized a "brain share" and doesn't wish to change it. Or to re-share and reuse k-2 existing shares. Or produce decoy seeds & shares from k-1 correct shares, etc.

BenWestgate commented 1 year ago

*One of the up to k codex32 strings provided MUST be the codex32 secret. Only index "s" can be fed to PBKDF2 to derive shares!

The provided shares are only used to delete indexes, decrement the quantity of new independent shares to create and then used with the secret and new independent share(s) (if any) to derive the rest of the set . This means the concept of "secure re-shares require a new identifier" still applies.

BenWestgate commented 1 year ago

When using the more efficient and secure derived_key approach,entropy input to the pbkdf2 is always header of the codex32_secret ("ms1idens") and master_seed the same as "S" from bip32:

I'm a little lost. This sounds fine to me but then how is master_seed derived? I the s share was the output of the PBKDF.

The BIP32 seed is produced with User_Entropy: dice rolls, key mashing, coin flips, etc and unconditionally displayed App_Entropy: a 64-byte pseudo random number from the device. Then they are passed through as computationally expensive KDF as the hardware supports:

In order of preference: 1. Argon2id using half the device's memory and 10 seconds as time cost. 2. Scrypt using half the devices memory, needs iterating to increase time cost, not recommended for low memory devices. 3. PBKDF2 where the iterations take approximately 10 seconds on the devices hardware.

master_seed can also be created from debiased dice rolls as per the current booklet for hand generation. This is the BIP32 master seed, so how it is generated is not pertinent to the share set generation. The user may already have a BIP32 master_seed they are using that they want to split into a codex32 SSS backup for example.

Edit: I think to help with these questions, I'll fork this repo and add a Python reference for auditable implementations. Commenting out parts still under consideration or displaying multiple implementation options.

Then we can do Draft reviews until it's Ready. This thread is unwieldy but the code explains itself much more simply than words do.

For example here: https://github.com/BlockstreamResearch/codex32/tree/master/reference /python-codex32/src/lib.py