Support encrypted shares (by doing a second layer of 2-of-2 SSS)

apoelstra commented 3 years ago

Mostly filing this as an issue so I don't forget. If we took another header character to indicate a second layer of SSS, this would enable encrypted shards. (Which are useful temporarily for transporting shares, even if you think that keeping permanently encrypted shares aronud would be too fragile).

roconnor-blockstream commented 3 years ago

According to https://github.com/WebOfTrustInfo/rwot8-barcelona/blob/master/topics-and-advance-readings/social-key-recovery.md (referenced from SLIP-39) people may want multi-level SSS in order to facilitate social recovery subject to various restrictions.

roconnor-blockstream commented 3 years ago

The current proposal has left semantics for data portions beginning with A-Z as undefined (aka invalid). So there is room to repurpose that for multi-level schemes. Multi-level schemes will likely require a significantly more detailed header which supplies the thresholds of every group.

Perhaps something like MS1ZIIII(nS)*ZnS for a 2 level scheme where n is the threshold of the i th group and S is the index of the _i_th, with as many groups as there are 'n's until you hit a Z, with the final nS being used as the group threshold and index. for the share.

Another possible scheme is MS1ZnIIIISSSS where n is the threshold for the the first S is the group index, and then if you are using a threshold of 2 or 3 or 4, you use the 2nd, 3rd or 4th 'S' for your index. The position of the non-S character indicating the threshold size. This limits the size of the thresholds of the lower level to being 4 or less.

apoelstra commented 3 years ago

Between the two of those I like the first one, because it's more flexible and because it doesn't waste space on levels that aren't used, if they're not used.

In the 2-level case it also lets you fit the data into a single cryptosteel side, provided you drop the MS1Z prefix ... which is very inadvisable in general but might be okay for my ephemeral-encryption usecase.

roconnor-blockstream commented 3 years ago

For the simple transport case, a simple one time pad with a QQQ checksum would suffice.

apoelstra commented 3 years ago

I agree, and may do this in practice if I'm sending shares to people who are familiar with the scheme and its algebra. But using 2-of-2 has the following advantages:

Does not require a "new" scheme which requires extra explanation/instruction to use
Decryption forces the recipient to use the recovery wheel etc and do a "practice run" for actual recovery, ensuring that they are willing and able to do it

roconnor-blockstream commented 3 years ago

Here is a crazy half-baked idea.

Currently our HRP of "ms" implies a prefix that is effectively "PRRQDN". But what if we made our prefix "PRRQDNSSSSSSSS".

This has no effect on the basic 1 level splitting scheme other than an arbitrary change to the checksum initial value.

But now suppose you have one of your level 1 shares, and, post-facto you decide you want to split it. Either for transport, or otherwise.

What we do is define a new prefix, one of "ms2g" .. "ms9g" depending on the threshold choice for our second level share. These imply a prefix that is "PRRQDNSSSSSSS" through "PRRQDN" respectively. And we take our old level 1 share and prefix it with "S" through "SSSSSSSS" respectively.

What we have done is slid some number of "S"s from the implicit prefix to an explicit prefix. This makes it so that the final checksum value remains unchanged.

Now we run secret sharing on our second level share using the first character as the sub-share index.

Maybe we can leave the string of S's coming after the first character as implicitly added by the scheme's prefix.

apoelstra commented 3 years ago

Hah! To fill in some details, and check my understanding...

ms turns into PRRQDN where the P comes from the initial monomial term in the checksum algorithm and the rest are from ExpandHRP (which encodes the high 3 bits of each character, then Q, then the low 5 bits) ... and that if we were to add eight Ss to this result essentially all this would do is change the initially-filled-out characters in the worksheet.
But if we added all these Ss implicitly then we are no longer interpreting the HRP in the way that bech32 does. I guess this is your intention, although it makes me nervous because it's an obscure detail that somebody would need in order to reproduce the scheme
You are suggesting that we use 2 as the HRP separator rather than 1, to make this deviation from bech32 clearer.

This is very tempting because, as you say, it lets the user decide retroactively to split his share up into more pieces, without having so many explicit Ss wasting space...on the other hand, I like that right now you can basically recreate the entire scheme knowing bech32, the new generator polynomial, and a few 1-in-32 guesses..

roconnor-blockstream commented 3 years ago

No, my intention was to begin with "ms2g1" for a 2-of-n second level split, and "ms9g1" for a 9-of-n second level split.

That said, there are some variations that are possible. The "g" can be whatever we want, and I'm tempted to move the digit into the data portion subject to a complicated checksumming rule.

One variation is to begin with msg1<n><s>... where <n> is a digit representing the threshold of the subshare and <s> in the index of the subshare, and a checksum rule that <n><s> is replaced by a string of 8 "S"s with the (10-<n>)th S replace with <s> is. Or we cold do msg<n><s>1... and construct the same string as (the tail of) the implicit prefix.

I think overall it is worthwhile to make this change, but maybe we can think of an even better idea.

roconnor-blockstream commented 3 years ago

Another line of thought is that we should be able to insert a carefully crafted 13 or longer character string at the beginning without changing the checksum. Possibly somewhat fewer than 13 characters if we mung the HRP appropriately. If we inserted 14 characters we would have a degree of freedom to perhaps choose a suitable value to indicate the threshold of the subshare.

apoelstra commented 3 years ago

I think we can't craft such a string without knowing the exact length of the full string. (Basically, inserting this string would shift the HRP and leading 1 from being multiples of x^n (mod the generator) to being a multiple of x^{n+13}. For a fixed HRP it is easy to calculate the difference in residue, which will be independent of n ... but our inserted characters will then need to equal this difference times x^-n, which we can't compute without knowing n.)

Having said this, I like your idea to have a "compressed encoding" for the shares which is uncompressed during checksum encoding.

apoelstra commented 3 years ago

One variation is to begin with msg1<n><s>... where <n> is a digit representing the threshold of the subshare and <s> in the index of the subshare, and a checksum rule that <n><s> is replaced by a string of 8 "S"s with the (10-<n>)th S replace with <s> is. Or we cold do msg<n><s>1... and construct the same string as (the tail of) the implicit prefix.

Ok, I like this. (The first one, where the extra data comes before the HRP.) I think the critical properties here are:

The user will need to "decompress" the <n><s>s into the fixed-width S-string manually when working out the checksum worksheet
(It is tempting to try to hide this by means of modifying the prefilled data or something, but since there are exponentially many possibilities I don't think this is feasible)
The user does not need to do this decompression during any of the SSS steps, since the <n>s are constant and the <s>s are share indices which form a line.

I like this. My proposal though would be to just write <s><s>...<s>Z where the decompression step is simply to pad out the list of <s> values by adding Ss until there are 8 symbols total, and then to drop the Z.

Edit: actually I am forgetting that we need to encode both the share index and the threshold amount, at each level

roconnor-blockstream commented 3 years ago

Okay I think we can move all the nastiness into a specification of the second-level sharing.

Given any first level prefix, which currently is drafted to be PRRQDN and, because it is so short, is equal to its own residue, for any sub-threshold <n> there exists some other prefix of the form ...<n>S with no more that 15 characters, whose partial residue is also equal to QQQQQQQPRRQDN. Using that prefix we can push an "S" to the front of a valid MS32 string, and keep the checksum valid. The second-level sharing specification just needs to provide a fancy mapping of ms2g ... ms9g to the suitable prefixes of the form ...2S through ...9S in one table. That way we get a checksum that, commits to the threshold, and lets us insert a 'S' into the header which we can then use as a sub-index for secret sharing.

apoelstra commented 2 years ago

To be clear -- to do second-level sharing in this scheme, you will need to add (at most 15 characters to the start during the checksum worksheet (which we can hide behind precomputed values in the case of S but would quickly become unweildy for the other share indices). To do third-level sharing you'd need to add 30 characters, to do fourth-level you'd need to add 45, etc etc?

roconnor-blockstream commented 2 years ago

For the worksheet, what I'm imagining would require adding threshold - 1 many Ss to the start and having a lookup table to tell give you a compacted start for a given threshold.

apoelstra commented 2 years ago

The issue I see with that is that if I've got share A of a secret, and I want to further split that, then the prefix I want is ...SSSA rather than ...SSSS.

apoelstra commented 2 years ago

So, the 14-character prefix 4YNKJQYY96S3WS has partial residue QQQQQQQPRRQDN, and ends in S, as desired.

I can similarly get a 15-character prefix which ends in SS, a 16-character prefix which ends in SSS, etc.

Your proposal then, as I understand it, is to generalize the hrp calculation:

ms1 maps to QQQQQQQPRRQDN as it currently does
ms2g1 maps to 4YNKJQYY96S3W
etc for ms3g1 and so on

And the instructions for the user, to do sub-share nesting, is:

change ms1 to ms2g1, or ms<n>g1 to ms<n+1>g1
prefix an S to the rest of the message
create additional shares, changing your new S as appropriate

roconnor-blockstream commented 2 years ago

I believe that is accurate.

apoelstra commented 2 years ago

Lol, actually, 686A8UK5DRGL8SSSSSSSSSSSSS (thirteen Ss) has the residue QQQQQQQPRRQDN ... so we can just use prefixes of this of decreasing size.

BenWestgate commented 1 year ago

In the 2-level case it also lets you fit the data into a single cryptosteel side, provided you drop the MS1Z prefix ... which is very inadvisable in general but might be okay for my ephemeral-encryption usecase.

Since 'ms1' currently always interpolates to share 's' and considers that as secret, then change the 's' in 'ms1' to whichever share index we are sub-sharing i.e. 'md1', 'ma1' or 'm21'.

This lets users retroactively decide to sub-split existing shares, keeps the string 48 characters and doesn't break bech32 hrp rules.

The threshold of the 'ms1' original master seed shares can be stored as:

The 1st character of the identifier of sub-shares OR
Make identifier of non ms1 sub-shares must be the BIP32 fingerprint of the master seed, allowing the ms1<threshold> to be identified by trial and error OR
The padding of the 1st non-"secret" sub-share (the "secret" is the index corresponding to the 'm_1' prefix) can encode thresholds 2, 3, 4, 5 in 128-bit sub-shares and all of them in 256-bit shares.

Example of sub splitting two existing shares 'a' at 2 of 3 and 'b' at 4 of 4:

Share with index a: ms13casha320zyxwvutsrqpnmlkjhgfedca2a8d0zehn8a0t

Share with index a as a sub-share "secret": ma123casa320zyxwvutsrqpnmlkjhgfedca... Sub-share of a with index c: ma123casc... (in option 3, the padding as a binary int + 2 equals the ms1<threshold>) Sub-share of a with index d: ma123casd... Sub-share of a with index e: ma123case...

Share with index c: ms13cashcacdefghjklmnpqrstuvwxyz023949xq35my48dr

Share with index c as a sub-share "secret": mc143cascacdefghjklmnpqrstuvwxyz023... Sub-share of c with index a: mc143casa... Sub-share of c with index d: mc143casd... Sub-share of c with index e: mc143case... Sub-share of c with index f: mc143casf...

In retrospect, may as well dedicate an entire character in the first non-"secret" sub-share of each index group to storing the threshold of the main (ms1) level of secret sharing and lose 3 bits of security in sub-shares. This preserves the full 4 character original identifier.

In this setup the first sub-shares of each index group would look like:

ma12cashc...3...

mc14casha...3...

Where 3 is the final character of the payload.

The remainding sub-shares would have the identifier cash and fully random payloads.

apoelstra commented 1 year ago

If the identifier of the subshares doesn't match the identifier of the share, then recovery won't work correctly (you will recover the share but then need to patch its identifier afterward).

BenWestgate commented 1 year ago

If the ms1 threshold is stored the sub-share padding, only the threshold and hrp change when converting from a "sub-share secret" to a codex32 share. This lets the identifier match between the codex32 secret, codex32 shares and any sub-shares.

apoelstra commented 1 year ago

Right, but if the threshold and hrp have to change, then the entire checksum needs to be recalculated.

BenWestgate commented 1 year ago

If your sub-shares combine successfully the checksum is no longer needed except for recovery by hand, where you may want to calculate two or three share checksums before the final master seed recovery.

The book calls hand recovery unnecessary since electronics are needed to transact with the seed.

It's a space vs time trade off to stay within 48 char and let existing backups sub-split.

apoelstra commented 1 year ago

If your sub-shares combine successfully the checksum is no longer needed except for recovery by hand

My primary use-case for sub-sharing is to do encryption by hand by 2-of-2 splitting my shares.

It's a space vs time trade off to stay within 48 char and let existing backups sub-split.

Right. But it's more than just time we're trading off ... a 48-char scheme also has less room for identifier, is harder to read unambiguously, and is limited in how many layers you can do.

BenWestgate commented 1 year ago

I had multi-level SSS in mind. I agree then, the checksum should preserve by hand. Let's abandon 48-char then, keep 'ms' or 'ms1' as beginning of header for disambiguation, and put the share threshold(s) sub-indices and index somewhere that don't interfere with our import document. (I really disliked the drop down hrp menu idea)

Import doc expects 'ms1' then string[3] = 0, or 2-9, 4 free characters, then string[8] = 's' if string[3] = 0, otherwise free if unique from last entry.

That may mean the share limit remains 32 even with sub-sharing. Or that when the import detects the longer string or different checksum, it waives the uniqueness requirement as long as it's not a repeated index within a sub-split level.

Dropping support for every divisible by 8-bit length would let the length flag for this format and improve error correction everywhere.

Adding the full 13-14 characters to preserve the ms1 checksum seems better than compression that breaks from bech32 hrp rules (the format can "insert" into an existing share).

The other reason to add so many characters to the string is desktop wallets that try deletes can do 6-8 of them in a few seconds and the longer string will avoid inadvertently falsely correcting to an ordinary share when the sub-share has errors.

apoelstra commented 1 year ago

If we want to import multi-level shares directly into wallets then the import document will never be finished :).

Agreed, we should make a goal here to avoid interfering with it, but I think it'd be fine if we wound up with a scheme that was recognizably a multi-level share and which we later had to amend the document (to, e.g., relax the 32-share limit).

BenWestgate commented 1 year ago

I'm not suggesting to add it to the import document now. I think the import document is ready to review if you like this version: https://github.com/BenWestgate/codex32/blob/import-doc/docs/wallets.md

It's clear to me, not missing anything important, avoids excessive detail about optional features, has more bullet points for easier scanning.

As long as the multi-level scheme starts with 'ms' and will never correct to an ordinary share from insertion/deletion (or vice versa), I'm happy.

I think the only way to guarantee that is to drop support for every divisible by 8 bit-length and fall outside of the max edit distance from 48 & 74.

A fairly slick idea may be to encode the entire data portion (checksum and all) of a share as the payload for the 's' (or matched index, whichever seems better) sub-share, then sub-shares would already be compatible with the current import doc. Only amendment would be to mention that when the recovered data is length 45 or 87 (or other specific lengths for 3+ levels) and has a valid codex32 checksum to import the recovered data as a (sub-)share rather than a seed. The chances of the payload having a valid codex32 checksum inside itself are effectively zero and the unusual length would confirm.

apoelstra commented 1 year ago

I think the only way to guarantee that is to drop support for every divisible by 8 bit-length and fall outside of the max edit distance from 48 & 74.

Ok, lemme bring this up on IRC. We had considered doing this anyway because the other lengths are dumb and make everything harder to specify.

BlockstreamResearch / codex32

Support encrypted shares (by doing a second layer of 2-of-2 SSS) #16