Open apoelstra opened 3 years ago
According to https://github.com/WebOfTrustInfo/rwot8-barcelona/blob/master/topics-and-advance-readings/social-key-recovery.md (referenced from SLIP-39) people may want multi-level SSS in order to facilitate social recovery subject to various restrictions.
The current proposal has left semantics for data portions beginning with A-Z as undefined (aka invalid). So there is room to repurpose that for multi-level schemes. Multi-level schemes will likely require a significantly more detailed header which supplies the thresholds of every group.
Perhaps something like MS1ZIIII(nS)*ZnS
for a 2 level scheme where n is the threshold of the i th group and S
is the index of the _i_th, with as many groups as there are 'n's until you hit a Z, with the final nS
being used as the group threshold and index. for the share.
Another possible scheme is MS1ZnIIIISSSS
where n is the threshold for the the first S is the group index, and then if you are using a threshold of 2 or 3 or 4, you use the 2nd, 3rd or 4th 'S' for your index. The position of the non-S character indicating the threshold size. This limits the size of the thresholds of the lower level to being 4 or less.
Between the two of those I like the first one, because it's more flexible and because it doesn't waste space on levels that aren't used, if they're not used.
In the 2-level case it also lets you fit the data into a single cryptosteel side, provided you drop the MS1Z
prefix ... which is very inadvisable in general but might be okay for my ephemeral-encryption usecase.
For the simple transport case, a simple one time pad with a QQQ checksum would suffice.
I agree, and may do this in practice if I'm sending shares to people who are familiar with the scheme and its algebra. But using 2-of-2 has the following advantages:
Here is a crazy half-baked idea.
Currently our HRP of "ms" implies a prefix that is effectively "PRRQDN". But what if we made our prefix "PRRQDNSSSSSSSS".
This has no effect on the basic 1 level splitting scheme other than an arbitrary change to the checksum initial value.
But now suppose you have one of your level 1 shares, and, post-facto you decide you want to split it. Either for transport, or otherwise.
What we do is define a new prefix, one of "ms2g" .. "ms9g" depending on the threshold choice for our second level share. These imply a prefix that is "PRRQDNSSSSSSS" through "PRRQDN" respectively. And we take our old level 1 share and prefix it with "S" through "SSSSSSSS" respectively.
What we have done is slid some number of "S"s from the implicit prefix to an explicit prefix. This makes it so that the final checksum value remains unchanged.
Now we run secret sharing on our second level share using the first character as the sub-share index.
Maybe we can leave the string of S's coming after the first character as implicitly added by the scheme's prefix.
Hah! To fill in some details, and check my understanding...
ms
turns into PRRQDN
where the P
comes from the initial monomial term in the checksum algorithm and the rest are from ExpandHRP
(which encodes the high 3 bits of each character, then Q
, then the low 5 bits) ... and that if we were to add eight S
s to this result essentially all this would do is change the initially-filled-out characters in the worksheet.S
s implicitly then we are no longer interpreting the HRP in the way that bech32 does. I guess this is your intention, although it makes me nervous because it's an obscure detail that somebody would need in order to reproduce the scheme2
as the HRP separator rather than 1
, to make this deviation from bech32 clearer.This is very tempting because, as you say, it lets the user decide retroactively to split his share up into more pieces, without having so many explicit S
s wasting space...on the other hand, I like that right now you can basically recreate the entire scheme knowing bech32, the new generator polynomial, and a few 1-in-32 guesses..
No, my intention was to begin with "ms2g1" for a 2-of-n second level split, and "ms9g1" for a 9-of-n second level split.
That said, there are some variations that are possible. The "g" can be whatever we want, and I'm tempted to move the digit into the data portion subject to a complicated checksumming rule.
One variation is to begin with msg1<n><s>...
where <n>
is a digit representing the threshold of the subshare and <s>
in the index of the subshare, and a checksum rule that <n><s>
is replaced by a string of 8 "S"s with the (10-<n>
)th S replace with <s>
is. Or we cold do msg<n><s>1...
and construct the same string as (the tail of) the implicit prefix.
I think overall it is worthwhile to make this change, but maybe we can think of an even better idea.
Another line of thought is that we should be able to insert a carefully crafted 13 or longer character string at the beginning without changing the checksum. Possibly somewhat fewer than 13 characters if we mung the HRP appropriately. If we inserted 14 characters we would have a degree of freedom to perhaps choose a suitable value to indicate the threshold of the subshare.
I think we can't craft such a string without knowing the exact length of the full string. (Basically, inserting this string would shift the HRP and leading 1 from being multiples of x^n (mod the generator) to being a multiple of x^{n+13}. For a fixed HRP it is easy to calculate the difference in residue, which will be independent of n ... but our inserted characters will then need to equal this difference times x^-n, which we can't compute without knowing n.)
Having said this, I like your idea to have a "compressed encoding" for the shares which is uncompressed during checksum encoding.
One variation is to begin with
msg1<n><s>...
where<n>
is a digit representing the threshold of the subshare and<s>
in the index of the subshare, and a checksum rule that<n><s>
is replaced by a string of 8 "S"s with the (10-<n>
)th S replace with<s>
is. Or we cold domsg<n><s>1...
and construct the same string as (the tail of) the implicit prefix.
Ok, I like this. (The first one, where the extra data comes before the HRP.) I think the critical properties here are:
<n><s>
s into the fixed-width S-string manually when working out the checksum worksheet<n>
s are constant and the <s>
s are share indices which form a line.I like this. My proposal though would be to just write <s><s>...<s>Z
where the decompression step is simply to pad out the list of <s>
values by adding S
s until there are 8 symbols total, and then to drop the Z
.
Edit: actually I am forgetting that we need to encode both the share index and the threshold amount, at each level
Okay I think we can move all the nastiness into a specification of the second-level sharing.
Given any first level prefix, which currently is drafted to be PRRQDN
and, because it is so short, is equal to its own residue, for any sub-threshold <n>
there exists some other prefix of the form ...<n>S
with no more that 15 characters, whose partial residue is also equal to QQQQQQQPRRQDN
. Using that prefix we can push an "S" to the front of a valid MS32 string, and keep the checksum valid. The second-level sharing specification just needs to provide a fancy mapping of ms2g
... ms9g
to the suitable prefixes of the form ...2S
through ...9S
in one table. That way we get a checksum that, commits to the threshold, and lets us insert a 'S' into the header which we can then use as a sub-index for secret sharing.
To be clear -- to do second-level sharing in this scheme, you will need to add (at most 15 characters to the start during the checksum worksheet (which we can hide behind precomputed values in the case of S
but would quickly become unweildy for the other share indices). To do third-level sharing you'd need to add 30 characters, to do fourth-level you'd need to add 45, etc etc?
For the worksheet, what I'm imagining would require adding threshold - 1 many Ss to the start and having a lookup table to tell give you a compacted start for a given threshold.
The issue I see with that is that if I've got share A
of a secret, and I want to further split that, then the prefix I want is ...SSSA
rather than ...SSSS
.
So, the 14-character prefix 4YNKJQYY96S3WS
has partial residue QQQQQQQPRRQDN
, and ends in S
, as desired.
I can similarly get a 15-character prefix which ends in SS
, a 16-character prefix which ends in SSS
, etc.
Your proposal then, as I understand it, is to generalize the hrp calculation:
ms1
maps to QQQQQQQPRRQDN
as it currently doesms2g1
maps to 4YNKJQYY96S3W
ms3g1
and so onAnd the instructions for the user, to do sub-share nesting, is:
ms1
to ms2g1
, or ms<n>g1
to ms<n+1>g1
S
to the rest of the messageS
as appropriateI believe that is accurate.
Lol, actually, 686A8UK5DRGL8SSSSSSSSSSSSS
(thirteen S
s) has the residue QQQQQQQPRRQDN
... so we can just use prefixes of this of decreasing size.
In the 2-level case it also lets you fit the data into a single cryptosteel side, provided you drop the
MS1Z
prefix ... which is very inadvisable in general but might be okay for my ephemeral-encryption usecase.
Since 'ms1' currently always interpolates to share 's' and considers that as secret, then change the 's' in 'ms1' to whichever share index we are sub-sharing i.e. 'md1', 'ma1' or 'm21'.
This lets users retroactively decide to sub-split existing shares, keeps the string 48 characters and doesn't break bech32 hrp rules.
The threshold of the 'ms1' original master seed shares can be stored as:
ms1<threshold>
to be identified by trial and error ORExample of sub splitting two existing shares 'a' at 2 of 3 and 'b' at 4 of 4:
Share with index a: ms13casha320zyxwvutsrqpnmlkjhgfedca2a8d0zehn8a0t
Share with index a as a sub-share "secret": ma123casa320zyxwvutsrqpnmlkjhgfedca...
Sub-share of a with index c: ma123casc... (in option 3, the padding as a binary int + 2 equals the ms1<threshold>
)
Sub-share of a with index d: ma123casd...
Sub-share of a with index e: ma123case...
Share with index c: ms13cashcacdefghjklmnpqrstuvwxyz023949xq35my48dr
Share with index c as a sub-share "secret": mc143cascacdefghjklmnpqrstuvwxyz023... Sub-share of c with index a: mc143casa... Sub-share of c with index d: mc143casd... Sub-share of c with index e: mc143case... Sub-share of c with index f: mc143casf...
In retrospect, may as well dedicate an entire character in the first non-"secret" sub-share of each index group to storing the threshold of the main (ms1) level of secret sharing and lose 3 bits of security in sub-shares. This preserves the full 4 character original identifier.
In this setup the first sub-shares of each index group would look like:
ma12cashc...3...
mc14casha...3...
Where 3 is the final character of the payload.
The remainding sub-shares would have the identifier cash and fully random payloads.
If the identifier of the subshares doesn't match the identifier of the share, then recovery won't work correctly (you will recover the share but then need to patch its identifier afterward).
If the ms1
threshold
is stored the sub-share padding, only the threshold
and hrp
change when converting from a "sub-share secret" to a codex32 share. This lets the identifier match between the codex32 secret, codex32 shares and any sub-shares.
Right, but if the threshold
and hrp
have to change, then the entire checksum needs to be recalculated.
If your sub-shares combine successfully the checksum is no longer needed except for recovery by hand, where you may want to calculate two or three share checksums before the final master seed recovery.
The book calls hand recovery unnecessary since electronics are needed to transact with the seed.
It's a space vs time trade off to stay within 48 char and let existing backups sub-split.
If your sub-shares combine successfully the checksum is no longer needed except for recovery by hand
My primary use-case for sub-sharing is to do encryption by hand by 2-of-2 splitting my shares.
It's a space vs time trade off to stay within 48 char and let existing backups sub-split.
Right. But it's more than just time we're trading off ... a 48-char scheme also has less room for identifier, is harder to read unambiguously, and is limited in how many layers you can do.
I had multi-level SSS in mind. I agree then, the checksum should preserve by hand. Let's abandon 48-char then, keep 'ms' or 'ms1' as beginning of header for disambiguation, and put the share threshold(s) sub-indices and index somewhere that don't interfere with our import document. (I really disliked the drop down hrp menu idea)
Import doc expects 'ms1' then string[3] = 0, or 2-9, 4 free characters, then string[8] = 's' if string[3] = 0, otherwise free if unique from last entry.
That may mean the share limit remains 32 even with sub-sharing. Or that when the import detects the longer string or different checksum, it waives the uniqueness requirement as long as it's not a repeated index within a sub-split level.
Dropping support for every divisible by 8-bit length would let the length flag for this format and improve error correction everywhere.
Adding the full 13-14 characters to preserve the ms1 checksum seems better than compression that breaks from bech32 hrp rules (the format can "insert" into an existing share).
The other reason to add so many characters to the string is desktop wallets that try deletes can do 6-8 of them in a few seconds and the longer string will avoid inadvertently falsely correcting to an ordinary share when the sub-share has errors.
If we want to import multi-level shares directly into wallets then the import document will never be finished :).
Agreed, we should make a goal here to avoid interfering with it, but I think it'd be fine if we wound up with a scheme that was recognizably a multi-level share and which we later had to amend the document (to, e.g., relax the 32-share limit).
I'm not suggesting to add it to the import document now. I think the import document is ready to review if you like this version: https://github.com/BenWestgate/codex32/blob/import-doc/docs/wallets.md
It's clear to me, not missing anything important, avoids excessive detail about optional features, has more bullet points for easier scanning.
As long as the multi-level scheme starts with 'ms' and will never correct to an ordinary share from insertion/deletion (or vice versa), I'm happy.
I think the only way to guarantee that is to drop support for every divisible by 8 bit-length and fall outside of the max edit distance from 48 & 74.
A fairly slick idea may be to encode the entire data portion (checksum and all) of a share as the payload for the 's' (or matched index, whichever seems better) sub-share, then sub-shares would already be compatible with the current import doc. Only amendment would be to mention that when the recovered data is length 45 or 87 (or other specific lengths for 3+ levels) and has a valid codex32 checksum to import the recovered data as a (sub-)share rather than a seed. The chances of the payload having a valid codex32 checksum inside itself are effectively zero and the unusual length would confirm.
I think the only way to guarantee that is to drop support for every divisible by 8 bit-length and fall outside of the max edit distance from 48 & 74.
Ok, lemme bring this up on IRC. We had considered doing this anyway because the other lengths are dumb and make everything harder to specify.
Mostly filing this as an issue so I don't forget. If we took another header character to indicate a second layer of SSS, this would enable encrypted shards. (Which are useful temporarily for transporting shares, even if you think that keeping permanently encrypted shares aronud would be too fragile).