Blockheader construciton

Harryman commented 5 years ago

There are a couple options I think will work.

all of these assuming pub-key is cached and use of 32byte schnorr signatures
Current:
(op(4b) + mekle root + prevSig)signature // 4+32+32+32 = 100b spv with 4x10^9 message types

p1, flat with larger op field:
(op(8b) + merkle root +prevSig)signature // 8+32+32+32= 104b SPV with 9x10^18 message types

p2, merklized header:
(merkle root) signature  // 32+32
|   \--\===========\-------------- \
prevSig           meta hash       data hash // 32+32+32 spv = 32*5
                    /    \
                  type   value

p3, hybrid: 
((meta hash + merkle root + prevSig)signature  // 32+32+32+32   = 128b SPV with 2^256 message types
   /    \ \------\
type    value   nonce

version	NonceRelay	HeaderRelay	bCastRelay(128b tuple)	bCastRelay(36b tuple)	hiddenBCastRelay(36b tuple)
current	64b	100b(2^32mtags)	356b	264b	264b(with entropy in data root)
p1	64b	104b(2^64 mtags)	360b	268b	268b(with entropy in data root)
p2	64b	160b(2^256mtags)	352b	260b	who cares its inefficent
p3	64b	128b(2^256mtags)	320b	228b	260b(32b nonce part of meta hash)

There are some trade offs between these proposals, headerRelay messages aren't all equal. the p3 version would work best with data like t: tweet, value: #hashtag because the meta hash would represent that specific configuration.

Current, p1 and related

Where as (current,p1) allows you to break this strict mapping, however this introduces complexity into protocol as to what op is considered to be mapped properly to a type and or type,value tuple. Requiring clients to make a judgement of op to type validity, what this gains is some bandwidth efficiency, it also allow for reserving based on type and value as well as network reserved op values. What is unclear is how efficient the canonicalization of op types will be across the network or how fast collisions might happen since they are not generated psuedorandomly, with 8b op ids this would be less of a problem.

One solution for efficient canonicalization could just be to use a truncated meta hash. If you take the first 8b of a meta hash it is secure enough to prevent most grinding attacks and you save 24b in the header, however you lose the ability to follow a type or a tuple, its only the tuple. To restore this individual tracking only take 4b for each type and field and they both become trivial to grind, giving 8b to each type and value only confers 16b of savings. However a relaxation of this could also work since to give best of both worlds by giving multiple paths to generating the opcode. One could use the first 16b of the meta hash, 8b of type and value, 16b of either type or value, or random for hidden broadcasts. This would allow for easier canonicalization since there are only a few possible ways an opcode could be generated and 16b gives a pretty large search space to reduce collision chance, though 8b splits may be vulnerable, nodes could not index those messages that don't match their reported types, as long as one honest node indexes it others could see that a node was censoring their results. Hidden broadcasts could save 16b by not having a op field.

p3

I'm leaning toward the p3 hybrid version. With a massive 2^256 message type field there is plenty of room to add hard coded message types to the client without worrying someone will try to grind a hash to one of these values. The client then only has to know that if a meta hash equals a certain value that it should be handled by the protocol directly. The other reason is that it actually lowers the cost of broadcast relay which will allow people to keep more info on more nodes.

One negative is lack of being able to follow a specific type since different values will hash to completely different meta hashes. Clients would have to be listening to broadcasts to figure out the preimages that map to certain type. One possible variant would be to just include the first 16b of the type and value concatenated together. This should be safe from grinding attacks, even if successful there isn't much you can do with that other than lie about what the block contains which isn't really helpful.

Another negative to this approach is that your blocks will leak metadata. There are couple different approaches you could use to hide this meta data while still being useful to follower.

You can always put a 0 for the meta hash and then store the meta data in the merkle root which would hide your meta data given there is enough entropy in the rest of the tree. Only node that have enough of the data in the merkle tree would be able to discover the meta data.
add a nonce to the meta hash so unless someone gets the nonce they cannot guess the type of the message. This would allow you to selectively distribute nonces to hide your meta data while still enabling you to gossip your block headers widely.

Either solution allows you to gossip block headers without revealing what they are about. However if someone shares the nonce anyone can verify your meta hash. The same issue also arises in the first solution as anyone with enough data could over share it. This oversharing problem will be there in any implementation. Unfortunately the side effect of non-repudiation is that anyone can reveal a nonce and what was private is now public in terms of metadata. Never reusing your nonce would add more bandwidth and storage requirements but would allow you to reduce the amount of leakage as it would provide some in that the attacker would have to be given a larger number of nonce to prove your metadata.

p3 vs p1

Looking for comments on what is a more sound construction

p3

pros:

more secure header
easier to follow specific meta type,value pairs, canonicalization
explicit non public broadcast, if it has a nonce they probably don't want you to index it to the whole world
slightly cheaper public broadcasts, almost the same hidden broadcast size

cons:

larger block header by 16b

p1

pros:

smaller blockheader
no cost for hidden broadcasts

cons:

public broadcasts are larger
grinding attacks are more possible

Harryman commented 5 years ago

I think the debate comes down to what are going to be most prevalent.

Headers will be very widely gossiped by lots of nodes, if p1 is doing a hidden broadcast it could have a shorter block header down to 92b by removing the op field. I think no broadcast/hidden broadcast messages will be by far the most prevalent so savings here is most advantageous.

Hidden Broadcasts might be the most popular since you don't necessarily want to broadcast all the topics you post about widely to everyone. p1 does this cheaper than p3, if the opfield isn't included in the header. p3 without rotating nonces allows people to quickly scan your chain for messages they care about, making liveness less of an issue. Though this could be alleviated by giving a list of service endpoints that could tell you what you missed while you were offline.

Public broadcasts I'd imagine these would be probably the least common type of message, but if they are meant to be picked up by lots of nodes, extra bits are very expensive to the protocol. If you used p1 with 32b opcode, bringing to the same functional level as p3 it would 396b vs 320b(128b tuple) or 304b vs 228b(36b tuple)

Harryman commented 5 years ago

There is always the option to remove the opcode entirely. This would make it easier for people to hide key replacement broadcasts because headers only clients wouldn't be able to detect them unless explicitly told by the nodes. Another option is to compromise and leave one or two bytes for opcodes which allows for protocol level message types to be enforced by listening to headers only but don't leave enough entropy in there to be used for anything useful. I'm going to sleep on this but I think this is the best course of action.

Harryman / hashd

Blockheader construciton #17

Current, p1 and related

p3

p3 vs p1

p3

p1