ethereum / portal-network-specs

Official repository for specifications for the Portal Network

Avoiding Union in the BlockHeader Proof serialization or not? #337

Open kdeme opened 5 days ago

kdeme commented 5 days ago

Currently the BlockHeaderWithProof is a container with the RLP encoded header and a Union over the different types of proofs (currently only None and the BlockProofHistoricalHashesAccumulator proof).

We might want to remove the None variant (see issue https://github.com/ethereum/portal-network-specs/issues/336), which would leave a "hole" in the Union. Because SSZ Unions do not allow holes, the variant cannot actually be removed: after successfully decoding this type, the code would still need to error on the None case. That is one option for dealing with it.
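To make the "error after decoding" option concrete, here is a minimal sketch of SSZ Union framing (a one-byte selector followed by the selected value) and a decoder that rejects the None variant at the application layer. The function names are illustrative, not from the spec:

```python
def encode_union(selector: int, value: bytes) -> bytes:
    # SSZ serializes a Union as a single selector byte followed by the
    # serialization of the selected value.
    return bytes([selector]) + value

def decode_header_proof(data: bytes) -> bytes:
    selector, payload = data[0], data[1:]
    if selector == 0:
        # Selector 0 (None) still decodes fine at the SSZ layer, so the
        # application code must reject it explicitly.
        raise ValueError("None proof variant is not accepted")
    if selector == 1:
        # BlockProofHistoricalHashesAccumulator bytes, to be decoded further.
        return payload
    raise ValueError(f"unknown selector: {selector}")
```

The drawback is visible here: the "hole" lives on forever as a selector value that is syntactically valid but semantically forbidden.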

The other option, actually removing it from the Union definition, would reset the selector values, meaning that currently stored data would become invalid.
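A hypothetical illustration of that invalidation: the same stored bytes decode to the expected variant under the old selector table, but fail under the renumbered one:

```python
# Selector tables before and after dropping the None variant (illustrative).
OLD_SELECTORS = {0: "None", 1: "BlockProofHistoricalHashesAccumulator"}
NEW_SELECTORS = {0: "BlockProofHistoricalHashesAccumulator"}

def decode_variant(selectors: dict, data: bytes) -> str:
    # Look up the Union's leading selector byte in the given table.
    selector = data[0]
    if selector not in selectors:
        raise ValueError(f"unknown selector: {selector}")
    return selectors[selector]
```

An entry written under the old schema as `b"\x01" + proof` decodes to the accumulator proof with `OLD_SELECTORS`, but raises with `NEW_SELECTORS`, since selector 1 no longer exists.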

If we decide to do that, nodes could include code that migrates the old versions to the new type. Or, the easier way, we could simply delete the databases and inject all the data again.

However, if we decide to do that, perhaps it would be better to drop the Union altogether, to avoid similar issues in the future? One way would be to use a type-agnostic ByteList for the proof. The included RLP encoded header already contains the information needed to know which type of proof to expect (based on its timestamp). Alternatively, we could still add a prefix manually. Based on that information, the proof can then be decoded and verified.
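A sketch of the timestamp-based dispatch that the ByteList approach would rely on. The fork timestamps and the post-merge proof type names below are placeholders for illustration, not normative values from the spec:

```python
# Illustrative fork boundary timestamps; real values come from the
# network upgrade schedule, not from this sketch.
MERGE_TIMESTAMP = 1663224179
CAPELLA_TIMESTAMP = 1681338455

def expected_proof_type(header_timestamp: int) -> str:
    # Pre-merge headers are proven against the historical hashes
    # accumulator; the later proof type names are assumed for illustration.
    if header_timestamp < MERGE_TIMESTAMP:
        return "BlockProofHistoricalHashesAccumulator"
    if header_timestamp < CAPELLA_TIMESTAMP:
        return "BlockProofHistoricalRoots"
    return "BlockProofHistoricalSummaries"
```

With this in place, the opaque proof bytes can be parsed as whichever type the header's timestamp demands, with no selector byte on the wire at all.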

pipermerriam commented 5 days ago

I'm not convinced that Union types are the problem here. I believe that even with a different encoding, such as an opaque ByteList, this problem would still be present.

I think the root of the problem is clients keeping a 1:1 mapping between their database serialization and the protocol serialization. A parallel can be drawn with web applications, where there is a database representation of objects and an application representation of objects. Most web applications I've worked with have a migration engine, because these two representations evolve over time and you need a way to bring the things in the database up to date with how they are represented in the application.

The same applies to our protocol. Any changes to the protocol must be propagated down to how the client represents them in their database. A client design that treats these as the same will have some pain points when the protocol representation changes. A client that has a formalized concept for the old schema and is able to formalize migrating things between these different representations will be well equipped to deal with these protocol changes.
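A hedged sketch of that migration idea: if the client records a schema version alongside each stored entry, old entries can be upgraded in place instead of invalidating the whole database. The versions and layouts here are hypothetical, assuming v1 stored the full union encoding and v2 stores raw proof bytes:

```python
def migrate_entry(version: int, data: bytes) -> tuple[int, bytes]:
    # v1 (hypothetical) stored the union encoding: selector byte + proof.
    # v2 (hypothetical) stores the raw proof bytes only.
    if version == 1:
        if data[:1] == b"\x00":
            # A None-proof entry carries no proof worth keeping;
            # the client would re-fetch this content instead.
            raise ValueError("cannot migrate a None proof")
        return 2, data[1:]
    # Already at the latest version: return unchanged.
    return version, data
```

Run once at startup over the stored entries, this decouples the database schema from the wire format, which is exactly the separation the comment argues for.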