aleph-im / pyaleph

Next generation network of decentralized big data applications. Current connected chains: Ethereum, Solana, Polkadot/Substrate, Cosmos-SDK, NULS.
https://aleph.im
MIT License

[ETH] Reduce payload size #222

Open odesenfans opened 2 years ago

odesenfans commented 2 years ago

The current format for storing data on-chain can be improved to reduce the payload size and, with it, transaction costs. We currently store the data as a JSON dictionary:

{"protocol": "aleph-offchain", "version": 1, "content": ipfs_id}

or

{"protocol": "aleph", "version": 1, "content": {"messages": messages}}

when we store only short messages directly on-chain (unlikely to happen at this stage).

Since the JSON payload is already quite simple, the only format change that makes sense is switching to a binary format. For example: byte 1 = protocol, byte 2 = version, remaining variable-length bytes = IPFS hash.
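A minimal Python sketch of that two-byte header layout, assuming hypothetical one-byte codes for the two protocol names (the issue does not assign any). Because the hash is the last field, no length prefix is needed: its length is whatever follows the header.

```python
import struct
from typing import Tuple

# Hypothetical protocol codes; the issue does not define actual values.
PROTOCOL_ALEPH = 0x01           # inline messages ("aleph")
PROTOCOL_ALEPH_OFFCHAIN = 0x02  # IPFS reference ("aleph-offchain")

HEADER = struct.Struct("BB")  # byte 1 = protocol, byte 2 = version


def pack_payload(protocol: int, version: int, content_hash: bytes) -> bytes:
    """Prefix the variable-length hash with the two header bytes."""
    return HEADER.pack(protocol, version) + content_hash


def unpack_payload(payload: bytes) -> Tuple[int, int, bytes]:
    """Split a payload back into (protocol, version, hash)."""
    if len(payload) < HEADER.size:
        raise ValueError("payload shorter than the 2-byte header")
    protocol, version = HEADER.unpack_from(payload)
    # Everything after the header is the hash, since it is the last field.
    return protocol, version, payload[HEADER.size:]
```

With a 46-character IPFS hash stored as ASCII, this layout takes 48 bytes instead of the 105 bytes of the JSON version.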

Another option to reduce fees is to pack more messages into each payload. We currently allow up to 10k messages per TX, but most TXs actually carry far fewer, so we send more TXs than strictly necessary. Filling payloads further would, however, increase the latency before messages are confirmed.
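To illustrate that trade-off, here is a hypothetical batcher (not pyaleph code) that sends a TX either when the 10k-message limit is reached or when the oldest pending message has waited longer than an assumed latency budget. It only checks the timeout when a new message arrives; a real service would also need a timer.

```python
from dataclasses import dataclass, field
from typing import List, Optional

MAX_MESSAGES_PER_TX = 10_000  # current limit quoted in this issue


@dataclass
class MessageBatcher:
    max_messages: int = MAX_MESSAGES_PER_TX
    max_wait_s: float = 60.0  # hypothetical latency budget, in seconds
    pending: List[dict] = field(default_factory=list)
    oldest_ts: Optional[float] = None

    def add(self, message: dict, now: float) -> Optional[List[dict]]:
        """Queue a message; return a batch when it is time to send a TX."""
        if not self.pending:
            self.oldest_ts = now
        self.pending.append(message)
        full = len(self.pending) >= self.max_messages
        expired = now - self.oldest_ts >= self.max_wait_s
        return self.flush() if full or expired else None

    def flush(self) -> List[dict]:
        """Hand over all pending messages and reset the batch."""
        batch, self.pending, self.oldest_ts = self.pending, [], None
        return batch
```

Raising max_wait_s yields fuller TXs (lower fees per message) at the cost of slower confirmation.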

moshemalawach commented 2 years ago

A binary format makes quite a lot of sense...

Especially since the IPFS hash (CIDv0, the "Qm..." format) is a base58-encoded multihash, which makes it a 46-character string. If we keep only the raw sha2-256 digest, it's 32 bytes.
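A self-contained Python sketch of that size arithmetic. A CIDv0 is the base58btc encoding of a 34-byte multihash: 0x12 (sha2-256), 0x20 (digest length 32), then the digest. The base58 helpers are written out only for illustration; a node would more likely use an existing library. The digest in the usage lines is just sha256(b"hello"), not the CID IPFS would assign to that content (IPFS hashes the wrapped DAG node, not the raw bytes).

```python
import hashlib

# Bitcoin-style base58 alphabet (no 0, O, I or l), as used by CIDv0.
B58_ALPHABET = "123456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

SHA2_256 = 0x12    # multihash code for sha2-256
DIGEST_LEN = 0x20  # 32-byte digest


def b58encode(data: bytes) -> str:
    n = int.from_bytes(data, "big")
    chars = []
    while n:
        n, rem = divmod(n, 58)
        chars.append(B58_ALPHABET[rem])
    # Each leading zero byte is encoded as a leading "1".
    leading_zeros = len(data) - len(data.lstrip(b"\x00"))
    return "1" * leading_zeros + "".join(reversed(chars))


def b58decode(text: str) -> bytes:
    n = 0
    for ch in text:
        n = n * 58 + B58_ALPHABET.index(ch)
    leading_ones = len(text) - len(text.lstrip("1"))
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return b"\x00" * leading_ones + body


def cidv0_from_digest(digest: bytes) -> str:
    """Wrap a raw 32-byte sha2-256 digest into a CIDv0 string."""
    if len(digest) != DIGEST_LEN:
        raise ValueError("expected a 32-byte digest")
    return b58encode(bytes([SHA2_256, DIGEST_LEN]) + digest)


def digest_from_cidv0(cid: str) -> bytes:
    """Strip the 2-byte multihash prefix, keeping only the raw digest."""
    raw = b58decode(cid)
    if len(raw) != 34 or raw[:2] != bytes([SHA2_256, DIGEST_LEN]):
        raise ValueError("not a sha2-256 CIDv0")
    return raw[2:]


digest = hashlib.sha256(b"hello").digest()
cid = cidv0_from_digest(digest)
print(len(cid), cid[:2])             # 46 Qm
print(len(digest_from_cidv0(cid)))   # 32
```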

We would then need a third byte for the method.

For example, the method could be:
- 0x01 => our own storage engine, with a sha256 hash
- 0x02 => IPFS, with a v1 hash
- etc.

Less readable, but the payload goes from 105 bytes currently to 35.
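The 105 and 35 byte figures can be checked with a short Python sketch. The 46-character CID is a placeholder, and the one-byte protocol, version and method codes are hypothetical values chosen for illustration.

```python
import hashlib
import json
import struct

# Placeholder 46-character CIDv0; only its length matters here.
cid = "Qm" + "x" * 44

# Current format, serialized with json.dumps' default separators.
json_payload = json.dumps({"protocol": "aleph-offchain", "version": 1, "content": cid})

# Hypothetical codes: byte 1 = protocol, byte 2 = version, byte 3 = method.
PROTOCOL, VERSION, METHOD_IPFS = 0x02, 0x01, 0x02
digest = hashlib.sha256(b"placeholder").digest()  # stands in for the raw sha2-256 digest
binary_payload = struct.pack("BBB", PROTOCOL, VERSION, METHOD_IPFS) + digest

print(len(json_payload.encode("utf-8")))  # 105
print(len(binary_payload))                # 35
```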

odesenfans commented 2 years ago

Okay for the method field, that makes sense. I'll write a protobuf spec for this and add it to the CCNs.

odesenfans commented 2 years ago

Note that we do not strictly need a method field, since it's possible to infer the method from the hash. One less byte $$$

odesenfans commented 2 years ago

The ABI of the smart contract specifies that the message field must be a string, so we will serialize the payload to a string instead of raw bytes.

moshemalawach commented 2 years ago

Nothing says a string can't contain non-ascii chars :D
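Since the contract takes a string, the binary payload has to be mapped to characters somehow, and that choice affects the final size. A Python sketch comparing three possible mappings of a 35-byte payload (the header values are hypothetical). The Solidity ABI encodes a string as its UTF-8 bytes, so the latin-1 mapping (one code point per byte) keeps 35 characters, but every byte of 0x80 or above becomes two UTF-8 bytes.

```python
import base64
import hashlib

# Hypothetical 3 header bytes + a 32-byte digest = 35-byte payload.
payload = bytes([0x02, 0x01, 0x02]) + hashlib.sha256(b"example").digest()


def as_hex(data: bytes) -> str:
    return data.hex()  # 2 ASCII chars per byte


def as_base64(data: bytes) -> str:
    return base64.b64encode(data).decode("ascii")  # 4 ASCII chars per 3 bytes


def as_latin1(data: bytes) -> str:
    return data.decode("latin-1")  # 1 code point per byte, U+0000..U+00FF


def utf8_size(text: str) -> int:
    """Number of bytes the string occupies once UTF-8 encoded for the ABI."""
    return len(text.encode("utf-8"))


for name, encode in (("hex", as_hex), ("base64", as_base64), ("latin-1", as_latin1)):
    text = encode(payload)
    print(f"{name:<8} {len(text):>2} chars, {utf8_size(text):>2} UTF-8 bytes")

# latin-1 is reversible: the original bytes come back unchanged.
assert as_latin1(payload).encode("latin-1") == payload
```

Hex always costs 70 bytes and base64 48, while latin-1 lands somewhere between 35 and 70 depending on how many payload bytes have the high bit set.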

Mmh, if we store only the sha256 bytes from the multihash, we can't infer the method, since our own engine uses sha256 too.

odesenfans commented 2 years ago

We don't store the method in the current format either; we just assume IPFS. Is there really a use case for supporting other storage methods?

Regarding the changes, I will split the work into two parts. For the next release (should be v0.2.2), we will first define the serialized string format for on-chain data and add support for reading it; the on-chain writing part will follow in the release after that. In the current version, nodes simply drop pending TXs that fail to decode as JSON (no retry). If we ship the read and write changes in the same release, nodes that stay on the previous version will drop TXs written in the new format and miss their messages.
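A minimal sketch of the read side for the first release: a decoder that still accepts the legacy JSON payload and also understands a compact one, so that nodes upgraded early keep processing TXs once writers switch. The compact layout (3 header bytes + 32-byte digest, carried in the string through a latin-1 mapping) and the function names are assumptions for illustration, not the format that was eventually specified.

```python
import json
import struct
from typing import Any, Dict

COMPACT_HEADER = struct.Struct("BBB")  # hypothetical: protocol, version, method
DIGEST_LEN = 32


def decode_compact(raw: bytes) -> Dict[str, Any]:
    """Parse the hypothetical binary layout into a dict."""
    if len(raw) != COMPACT_HEADER.size + DIGEST_LEN:
        raise ValueError(f"unexpected compact payload size: {len(raw)}")
    protocol, version, method = COMPACT_HEADER.unpack_from(raw)
    return {
        "protocol": protocol,
        "version": version,
        "method": method,
        "content": raw[COMPACT_HEADER.size:].hex(),
    }


def decode_onchain_payload(payload: str) -> Dict[str, Any]:
    """Try the legacy JSON format first, then fall back to the compact one."""
    try:
        return json.loads(payload)
    except json.JSONDecodeError:
        pass
    # Relies on compact payloads never parsing as JSON (e.g. a first byte of 0x02).
    # Assumes writers map each byte to one character (latin-1); a non-latin-1
    # character raises UnicodeEncodeError, which, like a malformed compact
    # payload, is a ValueError the caller can catch to retry or drop the TX.
    return decode_compact(payload.encode("latin-1"))
```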

Another option is to force nodes to reprocess all TXs after the upgrade.