Open Stebalien opened 4 years ago
This might not be the perfect place to bring it up, but it's closely related. While working on the Rust implementation of Multihash, it came up that the identity hash currently doesn't specify any size limit. I think it would make sense to specify an upper bound for its size, both from an optimization perspective (which is why it came up in Rust) and from a security perspective.
I personally would pick a fairly low limit, similar to the digest lengths of current hash functions. So perhaps something around 64 bytes?
(We should probably take this into a separate issue.) @vmx there are definitely deployments out there today (e.g. peergos) using ~2k inlined CIDs. Generally, any data that you know won't ever be repeated is a good candidate for inlining. An upper limit already exists: the limit of a network block itself (1MiB soft, 2MiB-1 hard). 64 bytes is most definitely arbitrary and I'd be very sad if we adopted that.
I don't want to derail this issue, hence I opened https://github.com/multiformats/multihash/issues/130 (I should have done so from the start, sorry).
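For reference, whatever bound ends up being chosen, enforcing it when decoding is cheap. The following is a minimal stdlib-only Go sketch, not code from any multihash library: `checkIdentityLimit`, `identityMultihash`, and `maxIdentityDigest` are hypothetical names, and the value 64 is only the figure floated above, not something a spec mandates. It relies on the multihash layout `<varint code><varint length><digest>` and on the identity function having code `0x00`.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

const identityCode = 0x00 // multihash code of the identity function

// maxIdentityDigest is a hypothetical policy value, not taken from any spec.
const maxIdentityDigest = 64

var errTooLarge = errors.New("identity digest exceeds configured limit")

// checkIdentityLimit parses the <code><length><digest> layout of a multihash
// and rejects identity multihashes whose digest is longer than limit bytes.
func checkIdentityLimit(mh []byte, limit uint64) error {
	code, n := binary.Uvarint(mh)
	if n <= 0 {
		return errors.New("bad multihash code varint")
	}
	length, m := binary.Uvarint(mh[n:])
	if m <= 0 {
		return errors.New("bad multihash length varint")
	}
	if uint64(len(mh)-n-m) != length {
		return errors.New("digest length mismatch")
	}
	if code == identityCode && length > limit {
		return errTooLarge
	}
	return nil
}

// appendUvarint appends v to dst as an unsigned varint.
func appendUvarint(dst []byte, v uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], v)
	return append(dst, buf[:n]...)
}

// identityMultihash wraps data in an identity multihash (test helper).
func identityMultihash(data []byte) []byte {
	out := appendUvarint(nil, identityCode)
	out = appendUvarint(out, uint64(len(data)))
	return append(out, data...)
}

func main() {
	small := identityMultihash(make([]byte, 32))
	large := identityMultihash(make([]byte, 2048))
	fmt.Println(checkIdentityLimit(small, maxIdentityDigest))
	fmt.Println(checkIdentityLimit(large, maxIdentityDigest))
}
```

With a limit this low, the ~2k inlined CIDs mentioned above would be rejected, which is exactly the compatibility concern raised in the reply.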
Closed by #2568, I think?
No. That paved the way to support this feature, but we still don't actually inline small blocks into CIDs.
@Kubuxu took a block size:
Given this, inlining small blocks into CIDs using the identity hash function would save at least 12% of disk space (probably more because these CIDs would often be smaller).
It would also save us from having to write/read all these small objects. Unfortunately, we don't have an access histogram.
Here's an auto-inlining CID builder: https://github.com/ipfs/go-cidutil/blob/master/inline.go
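To illustrate the idea behind an auto-inlining builder, here is a stdlib-only Go sketch. It is not the go-cidutil API: `inlineBuilder`, its `limit` field, and `sum` are hypothetical names, and the hashed path assumes sha2-256. It emits raw CIDv1 bytes (`<version><codec><multihash>`), using the identity multihash (code `0x00`) when the data fits under the limit and sha2-256 (code `0x12`) otherwise.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

const (
	cidV1      = 0x01 // CID version 1
	codecCBOR  = 0x71 // dag-cbor multicodec
	mhIdentity = 0x00 // identity multihash code
	mhSha256   = 0x12 // sha2-256 multihash code
)

// appendUvarint appends v to dst as an unsigned varint.
func appendUvarint(dst []byte, v uint64) []byte {
	var buf [binary.MaxVarintLen64]byte
	n := binary.PutUvarint(buf[:], v)
	return append(dst, buf[:n]...)
}

// inlineBuilder is a hypothetical builder: data of at most limit bytes is
// embedded in the CID itself, anything larger is hashed with sha2-256.
type inlineBuilder struct {
	codec uint64
	limit int
}

// sum returns the binary CIDv1 for data.
func (b inlineBuilder) sum(data []byte) []byte {
	var code uint64
	var digest []byte
	if len(data) <= b.limit {
		code, digest = mhIdentity, data // inline: the "digest" is the data
	} else {
		h := sha256.Sum256(data)
		code, digest = mhSha256, h[:]
	}
	out := appendUvarint(nil, cidV1)
	out = appendUvarint(out, b.codec)
	out = appendUvarint(out, code)
	out = appendUvarint(out, uint64(len(digest)))
	return append(out, digest...)
}

func main() {
	b := inlineBuilder{codec: codecCBOR, limit: 32}
	fmt.Printf("%x\n", b.sum([]byte("tiny")))    // inlined
	fmt.Printf("%x\n", b.sum(make([]byte, 100))) // hashed
}
```

A block at or under the limit never needs to be stored separately, since its content can be recovered from the CID alone.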
The tricky part is how to wire this in. Ideally, we'd expose the CID builder on the runtime and use it internally inside the CBOR store. Unfortunately, we have some objects that expose a Cid() function to create their own CID. The best reasonable solution is to have cbor.NewCborStore take a CIDBuilder in the constructor.
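As a rough sketch of that wiring, the Go snippet below injects a builder through the store's constructor. Every name in it (`cidBuilder`, `inliningBuilder`, `store`, `newStore`, `Put`) is hypothetical and does not reflect the real cbor.NewCborStore signature; the string keys are stand-ins for CIDs, and skipping the write for inlined keys is an assumption about how the saved I/O would be realized.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// cidBuilder is a hypothetical stand-in for a CID builder interface.
// It reports the key for data and whether the data was inlined into it.
type cidBuilder interface {
	Sum(data []byte) (key string, inline bool)
}

// inliningBuilder inlines data of at most limit bytes and hashes the rest.
type inliningBuilder struct{ limit int }

func (b inliningBuilder) Sum(data []byte) (string, bool) {
	if len(data) <= b.limit {
		return "id:" + hex.EncodeToString(data), true
	}
	h := sha256.Sum256(data)
	return "sha256:" + hex.EncodeToString(h[:]), false
}

// store is a toy CBOR-store stand-in backed by an in-memory map.
type store struct {
	builder cidBuilder
	blocks  map[string][]byte
}

// newStore takes the builder in the constructor, so callers decide the
// inlining policy once instead of each object computing its own key.
func newStore(b cidBuilder) *store {
	return &store{builder: b, blocks: make(map[string][]byte)}
}

// Put keys the already-encoded object with the injected builder and only
// writes it to the backing map when the data is not carried by the key.
func (s *store) Put(encoded []byte) string {
	key, inline := s.builder.Sum(encoded)
	if !inline {
		s.blocks[key] = encoded
	}
	return key
}

func main() {
	s := newStore(inliningBuilder{limit: 8})
	fmt.Println(s.Put([]byte("hi")), len(s.blocks))
	fmt.Println(s.Put(make([]byte, 100)), len(s.blocks))
}
```

Objects that compute their own key through a Cid() method would bypass the injected builder, which is the difficulty noted above.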