fairDataSociety / bmt-js

Binary Merkle Tree operations on data
How can I get the root hash for the bzz protocol? #13

Open mattiaz9 opened 2 years ago

mattiaz9 commented 2 years ago

I can retrieve the root hash for the /bytes endpoint like this:

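// Assumes Node.js and the package installed as @fairdatasociety/bmt-js
import fs from "fs"
import * as BmtJS from "@fairdatasociety/bmt-js"

const data = new Uint8Array(fs.readFileSync("/some/file.txt"))
const chunkedFile = BmtJS.makeChunkedFile(data)
const rootChunk = chunkedFile.rootChunk()
const hash = Buffer.from(rootChunk.address()).toString("hex")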

But if I want to upload that file to /bzz, how can I get the correct hash?

agazso commented 2 years ago

For that you would also need to construct a manifest (with the mantaray-js library) and add the filesystem/folder metadata to it. Then serialize the manifest and take the BMT hash of the manifest's root chunk; that is the reference you can use for the /bzz endpoint.

I hacked together an example that can upload a single file or folder and returns a reference that can be used as the root hash for the bzz protocol here: https://github.com/agazso/swarm-random-chunk-upload/blob/master/upload.ts

mattiaz9 commented 2 years ago

@agazso I don't seem to understand that library well. I want to retrieve the hash before the upload, but when I do:

const node = new MantarayNode()
node.addFork(new TextEncoder().encode("/"), hash, {
  "Content-Type": "video/mp4",
  Filename: Buffer.from(hash).toString("hex"),
})
const reference = Buffer.from(node.serialize()).toString("hex")

I get this error: cannot serialize MantarayFork because it does not have contentAddress

agazso commented 2 years ago

I haven't used node.serialize() before, so I'm not sure how it works. I used node.save(storageSaver) instead, which expects a StorageSaver as its argument. For that I passed a function that takes a byte array as input and splits it into chunks, which can then be serialized:

const storageSaver = async (data: Uint8Array) => splitAndEnqueueChunks(data, queue, context)

https://github.com/agazso/swarm-random-chunk-upload/blob/319f91e1bbc725109f4d6b194cb275b17b968bec/upload.ts#L116

The reason for doing this is that with multiple files or with folders the manifest can grow larger than a single chunk, in which case the manifest data itself has to be split into multiple chunks using the same logic you would use for a file.
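To make the size argument concrete, here is a rough sketch of the arithmetic. It assumes Swarm's usual 4096-byte chunk payload; the constant and both helper names are illustrative and are not part of bmt-js or mantaray-js.

```typescript
// Assumption: a Swarm chunk carries at most 4096 bytes of payload.
const CHUNK_PAYLOAD_SIZE = 4096

// Hypothetical helper: number of leaf (data) chunks needed for a payload.
// An empty payload is still counted as one chunk.
function leafChunkCount(byteLength: number): number {
  return Math.max(1, Math.ceil(byteLength / CHUNK_PAYLOAD_SIZE))
}

// Hypothetical helper: true when the serialized data no longer fits in one
// chunk, so it has to go through the same splitting logic as a file.
function needsSplitting(byteLength: number): boolean {
  return leafChunkCount(byteLength) > 1
}
```

A small single-file manifest will normally stay under this limit, while a manifest for a large folder may not, which is why going through the file chunking logic is the safer default.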

If you just need the address without uploading to Bee, you can replace the splitAndEnqueueChunks function with makeChunkedFile(data).address() as the StorageSaver.
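A minimal sketch of such an address-only saver might look like the following. The local types are simplified stand-ins for the real ones in bmt-js and mantaray-js, which may differ in detail, and makeAddressOnlySaver and toHex are hypothetical helper names. The chunking function is passed in as a parameter so the sketch does not depend on either library.

```typescript
// Simplified stand-in types (assumptions, not the libraries' own definitions).
type Reference = Uint8Array
type StorageSaver = (data: Uint8Array) => Promise<Reference>
type MakeChunkedFile = (data: Uint8Array) => { address(): Uint8Array }

// Hypothetical helper: builds a saver that only computes the address of the
// data it is given and never sends anything to a Bee node.
function makeAddressOnlySaver(makeChunkedFile: MakeChunkedFile): StorageSaver {
  return (data: Uint8Array) => Promise.resolve(makeChunkedFile(data).address())
}

// Hypothetical helper: hex-encodes a reference for display or comparison.
function toHex(bytes: Uint8Array): string {
  let hex = ""
  for (let i = 0; i < bytes.length; i++) {
    hex += ("0" + bytes[i].toString(16)).slice(-2)
  }
  return hex
}
```

With the real libraries this would presumably be wired up as node.save(makeAddressOnlySaver(makeChunkedFile)), with the resolved reference hex-encoded via toHex, assuming node.save accepts such a function as described above.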

I don't know if this is the simplest way to solve your original question, but I know that it works: I tried it and got the same hash as with Bee. However, I found another gotcha in the mantaray library: padding is missing after the metadata, which can change the serialized bytes and therefore the content hash. After fixing that bug, it produced the same result as Bee.

mattiaz9 commented 2 years ago

> However I found another gotcha in the mantaray library: there is a missing padding after the metadata and that can cause differences and therefore different content hash.

Thank you @agazso, I'll give it a try once they fix it.