BlockchainCommons / Research

Blockchain Commons Research papers

MUR Implementation Guide mentions fragment chooser uses SHA-256 hash as seed but implementations don't #139

Closed: thunderbiscuit closed this issue 3 months ago.

thunderbiscuit commented 5 months ago

The Fragment Chooser section of the Multipart UR Implementation Guide states the following:

When seqNum is greater than seqLen, the big-endian serialized seqNum and checksum are concatenated and then passed through SHA-256 to generate a 256-bit seed for the PRNG.

(seqNum || checksum) -> SHA256 -> Xoshiro256
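The pipeline above can be sketched in Swift. This is a minimal, non-authoritative sketch: the helpers `bigEndianBytes` and `fragmentSeedState` are hypothetical names, and the code assumes Apple's CryptoKit (or swift-crypto's `Crypto` module, which offers the same `SHA256` API, on other platforms). It concatenates the two big-endian `UInt32` values into 8 bytes, hashes them with SHA-256, and reads the 32-byte digest as four big-endian `UInt64` words to use as the Xoshiro256 state.

```swift
import Foundation
#if canImport(CryptoKit)
import CryptoKit
#else
import Crypto // assumption: swift-crypto provides the same SHA256 API off Apple platforms
#endif

/// Hypothetical helper: big-endian (most significant byte first) bytes of a UInt32.
func bigEndianBytes(_ value: UInt32) -> [UInt8] {
    stride(from: 24, through: 0, by: -8).map { UInt8(truncatingIfNeeded: value >> $0) }
}

/// Hypothetical helper: PRNG state for a mixed part, following
/// (seqNum || checksum) -> SHA256 -> Xoshiro256.
func fragmentSeedState(seqNum: UInt32, checksum: UInt32) -> [UInt64] {
    // 1. Concatenate big-endian seqNum and checksum: 8 bytes (64 bits).
    let seed = bigEndianBytes(seqNum) + bigEndianBytes(checksum)

    // 2. Hash with SHA-256: 32 bytes (256 bits).
    let digest = Array(SHA256.hash(data: Data(seed)))

    // 3. Read the digest as four big-endian UInt64 words.
    return (0..<4).map { (i: Int) -> UInt64 in
        digest[(i * 8)..<(i * 8 + 8)].reduce(UInt64(0)) { ($0 << 8) | UInt64($1) }
    }
}

let state = fragmentSeedState(seqNum: 10, checksum: 0x1234_5678)
print(state.map { String($0, radix: 16) })
```

The hash step is what turns an 8-byte input into a full 256-bit state; without it, most of the four state words would have to be filled some other way.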

I have not found this to be the case in the URKit, Hummingbird, ur-rs, or foundation-ur-py libraries.

Instead, a 64-bit seed made by concatenating seqNum and checksum is passed to the Xoshiro256 RNG without hashing. Note that the Xoshiro256 initializer does hash the seed with SHA-256 internally, so that might be where this came from, or maybe that is where I'm getting confused? In any case, the seed provided to the Xoshiro256 constructor is not a 256-bit seed.
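To make the sizes concrete: in the URKit code quoted in this thread, `seqNum` is a `UInt32`, and the checksum is assumed here to be a 32-bit value as well, so their big-endian concatenation is 8 bytes (64 bits). The sketch below builds that pre-hash byte string with hypothetical helpers `bigEndianBytes` and `mixedPartSeedBytes`; it does no hashing.

```swift
/// Hypothetical helper: big-endian (most significant byte first) bytes of a UInt32.
func bigEndianBytes(_ value: UInt32) -> [UInt8] {
    stride(from: 24, through: 0, by: -8).map { UInt8(truncatingIfNeeded: value >> $0) }
}

/// Hypothetical helper: the pre-hash seed, seqNum || checksum.
func mixedPartSeedBytes(seqNum: UInt32, checksum: UInt32) -> [UInt8] {
    bigEndianBytes(seqNum) + bigEndianBytes(checksum)
}

let seed = mixedPartSeedBytes(seqNum: 11, checksum: 0xDEAD_BEEF)
print(seed.count * 8, seed) // 64 [0, 0, 0, 11, 222, 173, 190, 239]
```

Those 8 bytes are the "64-bit seed" in question; whether they become the PRNG state directly or via SHA-256 depends on what the `Xoshiro256(seed:)` initializer does with them.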

wolfmcnally commented 3 months ago

Let's look at URKit:

    func chooseFragments(at seqNum: UInt32) -> FragmentIndexes {
        // The first `seqLen` parts are the "simple" fragments, not mixed with any
        // others. This means that if you only generate the first `seqLen` parts,
        // then you have all the fragments you need to decode the message.
        if seqNum <= seqLen {
            return Set([Int(seqNum) - 1])
        } else {
            let seed = Data([seqNum.serialized, checksum.serialized].joined())
            let rng = Xoshiro256(seed: seed)
            let degree = degreeChooser.chooseDegree(using: rng)
            return Set(shuffled(indexes, rng: rng, count: degree))
        }
    }
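The branch at the top of `chooseFragments` decides whether a part needs the seeded PRNG at all. The sketch below isolates it in a hypothetical helper, `simpleFragmentIndex`. It assumes, as the code comment implies, that `seqNum` is 1-based, so parts `1...seqLen` map directly to fragment indexes `0..<seqLen`, and only later parts build the `(seqNum || checksum)` seed.

```swift
/// Hypothetical helper mirroring the first branch of `chooseFragments`.
/// Returns the single fragment index for a "simple" part, or nil for a
/// mixed part (which would instead seed Xoshiro256 from seqNum || checksum).
func simpleFragmentIndex(seqNum: UInt32, seqLen: Int) -> Int? {
    precondition(seqNum >= 1, "seqNum is 1-based")
    return Int(seqNum) <= seqLen ? Int(seqNum) - 1 : nil
}

let seqLen = 3
for seqNum in UInt32(1)...5 {
    let label = simpleFragmentIndex(seqNum: seqNum, seqLen: seqLen).map { "fragment \($0)" } ?? "mixed"
    print("part \(seqNum): \(label)")
}
```

This is why the guide's hashing rule only applies "when seqNum is greater than seqLen": the first `seqLen` parts never touch the PRNG.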

As described, we are concatenating the serialized (big-endian) seqNum with the serialized checksum. We are then passing this to a specific initializer of the Xoshiro256 type:

    convenience init(seed: Data) {
        self.init(digest: SHA256.hash(data: seed))
    }

As you can see, the first thing it does is compute the SHA-256 hash of the seed, as described, and then it calls the next-level initializer with that 256-bit result. That in turn calls the base initializer, which sets the state to four unsigned 64-bit integers read from the digest in big-endian order.

    convenience init(digest: SHA256Digest) {
        var s = [UInt64](repeating: 0, count: 4)
        digest.withUnsafeBytes { p in
            for i in 0 ..< 4 {
                let o = i * 8
                var v: UInt64 = 0
                for n in 0 ..< 8 {
                    v <<= 8
                    v |= UInt64(p[o + n])
                }
                s[i] = v
            }
        }
        self.init(state: s)
    }

    init(state: [UInt64]) {
        assert(state.count == 4)
        self.state = state
    }
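For context on how those four state words are then consumed, here is a minimal standalone sketch of the xoshiro256** `next()` step as published by Blackman and Vigna. It assumes URKit's `Xoshiro256` follows that variant; the type name `Xoshiro256StarStar` is hypothetical and this is not URKit's code.

```swift
/// Minimal sketch of the xoshiro256** generator (Blackman & Vigna).
/// Hypothetical standalone type; not URKit's implementation.
struct Xoshiro256StarStar {
    private(set) var state: [UInt64]

    init(state: [UInt64]) {
        precondition(state.count == 4, "xoshiro256** needs exactly four 64-bit words")
        self.state = state
    }

    // Rotate left by k bits (1 <= k <= 63).
    private static func rotl(_ x: UInt64, _ k: UInt64) -> UInt64 {
        (x << k) | (x >> (64 - k))
    }

    mutating func next() -> UInt64 {
        // Output ("**" scrambler): rotl(s[1] * 5, 7) * 9, with wrapping multiplication.
        let result = Self.rotl(state[1] &* 5, 7) &* 9
        let t = state[1] << 17

        // Linear state transition.
        state[2] ^= state[0]
        state[3] ^= state[1]
        state[1] ^= state[2]
        state[0] ^= state[3]
        state[2] ^= t
        state[3] = Self.rotl(state[3], 45)

        return result
    }
}

var rng = Xoshiro256StarStar(state: [1, 2, 3, 4])
print(rng.next()) // 11520
```

Because every output depends only on these four words, two implementations agree on fragment choices exactly when they derive the same 256-bit state from `(seqNum || checksum)`.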

So the code does what the MUR Implementation Guide says, even though the steps are spread across several initializers rather than written on a single line.