BlockchainCommons / Research

Blockchain Commons Research papers
Other
111 stars 38 forks source link

Fisher-Yates Shuffle test in multipart UR paper could better showcase the method #138

Closed thunderbiscuit closed 3 months ago

thunderbiscuit commented 5 months ago

The current test for the Fisher-Yates Shuffle returns a different list of shuffled indexes for each iteration of the map method because the same random number generator variable is reused for all calls on the shuffled() method. This does not invalidate the test, but makes it less obvious what the count argument does in the shuffled() method. When implementing myself, I was initially under the impression that the count somehow influenced the returned result, which was puzzling. Moreover, I was contrasting this implementation with the Hummingbird implementation of this method, which always returns the fully shuffled list. I was expecting the count to simply help me return a subset of the list, but since all the lists were different on each iterations in the Swift test, I got confused for a little bit until I realized that the issue was simply that the rng was being reused for each call.

Current test

func testShuffle() {
    let rng = Xoshiro256(string: "Wolf")
    let indexes = 1...10
    let values = Array(indexes)
    let result = indexes.map { count in
        shuffled(values, rng: rng, count: count)
    }
    let expectedResult = [
        [6],
        [5, 8],
        [4, 10, 5],
        [7, 10, 3, 8],
        [10, 8, 6, 5, 1],
        [1, 3, 9, 8, 4, 6],
        [4, 6, 8, 9, 3, 2, 1],
        [3, 9, 7, 4, 5, 1, 10, 8],
        [3, 10, 2, 6, 8, 5, 7, 9, 1],
        [1, 5, 3, 8, 2, 6, 7, 9, 4, 10]
    ]
    XCTAssertEqual(result, expectedResult)
}

Proposed new test

The text accompanying this Fisher-Yates Shuffle section mentions:

Its implementation affords efficiently generating only the number of fragments needed for a given part, rather than shuffling all fragment indexes and then taking a subset.

I propose the following small change which (a) makes explicit the value of the count argument in the method, and (b) showcases what the paragraph above describing the choice of algorithm mentions and brings it more in line with its test:

func testShuffle() {
    let indexes = 1...10
    let values = Array(indexes)
    let result: [[Int]] = indexes.map { count in
        let rng = Xoshiro256(string: "Wolf")
        return shuffled(values, rng: rng, count: count)
    }
    let expectedResult = [
        [6],
        [6, 4],
        [6, 4, 9],
        [6, 4, 9, 3],
        [6, 4, 9, 3, 10],
        [6, 4, 9, 3, 10, 5],
        [6, 4, 9, 3, 10, 5, 7],
        [6, 4, 9, 3, 10, 5, 7, 8],
        [6, 4, 9, 3, 10, 5, 7, 8, 1],
        [6, 4, 9, 3, 10, 5, 7, 8, 1, 2]
    ]
    XCTAssertEqual(result, expectedResult)
}

I'm happy to open the PR on the paper as well as the URKit tests if this is a welcome change.

wolfmcnally commented 3 months ago

Yes, I think that would be an improvement. I'd welcome your PRs.