code4it-dev / blog-comments

https://www.code4it.dev/

blog/how-to-get-random-items/ #66

Open utterances-bot opened 4 months ago

utterances-bot commented 4 months ago

Is Random.GetItems the best way to get random items in C# 12? | Code4IT

Code4IT - a blog for .NET enthusiasts, Azure lovers, and Backend developers

https://www.code4it.dev/blog/how-to-get-random-items/

jtmueller commented 4 months ago

It's very strange to do a benchmark comparison while ignoring the fastest overload of Random.GetItems, which is also the only approach that lets you get a random set of items without any allocations.

I would say the conclusion should be: always prefer the void-returning overload of Random.GetItems, unless duplicates are a problem for your use case.

bellons91 commented 4 months ago

Uhm. I'm afraid I don't know how to use that overload... Can you please give me the body of the benchmark method I should add?

I suppose you're talking about the void GetItems<T>(ReadOnlySpan<T> choices, Span<T> destination) overload, but I don't know how to convert an array to a ReadOnlySpan<T>, or how to create a Span<T> with a specific size and then convert it back to an array.

jtmueller commented 4 months ago

No problem - any array can be implicitly converted to a ReadOnlySpan<T> simply by passing it to a method that expects one. Similarly, you can pass an array wherever a Span<T> parameter is expected, though in some cases you can reduce allocations by stack-allocating the buffer instead.

For example (untested code off the top of my head):

    // Allocating a million-item array on every iteration means you're mostly
    // benchmarking array creation, and not actually measuring the shuffling
    // algorithm. Creating an appropriately-sized array once and filling it with shuffled
    // items repeatedly is a better test.
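    // (Size, Items, Copy, and TotalItemsToBeRetrieved are assumed to be
    // fields already defined in the blog post's benchmark class.)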
    private CustomRecord[] RandomItems;

    [IterationSetup]
    public void Setup()
    {
        var ids = Enumerable.Range(0, Size).ToArray();
        Items = ids.Select(i => new CustomRecord(i, $"Name {i}")).ToArray();
        Copy = [.. Items];

        TotalItemsToBeRetrieved = Random.Shared.Next(Size);
        RandomItems = new CustomRecord[TotalItemsToBeRetrieved];
    }

    [Benchmark]
    public void WithRandomGetItemsSpan()
    {
        Random.Shared.GetItems(Items, RandomItems);

        // Note: when the output is small enough to fit on the stack, and the
        // element type is unmanaged (which a class like CustomRecord is not),
        // you can avoid allocating the output array on the heap entirely:
        //     Span<int> randomItems = stackalloc int[itemCount];
        // That works for a couple hundred items but not for a million, so I
        // resorted to the class-level output array.
    }

It might also be worth making a record struct version of CustomRecord to see how the performance differs between classes and structs.
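
For instance, something like this (again untested, and assuming CustomRecord has the (int Id, string Name) shape used in the Setup method above):

    // A value-type twin of CustomRecord; readonly avoids defensive copies.
    public readonly record struct CustomRecordStruct(int Id, string Name);

Everything else should work unchanged, since GetItems is generic over the element type.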

jtmueller commented 4 months ago

Actually, I just noticed your Setup method uses [IterationSetup]. I'd prefer to use [GlobalSetup] here. I can't think of a reason to recreate the set of source items on every iteration, when [GlobalSetup] would create them once for each Size, and you'd spend less time waiting for your benchmarks to run.

I'd also be inclined to remove TotalItemsToBeRetrieved and just use Size in its place. If it randomly chooses TotalItemsToBeRetrieved = 1 on one benchmark run, and TotalItemsToBeRetrieved = 1000000 on the second benchmark run, there's no valid way to compare scores between runs.
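
Putting those two changes together, the setup might look like this (untested sketch, reusing the fields from the earlier snippet):

    [GlobalSetup]
    public void Setup()
    {
        var ids = Enumerable.Range(0, Size).ToArray();
        Items = ids.Select(i => new CustomRecord(i, $"Name {i}")).ToArray();
        Copy = [.. Items];

        // Using Size directly means every run retrieves the same number of items.
        RandomItems = new CustomRecord[Size];
    }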

jtmueller commented 4 months ago

You might also consider abandoning Random.Shared and creating a class-level instance of Random with a fixed seed (any prime number is good). That way you get the same random values on every run, making it more meaningful to compare scores between runs.
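
For example (the seed 7919 is just an arbitrary prime I picked):

    // A fixed seed makes the "random" sequence identical on every run.
    private readonly Random _random = new(7919);

    [Benchmark]
    public void WithRandomGetItemsSpan()
    {
        _random.GetItems(Items, RandomItems);
    }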

Cimmeran commented 4 months ago

Am I missing something here?

You describe it as allowing you to take a random subset of an array, yet you always seem to pass the full array and its full size as the arguments. Effectively a Shuffle with an extra parameter.

If I randomised an array 100 times and compared the results, I would expect some level of duplication. I would not expect 100 unique results.

Cimmeran commented 4 months ago

Ah, having had my coffee I get what you're saying now.

The documentation only mentions randomness, not uniqueness, so it works as intended. Perhaps it should have an overload that guarantees unique items? Or at that point it's just easier to use your suggested fallback.

Your proviso stands, and is solid advice.

But this feels like an abuse of the method. What is the likelihood of repetition if selecting 2, 5, 10, or 50 out of 100 or more elements? Or is this just for use cases that don't need to be unique?
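
For what it's worth, here's a rough sketch of that likelihood (assuming GetItems fills each destination slot with an independent uniform pick, i.e. sampling with replacement, which makes this the classic birthday problem):

    // Probability of at least one duplicate when picking k items
    // (with replacement) from a pool of n distinct choices.
    static double DuplicateProbability(int n, int k)
    {
        double pAllDistinct = 1.0;
        for (int i = 0; i < k; i++)
            pAllDistinct *= (double)(n - i) / n;
        return 1.0 - pAllDistinct;
    }

    // For n = 100:
    //   k = 2  => ~0.01 (1%)
    //   k = 5  => ~0.10 (10%)
    //   k = 10 => ~0.37 (37%)
    //   k = 50 => ~1.00 (near certainty)

So repetition becomes likely surprisingly quickly as the selected count grows relative to the pool size.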