utterances-bot opened 4 months ago
It's very strange to do a benchmark comparison while ignoring the fastest overload of Random.GetItems, which is also the only approach that lets you get a random set of items without any allocations.
I would say that the conclusion should be: always prefer the void-returning overload of Random.GetItems, unless duplicates are a problem for your use case.
Uhm. I'm afraid I don't know how to use that overload... Can you please give me the body of the benchmark method I should add?
I suppose you're talking about the void GetItems&lt;T&gt;(ReadOnlySpan&lt;T&gt; choices, Span&lt;T&gt; destination) overload, but I don't know how to convert an array to a ReadOnlySpan, how to create a Span&lt;T&gt; with a specific size, or how to convert it back to an array.
No problem - any array can be implicitly converted to a ReadOnlySpan
For example (untested code off the top of my head):
// Allocating a million-item array on every iteration means you're mostly
// benchmarking array creation, and not actually measuring the shuffling
// algorithm. Creating an appropriately-sized array once and filling it with shuffled
// items repeatedly is a better test.
private CustomRecord[] RandomItems;

[IterationSetup]
public void Setup()
{
    var ids = Enumerable.Range(0, Size).ToArray();
    Items = ids.Select(i => new CustomRecord(i, $"Name {i}")).ToArray();
    Copy = [.. Items];
    TotalItemsToBeRetrieved = Random.Shared.Next(Size);
    RandomItems = new CustomRecord[TotalItemsToBeRetrieved];
}
[Benchmark]
public void WithRandomGetItemsSpan()
{
    Random.Shared.GetItems(Items, RandomItems);

    // Note: when the output is small enough to fit on the stack, you can
    // avoid allocating the output array on the heap entirely:
    //     Span<int> randomItems = stackalloc int[itemCount];
    // but stackalloc only works for unmanaged element types (like int),
    // not for a reference type like the CustomRecord class. It also
    // wouldn't fit a million items, which is why I resorted to the
    // class-level output array.
}
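Since you asked about the conversions specifically, here's a minimal standalone sketch (my own, separate from the benchmark code) showing each step: an array converts implicitly to ReadOnlySpan&lt;T&gt;, a Span&lt;T&gt; of a specific size can simply wrap a new array, and ToArray() copies a span back into an array.

```csharp
using System;

// Sketch of the span conversions the benchmark relies on.
int[] source = { 1, 2, 3, 4, 5 };

ReadOnlySpan<int> choices = source;           // implicit array -> ReadOnlySpan<T>
Span<int> destination = new int[3];           // span over a freshly allocated array
Random.Shared.GetItems(choices, destination); // fill destination with random picks

int[] result = destination.ToArray();         // copy the span back into an array
Console.WriteLine(string.Join(", ", result));
```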
It might also be worth making a record struct version of CustomRecord to see how the performance differs between classes and structs.
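A sketch of what that could look like (CustomRecordStruct and its (int, string) shape are my guesses at the original record's fields): a record struct is copied by value, so GetItems writes whole records into the destination instead of copying object references.

```csharp
using System;

// Hypothetical struct counterpart to the article's CustomRecord class.
var items = new CustomRecordStruct[100];
for (int i = 0; i < items.Length; i++)
    items[i] = new CustomRecordStruct(i, $"Name {i}");

var picked = new CustomRecordStruct[10];
Random.Shared.GetItems(items, picked); // same call, struct elements copied by value
Console.WriteLine(picked[0].Name);

public record struct CustomRecordStruct(int Id, string Name);
```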
Actually, I just noticed your Setup method uses [IterationSetup]. I'd prefer [GlobalSetup] here. I can't think of a reason to recreate the set of source items on every iteration, when a [GlobalSetup] method would create them once per Size, and you'd spend less time waiting for your benchmarks to run.
I'd also be inclined to remove TotalItemsToBeRetrieved and just use Size in its place. If it randomly chooses TotalItemsToBeRetrieved = 1 on one benchmark run and TotalItemsToBeRetrieved = 1000000 on the next, there's no valid way to compare scores between runs.
You might also consider abandoning Random.Shared and creating a class-level instance of Random with a fixed seed (the actual value doesn't matter, as long as it stays the same between runs). That way you get the same random values on every run, making it more meaningful to compare scores between runs.
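To illustrate (my own snippet, the seed 97 is arbitrary): two Random instances constructed with the same seed produce the identical sequence, which is what makes seeded benchmark runs comparable.

```csharp
using System;

// A fixed seed makes the sequence reproducible across runs.
var first = new Random(97);
var second = new Random(97);

// Same seed, same sequence:
Console.WriteLine(first.Next(1000) == second.Next(1000)); // True
```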
Am I missing something here?
You describe it as allowing you to take a random subset of an array, yet you always seem to pass the array and its full size as the arguments - effectively a Shuffle with an extra parameter.
If I randomised an array 100 times, and compared the results I would expect some level of duplication. I would not expect 100 unique results.
Ah, having had my coffee I get what you're saying now.
The documentation only mentions randomness, not uniqueness, so it works as intended. Perhaps it should have an overload that guarantees unique items? Or at that point it's just easier to use your suggested fallback.
Your proviso stands, and is solid advice.
But this feels like an abuse of the method. What is the likelihood of repetition if selecting 2, 5, 10 or 50 of 100, or more, elements? Or is this just for use cases that should not be unique?
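A back-of-the-envelope answer to that question (my own sketch, not from the post): since GetItems samples with replacement, the chance that k picks out of n choices contain at least one repeat is 1 - (n/n)·((n-1)/n)·...·((n-k+1)/n), the classic birthday-problem bound.

```csharp
using System;

// Probability of at least one duplicate when picking k of n with replacement.
static double DuplicateProbability(int n, int k)
{
    double allDistinct = 1.0;
    for (int i = 0; i < k; i++)
        allDistinct *= (double)(n - i) / n;
    return 1.0 - allDistinct;
}

foreach (var k in new[] { 2, 5, 10, 50 })
    Console.WriteLine($"{k} of 100: {DuplicateProbability(100, k):P1}");
```

For 2 of 100 the chance of a repeat is only 1%, but it's already roughly 10% for 5 picks, about 37% for 10, and effectively certain for 50.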
Is Random.GetItems the best way to get random items in C# 12? | Code4IT
Code4IT - a blog for .NET enthusiasts, Azure lovers, and Backend developers
https://www.code4it.dev/blog/how-to-get-random-items/