dotnet / runtime

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
https://docs.microsoft.com/dotnet/core/
MIT License
14.91k stars 4.63k forks source link

Add a static Shuffle method to Array and an instance method to List<T> #13993

Closed Eirenarch closed 4 years ago

Eirenarch commented 9 years ago

Consider adding a static Shuffle method to Array and an instance method to List

Motivation

Shuffling data is often required in various applications. .NET does not contain a method to shuffle an array which results in many people implementing it themselves. Implementing a Shuffle method is not really hard but it does require like 10 lines of code and people often wonder where to put these so they just prefer quick and dirty solutions like abusing sort methods with random sort key or even worse random comparison. These methods are far from ideal both because they depend on implementation details of the sorting methods and because they are less effective than the ideal O(N) implementation.

Proposed API

Methods on Array:

public static void Shuffle<T>(T[] array)
public static void Shuffle<T>(T[] array, int index, int length)
public static void Shuffle<T>(T[] array, Random rng)
public static void Shuffle<T>(T[] array, int index, int length, Random rng)

Methods on List

public void Shuffle()
public void Shuffle(Random rng)

Details

The Shuffle method on Array has an overload that can be used to shuffle just part of the array. This is in line with existing Array methods like Sort and Copy and can be useful when implementing Shuffle on List where only the elements of the List will be shuffled without shuffling the full capacity of the list. I also suggest adding an instance Shuffle method on List since this is the most common usage of such a method.

sharwell commented 9 years ago

I'm not a particular fan of this for two reasons:

  1. "Shuffle" is not a well-defined operation.
  2. The proposed methods do not include overloads which allow the user to specify their own "Shuffle" behavior. This differs from other algorithm-type operations which leverage IComparer<T> and IEqualityComparer<T> to provide additional functionality to the user.

I see this as an application-specific operation which is far less commonly used than the other operations on Array and List<T>.

terrajobst commented 9 years ago

@Eirenarch, can you explain what shuffle is supposed to do? I assume you want to randomly sort the contents of the array, i.e. shuffling it up?

Eirenarch commented 9 years ago

In my opinion Shuffle is quite well defined as "uniformly randomize the array". There is no comparison in this case but it may make sense to add an overload accepting a Random object.

Eirenarch commented 9 years ago

More precisely "randomly permute the array"

AdamSpeight2008 commented 9 years ago

Easily implementable via Extension method. Eg Exts.Random

Eirenarch commented 9 years ago

@AdamSpeight2008 it is certainly true but as I said people prefer to simply use OrderBy or Sort than implement it or take a dependency on a package.

AdamSpeight2008 commented 9 years ago

@Eirenarch Eg. shuffle = theArray.OrderBy(Function(x) Guid.NewGuid() ).ToArray or

private System.Random rnd = new System.Random();

public static T[] Shuffle<T>( this IEnumerable<T> xs ) 
{
  return xs.OrderBy( x => rnd.NextDouble(); );
}

If it that easy to implement and also independent of type. Why should Array provide one?

pdelvo commented 9 years ago

@AdamSpeight2008 This does not work. Have a look at this: http://bost.ocks.org/mike/algorithms/#shuffling

AdamSpeight2008 commented 9 years ago

@pdelvo I know that it is biased. That wasn't my point.

Eirenarch commented 9 years ago

@AdamSpeight2008 no one wraps it in extension methods. People just do it when they need to shuffle. No one goes looking for a good place to put this extension method.

jakesays-old commented 9 years ago

I think 'no one' is a little too broad. This is definitely something I would add to my application extension class if I needed to shuffle an array. Not something that should be added to the framework.

Eirenarch commented 9 years ago

I stand corrected. A lot of people do, but as evident by upvotes on various related questions on SO like this one http://stackoverflow.com/questions/108819/best-way-to-randomize-a-string-array-with-net and this one http://stackoverflow.com/questions/273313/randomize-a-listt-in-c-sharp a lot of people are OK with OrderBy based solution. I have used it too and I knew fully well that it is far from ideal. I don't think the fact that it is easy to add such a method precludes it from being added to the standard library. After all String.IsNullOrEmpty is even easier but it is still included.

I know it is not an argument in itself but I'd also like to note many languages choose to include a shuffle function in their standard libraries - C++, Java, Python. Honestly I cannot see a downside.

AlexArchive commented 9 years ago

Granted, this isn't absolutely pressing but I do think it would be a nice addition to the Api.

Personally I have written code like this on more than one occasion (reference):

Random rnd = new Random();
string[] MyRandomArray = MyArray.OrderBy(x => rnd.Next()).ToArray();

I would much prefer to leverage an Api like the one described here.

AArnott commented 9 years ago

I'm not sure this will meet the bar of broad applicability sufficient for inclusion into the core framework. I have written many apps and even more libraries, and I have never had the need for such a method as this. Specific types of apps such as media I could see it be quite popular. But I don't think that's sufficient to put it into the core library.

On Sun, Jan 18, 2015, 2:39 AM ByteBlast notifications@github.com wrote:

This isn't absolutely pressing but I do think it would be a nice addition to the Api.

— Reply to this email directly or view it on GitHub https://github.com/dotnet/corefx/issues/461#issuecomment-70403314.

AlexArchive commented 9 years ago

@AArnott Sincerely, what is the downside?

davkean commented 9 years ago

Even with media applications, this wouldn't be applicable. If you look at what modern media players do when they shuffle, is they weight the shuffle by number of plays, so that content that you play more often, is more likely to play again.

Eirenarch commented 9 years ago

Media apps are not a good example. I would assume that they do not shuffle an array and play the songs in the array but rather choose a new song at random from the list every time. More likely applications include various games (like card games) and cases where something should be displayed differently every time like answers to questions in a test or the Windows Browser Choice screen. I do think there are a good number of applications.

AArnott commented 9 years ago

All the more reason then that it shouldn't go into the core framework. When Shuffle is defined in different ways to different apps, it further reduces the applicability of the method. What is the downside? The downside is that it's another method, and methods aren't free. Every API has a cost and must pay for itself. An API can usually pay for itself both because it's very useful and because it is broadly applicable (it multiplies its usefulness when you have both). But when you have just one of those dimensions (no matter how useful, if it isn't applicable to a great number of apps or libraries) it's not nearly as useful as other APIs you find in corefx and thus often shouldn't be added.

And there is a fully supported story for these useful but less applicable methods: other libraries.

jakesays-old commented 9 years ago

@Eirenarch I would argue that string.IsNullOrEmpty() is probably one of the most useful and used extensions around. Not sure it helps your argument much. And @AArnott is correct - every API does indeed have a cost, no matter how simple.

Eirenarch commented 9 years ago

Well, I do believe that the cost (at least for the static versions) is small enough and the method is indeed useful enough (not String.IsNullOrEmpty useful but still useful) that it is worth the cost.

colombod commented 9 years ago

Code will always have impact in load time and workspace size. To shuffle is a very personal choice when it comes to the definition of "random position" so whatever is offered will very likely never fulfil the need of the user. I don't see much relevance is a shuffle function on collections.

Fat classes with unused methods are just small devices and speed unfriendly.

If you had a class library with cool random generators that I could argue that such a library could also offer utilities to shuffle collections according to statistic distributions.

terrajobst commented 9 years ago

I fully agree with the sentiment that @AArnott stated: an API starts with negative points.

How negative depends on how broadly used and complex the API already is. Adding another API to a WPF class is less of an issue than adding another API to a very common and broadly used type like Array. In particular, we need to be careful that we don't turn core types (such as Array and String) into a grab bag of potentially useful APIs. There are all sorts of considerations from code size, to complexity as to the exposed concept count in IntelliSense. On that note, it's already considered a problem that all Linq extensions show up on String.

As a rule of thumb, the more utilitarian an API is the more usage is required to justify it's existence (see this related API review discussion). A shuffle method seems to be quite specialized with at best one in a thousand apps using it. Truthfully, I don't think that makes the bar for Array.

What do others think? @KrzysztofCwalina @stephentoub?

stephentoub commented 9 years ago

I've had a need here and there for such functionality in recent memory (e.g. in a Sudoku generator determining the order in which cells should attempt to be removed from a solved board, in a photomosaic app determining the order in which regions of tiles should be greedily matched, etc.), and the few times I've needed it, I've just written the five or six lines of code to accomplish it (half of which was a swap).

I don't think it's something that should be added to Array or List; it's too specialized and policy-specific for such core, broadly-applicable, and largely policy-free types. I'm not against having such functionality somewhere in the libraries, but not as part of those types. If, for example, there were a numerical library with a focus on randomization, probabilities, statistics, etc., it could very well make sense there (though if all I needed from the library was this one function, I'd probably still copy-and-paste the code in or re-write the five or six lines of code, rather than take a dependency on another library).

If we wanted to do something to help this case, I'd prefer to see us add a super simple Swap method, which would save half of the lines necessary for this, bringing it down to something like three lines, and also helping to eliminate boilerplate in lots of other code.

Eirenarch commented 9 years ago

The arguments against adding the method in Array do make sense, maybe Array is not the right place to put these methods. Is there any other place that is still in Core where such method can go. Some place like the Enumerable class maybe?

AdamSpeight2008 commented 9 years ago

@Eirenarch Wrapping them as extension methods make them simple to use with LINQ

 // Pick 4 differently positioned items out of the list.
Enumerable.Range(1,100).RandomPick(4).ToArray;
 // Choice 4 items out of the list, same positioned item allowed..
Enumerable.Range(1,100).RandomTake(4).ToArray();  

SwapWith<T>

LeftHand.Contents.SwapWith( RightHand.Contents );

Also should there be two methods, one that mutates the source, and one that doesn't?

Source Mutating Sub Shuffled(Of T)(ByRef source As ICollection(Of T)) Source Non-Mutating ICollection Shuffled<T>(ICollection source)

var deck = new DeckOfCards()
deck.Shuffled();

scrabblebag = scrabbleBag.Shuffle();
tileRack = tileRack.Shuffle();

How would you shuffle an Immutable Collection?

Eirenarch commented 9 years ago

RandomTake is actually a cool idea. Of course if the method is put in Enumerable then it will not mutate the collection. On the other hand I was not literally suggesting Enumerable. I was thinking of a similar class that could be used. For example Java has a static Collections class ( http://docs.oracle.com/javase/7/docs/api/java/util/Collections.html ) that contains this method. There is no equivalent in .NET but maybe some kind of utility class can accommodate this method.

AdamSpeight2008 commented 9 years ago

I already have see nuget Exts.Random. To get access to it all you need is. using Exts.Random;

Eirenarch commented 9 years ago

I understand. My point is that people would just go for the quick and dirty solution OrderBy(random) rather than get a NuGet package or writing a method. Empirical evidence points that this is what happens in practice although theoretically people should either implement it or get it from somewhere.

AdamSpeight2008 commented 9 years ago

You'll generally only have to write it once, in a library and import it when you need it.

colombod commented 9 years ago

I suggest to have that method as extension instead of static and have also support of random generators or other facilities to drive the shuffling. This way it could also be used in scenarios that requires specific behaviours like weighted shuffling like has been pointed out before. What do you think?

On 19 Jan 2015, at 22:20, Adam Speight notifications@github.com wrote:

You'll generally only have to write it once, in a library and import it when you need it.

— Reply to this email directly or view it on GitHub.

xanather commented 9 years ago

A good example of such usage would be a deck of cards. Another example: Ordering a list/array of connected clients to determine who plays first in a turn based game.

I think this could be a good addition, undefined randomness has many applications. On the other hand I agree with @terrajobst about having a "grab bag of potentially useful APIs" on core classes.

Maybe there should be a separate project/library that is full of "commonly" used helper methods for the core classes and let developers decided if they want that dependency or not - this is probably something outside the scope of this .net corefx repository and/or the .net foundation in general... Maybe once .net core is more established an "official" package could be developed for this but I have a feeling such things will be left for third-party libraries.

MgSam commented 9 years ago

I also don't think this is general enough to warrant inclusion in the core framework. That being said, why would you want this to be on Array? It seems to me the natural place for this is as an extension method to IList (in other words, any collection with an indexer, which is what IList is broadly used to denote).

Eirenarch commented 9 years ago

@MgSam I don't want it on Array. Implementation on IList would be ideal but when thinking about where this method can go I did not think of any place that is better than array. As far as I know there is no standard place to put extension methods on IList or on Collections other than IEnumerable.

MgSam commented 9 years ago

@Eirenarch Seems like that's a bigger issue that could be addressed. A System.Collections.Generic.Extensions might be useful. I personally have a bunch of extensions on IList and ICollection.

Eirenarch commented 9 years ago

@MgSam Yes, it would make sense. The problem with small methods is that people simply are not willing to write them or take a dependency on the package for just one method they need so they prefer the quick and ugly solution. In my opinion these types of methods either go in the core or they end up unused. For example I did not write my own String.IsNullOrWhitespace and I was using ugly ifs before it was added to the framework. Now I use it all the time.

AdamSpeight2008 commented 9 years ago

@xanather If your user case is cards then look at Cards. That has the concept of a "Shuffler", which can be used to implement different types of shuffling.

colombod commented 9 years ago

I truly prefer the idea of shuffler to the shuffle method.

nvivo commented 9 years ago

This can be an extension method to IEnumerable<T>.

Here is what I'm using, based on Jon Skeet's post here: http://stackoverflow.com/a/1287572/1902354

private static Lazy<Random> _defaultRng = new Lazy<Random>(() => new Random());

public static IEnumerable<T> Shuffle<T>(this IEnumerable<T> enumerable, Random rng = null)
{
    if (!enumerable.Any())
        yield break;

    var list = enumerable.ToList();
    rng = rng ?? _defaultRng.Value;

    for (int i = list.Count - 1; i >= 0; i--)
    {
        int next = rng.Next(i + 1);
        yield return list[next];
        list[next] = list[i];
    }
}

It would be nice to have this integrated into .NET.

nvivo commented 9 years ago

Just wanted to say I'm with @Eirenarch on this. This should be integrated into .NET to be useful. A nuget package for this won't do it.

Contraty to @MgSam, I think this general enough to be included into core, but as an extension to Enumerable instead of touching Array and List.

cdrnet commented 9 years ago

@nvivo this method should probably not allow a null random instance as it is not thread-safe in this case (and will fail in nasty ways hard to track down). I would certainly expect such a Shuffle extension method to be thread-safe.

Eirenarch commented 9 years ago

@nvivo I kind of dislike this method since it forces the creation of a new list. If we have a method that can shuffle an array or a list we could easily use it on an IEnumerable by forcing the enumerable to become an array or a list (which this method is doing). However if we have this method on an IEnumerable we can't easily create an in-place version from it.

nvivo commented 9 years ago

@cdrnet,

not sure what you mean with thread safety problems by passing a null to the method, but this is an example. A simple lock would solve concurrency problems with the static Random. The example just shows this can be done with enumerable in a more generic way without the overhead of OrderBy(() => random.Next()).

@Eirenarch,

I do think that .NET should have a "good way" to achieve this, and this should be included in the BCL, but "good way" for me means "what would work for most people for most cases", not what would be the fastest way possible. In this case, IEnumerable extensions beat Array or IList implementations.

But I see no reason why another similar method couldn't also be implemented in IList too.

static void ShuffleInPlace(this IList<T> list) {... }

My point is: if this is implemented, this should be in a generic interface as an extension method, so it works on as many types as possible

Also, there is no significant performance gain in adding this to Array itself. the enumerable extension is 2O(n). It's really not that different from O(n). I'd argue that if you are doing something where enumerable.ToList() is a bottleneck, you are not going to trust a generic shuffling algorithm anyway.

AdamSpeight2008 commented 9 years ago

Array.Sort also has an issue as you can't specify how the swapping is to occur. What if synchronisation is required? How do you specify which lock to take? etc. Since at of example have been provide of how relatively simple it is to implement I'm of the opinion it should be an additional non-core library method. There are existing nuget package that implements it. Thus doesn't need to Microsoft implement one.

nvivo commented 9 years ago

@AdamSpeight2008,

By your logic, .NET shouldn't have implemented String.IsNullOrWhitespace, because it is too easy to implement. Also, by your logic nothing will ever be implemented again, as everything can be implemented as a non-core nuget package. We already have C for that.

AdamSpeight2008 commented 9 years ago

@nvivo String.IsNullOrWhitespace was likely a commonly user implemented pattern, especially since it very likely being used to against a data input WinForm (maybe a double digit amountof use). So it was made easier to implement and thus add to String.

I haven't needed to shuffle the contents of collection that often. @nvivo How often have you need to do that?

I prefer to import a number of small more specific usage DLLs rather that one general monolithic DDLs of which I use a small percentage of. More and more project are taking this approach. If can be put in separate Library then it more than likely it should be. The core should be focused and the have the minimal functionality required. If other things can be built on top of it, even better.

nvivo commented 9 years ago

@AdamSpeight2008,

I use shuffle quite a lot actually. Almost every business app ever did requires some shuffling. Most cases boil down to:

The thing here is that I don't think shuffling a list is such a rare thing to do. I think it is even more common than some methods like "Intersect" that are already implemented.

Still think that by your definition, nothing will ever be implemented in corefx again, as at this point, since we already have ints, strings and arrays, literally anything can be considered superfluous.

Alexx999 commented 9 years ago

@nvivo while that's true - it's sometimes really needed. But it's just "sometimes" - once or twice per whole product. For example, personally I use String.IsNullOrEmpty quite often, and while it can be easily implemented this method leads to better coding practices (one method call instead of bunch of checks) and IsNullOrWhitespace probably allows for better performance (String.Trim leads to memory allocation for new string, while IsNullOrWhitespace doesn't need to, and writing for-loop to check if string is whitespace is impractical).

nvivo commented 9 years ago

@Alexx999, the thing is that we cannot reach consensus for this. People have different needs. I'm pretty sure if you just ask 10 people at random what is the single most important thing that should be added to corefx right now, you'd be lucky to get 3 people agreeing on the same thing. But it doesn't really matter.

List shuffling is not important because I use it, it is important because it is a basic manipulation of a core structure. It's implemented almost in every standard library out there, including Java, Ruby, Python, even C++.

It seems the issue was assigned to someone, so I trust they will analyze and see a good way to implement this.

stephentoub commented 9 years ago

It seems the issue was assigned to someone, so I trust they will analyze and see a good way to implement this.

Just a point of clarification on process: just because the issue was assigned to someone doesn't mean it's been accepted. We're now striving to have every issue quickly assigned to someone who will be on point to make sure it's dealt with appropriately... that could mean fixing it, that could mean finding someone else to fix it, that could mean working with the team to decide to close it, etc.

AdamSpeight2008 commented 9 years ago

Since there can be no consensus about what it random, swapping, shuffling, etc. Let's go to the extreme and leave most of the decisions to the programmer.

interface ISourceOfRandomness
{
  .GetValue() : Integer
 }

interface ISwapper<T>
{
  .Swap<T>(ref out x : T, ref out y : T)
}

interface IShuffler<T>
{
  .Shuffle<T>( items        : ICollection<T>,
               randomSource : ISourceOfRandomness,
               swapper      : ISwapper<T>
              ) : ICollection<T>
}

"Core" implementation (sorta)

public static Shuffle<T>
  ( items        : ICollection<T>,
    shuffler     : IShuffler<T>,
    randomSource : ISourceOfRandomness,
    swapper      : ISwapper<T>
  ) : ICollection<T>
{
  if( items == null ) throw exception;
  if( shuffler == null ) throw exception;
  if( randomsource == null ) throw exception;
  if( swapper == null ) throw exception;
  return shuffler<T>( items, randomsource, swapper );
}

  .