dotnet / roslynator

Roslynator is a set of code analysis tools for C#, powered by Roslyn.
https://josefpihrt.github.io/docs/roslynator
Other
3.07k stars 256 forks source link

New analyzer + codefix: Remove no-op Select statements #323

Open IanKemp opened 6 years ago

IanKemp commented 6 years ago

Sadly in code I often see things like this:

var y = collection.Select(x => x);

This does nothing except waste CPU cycles. An analyser and refactoring to remove these aberrations would be nice.

LesRamer commented 6 years ago

There is one really meaningful use for Select(x => x): protecting the identity of a collection.

Consider this example:

public class Foo
{
    // other meaningful code omitted...
    private List<int> collection = new List<int>();

    // If proposed analyzer/fix changes this...
    // public IEnumerable<int> Bar => collection.Select(x => x); 

    // ...into this:
    public IEnumerable<int> Bar => collection;
}

Doing so would enable callers to break encapsulation by doing something like:

var f = new Foo();
((List<int>)f.Bar).Clear(); // f.Bar is now cleared without f's knowledge, which violates encapsulation

With Select(x=>) in place, such an abuse would end in a cast-exception at run time. It could be more innocent than this: calling a method that takes an IEnumerable<> and that more innocently expects to be passed a Collection<> of some kind could cast and change it.

The Select(x => x) provides protection against this kind of thing without making a copy of the collection -- which could result in huge savings over creating and returning a defensive copy of the collection. There are certainly other approaches like returning a ReadOnlyCollection type that wraps the List<> type. I'm not convinced that it's necessarily better in the general sense. I've inherited code at work that is very List-happy -- it regularly returns List<> and passes List<> to and fro. So the sort of abuse I described here happens all the time in our legacy code. I'm generally afraid to change these sort of side-effects since other code may depend on them. My first approach is to change parameter and return types to IEnumerable<> and go from there, but I would feel justified in returning or passing Select(x => x) in these cases as a next step in protecting object integrity.

I don't recall whether I first read about Select(x=>x) from Eric Lippert (here) or Jon Skeet (here). Probably Lippert's post since it is older.

Regardless, this is something I felt the need to urge caution on. It's sometimes a good idea and other times it may weaken encapsulation.

IanKemp commented 6 years ago

@LesRamer I would argue that if you are using Select solely to prevent your callers from "back-casting" to the original source collection type, because they want to do naughty things like mutate it, then the problem is with said callers. And if your code is internal to your company, then presumably you have full control over who's calling it, so you can hunt them down and educate them with a cluebat.

But I agree completely that, from an encapsulation viewpoint (especially if you're building something for public consumption), an Select of this form is acceptable. Personally I would go with specifying a return type of IEnumerable and forcing the collection to be immutable via List.AsReadOnly() or similar, as IMO that is unambiguous as opposed to a "no-op" select.

If Josef does decide this is an analyzer/codefix worth adding, it would probably be best if he were to ship it as disabled by default, and users like me who want this functionality can enable it.

LesRamer commented 6 years ago

My tone wasn't intended to come across as discouraging this -- to the contrary, I think it's worthwhile to detect and eliminate effective-no-op code. I merely felt compelled to offer the one use-case for an "identity-select" that I was aware-of where it would make sense to retain it.

I view applications of significant size as consisting of mini-frameworks that have similar needs that framework providers do. Small-to-medium size applications often get away with a different set of design rules than solid framework providers require.