Closed maxkoshevoi closed 3 years ago
I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.
This issue is to ask approval to set RuleLevel for that analyzer as BuildWarning.
If I changed your highlighted example to:
List<object> x = new List<object>() { "hello" };
foreach (string s in x) { ... }
is that going to warn? If yes, I don't think we should have this analyzer at all, nevermind what its default warning level is. It's going to be incredibly noisy, I fear. If we do add it, it should at most be info by default. The statement "If explicit cast is intended, user should do it explicitly, so that intent is clear" highlights that this is really a rule about style. cc: @mavasani
is that going to warn?
It will be reported for your example. Default (and current) RuleLevel
is IdeSuggestion
.
The statement "If explicit cast is intended, user should do it explicitly, so that intent is clear" highlights that this is really a rule about style
That's fair. To be honest, If it would be possible to not report explicit conversion that would succeed in runtime, I would've implemented that analyzer so it doesn't report them. But only runtime can tell that (we can receive that collection from some third-party code, for example).
Main goal of this analyzer is to warn about InvalidCastException
in runtime. So successful casts can be considered as some kind of false positive. I agree with your point, that since it reports them, it shouldn't be a warning.
What do you think about leaving it as suggestion?
It's going to be incredibly noisy, I fear
I thought, usually foreach
iterator variable has following types:
var
For all of those cases this analyzer wouldn't be reported since implicit cast is used there.
This analyzer will only be reported if foreach
will attempt to cast
I don't think those cases are common (at least I didn't encounter them)
Also, this unobvious explicit cast is already forbidden for tuples:
I don't think those cases are common (at least I didn't encounter them)
They're very common, especially when working with non-generic collection types. I tried the rule from your PR on dotnet/runtime, and it yielded 1199 warnings; I sampled them, and none of the ones I looked at were bugs.
I don't think we should be adding this rule, or if others really think it's impactful, it should default to a level of None.
especially when working with non-generic collection types
Thanks for the insight! Want's aware that non-generic collections are still so widely used.
What if we limit the scope of this analyzer to only types that implement IEnumerable<T>
?
I tend to agree with @stephentoub. Given the high false positive rate, this rule if approved, should most likely be a disabled by default rule OR silent/hidden so it’s a refactoring.
What if we limit the scope of this analyzer to only types that implement IEnumerable
?
Won't the existing compiler CS0030 error catch the majority of such misuse? e.g. https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA+ABADAAgwRgDoBhCAGzJjABcBLCAOwGcBuAWAChs98V2POGAMx4ATDmI4A3pxxy8IjChwBZABQBKabPm6AZtBgBDMAAsca2g2o5athjgYwA7jgBiECJo07dcmRx+8gC+vjihHBGC4h4QOCA8QgA8BFgAfJwBugCSAKIMAK4AtjBQRtTQKfjpiVXphADiMNT5xaXl0JphWUE41KZQEK5OrgSEAHIQ1NlFAA6UJdYwACa5CGAws3SMmvy6UYHyeYUlZRVQiY3NracdUF2H/mG6/YPDLjwTUzPzMIvUKzWGy29AYuzCEWCQA==
It wouldn't for types that implement IEnumerable<object>
, but in that case, I'm not sure what this rule would be hoping to achieve, anyway: it's very unlikely such a collection would literally be instantiations of the base object type, so the developer is stating as part of their foreach loop what type they're expecting... the foreach is their way of writing the required cast.
Won't the existing compiler CS0030 error catch the majority of such misuse?
This analyzer reports only is cast doesn't exist (even explicit one).
Proposed analyzer is targeted at cases where foreach
will try to cast base class to it child (explicit cast exists, so CS0030
wouldn't be reported).
It's not only about object
collections. Here are two examples:
var x = new List<IComparable>() { 5 };
foreach (string s in x)
{
}
var x = new List<A>() { new() };
foreach (B item in x)
{
}
class A { }
class B : A { }
Side note: We can even exclude object
collection from this analyzer as well. So it will only look at types that implement IEnumerabe<T>
where T
is not object. That should greatly reduce number of cases where analyzer interferes with developers intent.
Proposed analyzer is targeted at cases where foreach will try to cast base class to it child
Right. I'm saying that such use is the minority, and when it does exist, the developer is using the foreach as a way to write the cast. In your example, if someone has:
var x = new List<IComparable>() { "hello" };
foreach (string s in x)
{
}
your rule is going to warn, but this is perfectly valid, and either they're going to be forced to suppress the rule or change their code to the less concise / less readable:
var x = new List<IComparable>() { "hello" };
foreach (IComparable i in x)
{
string s = (string)i;
}
I don't see the benefit.
I'm saying that such use is the minority
Intentional use. Such code might be written by mistake. I come up with the idea for analyzer when I refactored a collection type to its base type in one place, and code crashed in runtime on a foreach loop that used that collection. I forgot to change variable type there and VS didn't notify me that I forgot that.
I don't see the benefit.
My thinking was that explicit cast needs to be done explicitly. If I don't want to use explicit cast, and using foreach
as a regular loop, VS should notify me somehow (warning or suggestion) that explicit cast will be performed here in runtime.
Such code might be written by mistake.
So, why not write a rule that flags every explicit cast from a base type to a derived type? Maybe the developer made a mistake.
That's essentially what this rule is proposing doing. (And obviously we're not going to add such a rule :)
flags every explicit cast from a base type to a derived type
Is .Net compiler generates such cast somewhere else except when compiling foreach
?
I mean, if user has written cast manually (B)a
, it's obviously doesn't need to be flagged. User clearly expressed an intent to perform an explicit cast. When using foreach
you don't express this intent.
Usually .Net asks to perform cast explicitly when user tries to assign variable of base type to a derived type.
B b = a
Or even here:
I mean, if user has written such manually (B)a, it's obviously doesn't need to be flagged
I don't see the difference. The developer is saying "given this collection, I would like to enumerate every element in it, referring to each as an xyz". Whether they write it as foreach (xyz item in collection) { ... }
or foreach (object o in collection) { xyz item = (xyz)o; }
, they're saying the same thing. I think your objection is to this behavior of foreach
, but that's how foreach
has always worked. And I don't see how the latter form is any more explicit or intentional than the former form. You described refactoring a loop and accidentally not changing the foreach and getting an exception at runtime; such an issue is no less likely to happen if you'd written it with an explicit cast.
such an issue is no less likely to happen if you'd written it with an explicit cast
Exactly. But I didn't want to do any casting, I just wanted to enumerate the elements. Variable type was there only for code readability (so I don't need to hover over var
to see what type it is).
If I'd done the same refactoring, but got a runtime exception on a cast, I wouldn't be surprised. Explicit cast is explicit exactly because is can fail.
given this collection, I would like to enumerate every element in it, referring to each as an xyz
Yes, that was my thinking too. I thought that this:
var x = new List<IComparable>();
foreach (string s in x)
{
}
Is equivalent to this:
var x = new List<IComparable>();
for (int i = 0; i < x.Count; i++)
{
string s = x[i];
}
But compiler will report a CS0266
for second code snippet, but not for the first. I understand that this was done because foreach
was created when generics didn't exists, and it would be tedious to cast from object manually every time, but now we have generics.
I'm trying to make it behave the same for both cases.
We can even exclude all object
collections from this analyzer as well. So it will only look at types that implement IEnumerabe<T>
where T
is not object
.
I understand that this was done because foreach was created when generics didn't exists, and it would be tedious to cast from object manually every time, but now we have generics.
I don't believe analyzers we ship in the SDK are a place to retcon behaviors of the C# language.
@stephentoub as you said, this rule shouldn't be a warning (or an error), and compiler doesn't generate suggestions, only analyzer does (if I'm not mistaken).
It would be a suggestion that notifies about existing C# behavior, that might not be obvious (like CA2246 does, for example).
No one will change this behavior of foreach since it would break a lot of code bases, but it least non-breaking analyzer for it would be nice.
If there is case where we can know that something is always a runtime error then I agree that it is an excellent case for a potential analyzer. Unfortunately, I don't see how this analyzer can ever be definitive in its recommendations.
As @stephentoub says the compiler already warns for cases that we know will cause runtime failures:
using System;
using System.Collections.Generic;
public class C {
public void M() {
var list = new List<string>() { "Hello World" };
foreach (int i in list) // error CS0030: Cannot convert type 'string' to 'int'
{
Console.WriteLine(i);
}
}
}
My understanding of what is being proposed is that we now warn for this case as well:
using System;
using System.Collections.Generic;
public class C {
public void M() {
var list = new List<object>() { "Hello World" };
foreach (int i in list) // New warning about casting object to int
{
Console.WriteLine(i);
}
}
}
I can understand the implicit cast in foreach
can be surprising but that's how the language works. Whenever you see foreach (int value in list)
you should think for (int i = 0, value = 0; i < list.Count; value=(int)list[i], i++)
. If you do not want that to happen you can use var
which will type the value as whatever it actually is instead of attempting to cast it.
You could require this pattern in your codebase so all casts are explicit:
using System;
using System.Collections.Generic;
public class C {
public void M() {
var list = new List<object>() { "Hello World" };
foreach (var value in list)
{
int i = (int)value;
}
}
}
Or this:
using System;
using System.Collections.Generic;
using System.Linq;
public class C {
public void M() {
var list = new List<object>() { "Hello World" };
foreach (var i in list.Cast<int>())
{
}
}
}
But this really becomes a style argument about what the best way to express to readers that you intend the cast that is happening.
Is there a guide somewhere to the philosophy of analyzers? I can see correctness-style analyzers falling under three general categories.
My opinion (and just my opinion) is that these are also in increasing order of importance for an analyzer to flag. [3] is the most important because it's alerting you to a problem that you might not even know about until your prod server falls over unexpectedly. [1] and [2] are less important. We should flag them if we can, but if we miss something it's not the end of the world, as the developer will immediately see any runtime failure as soon as they exercise this code path.
For [2], I'd rather err on the side of false negatives instead of false positives. Other posters in this thread described their concerns behind a false positive rate. On the other hand, the worst consequence of a false negative is that you see the failure the first time you run your app, and you spend a minute or two debugging it. That doesn't seem all that bad, to be honest.
* I'm using "always" a bit liberally. That is, if your test cases all populate the List<object>
with strings, and you assume this is representative of a typical scenario, then you "always" go down the success path for typical inputs. And if you happen to receive an atypical input, your implicit cast acts as a quasi Debug.Assert("I really expected this to be a string and haven't tested this code for other data types.")
. Score one for invariant checking! :)
You could require this pattern in your codebase so all casts are explicit But this really becomes a style argument about what the best way to express to readers that you intend the cast that is happening.
@jmarolf Hm, great point, I'm actually more concerned about style here, then whether code will fail it runtime or not. My point is that all explicit casts (from parent to child, or from interface to class that implements it) should be explicitly written. This would serve two goals:
foreach
variable type and indeed wants to perform such castforeach
always does cast under the hood, but it's used so frequently to just iterate collections that when reading the code I don't assume that any foreach is a potential runtime error. Right now It requires some research to determine if some particular foreach performs explicit cast or not.I'm not proposing that all foreach
statements look like this from now on:
foreach (var item in list.Cast<T>())
{
}
Only once that will actually perform explicit cast (from parent to child, or from interface to class that implements it) in runtime.
@GrabYourPitchforks This analyzer falls into 2 and 3 category (we don't know where collection came from, and how it's values are populated), but mostly 2, I think.
Regarding false positives, @stephentoub helped identify two groups so far:
object
collections (like ArrayList
). A lot of code is already written for them and it expects foreach
to cast them to some type, so there shouldn't be any surprise that it performs explicit cast in their case. I excluded such collections from this analyzer, so this is no longer a false positive.
I'm not sure if generic collections with object
as type argument (List<object>
) should be included in this bucket though. Would like to hear any thoughts on this.Collections that always return one type. For example:
foreach (string item in GenerateSequence())
{
}
IEnumerable<IComparable> GenerateSequence()
{
}
Here's an example from dotnet/runtime
. Here explicit cast is performed since Instantiation
returns TypeDesc
and GenericParameterDesc
is derived from TypeDesc
. Should it be flagged with this analyzer to manually perform the cast? Also would like to hear any thoughts on this. Here is another example (schema.Includes
is XmlSchemaObject
)
As for other false positives, I didn't find any so far (at least in dotnet/runtime
nothing is flagged if I exclude everything mentioned above). I'll try to scan other repositories to see if there are some other cases that potentially need to be marked as false positive.
I do appreciate all the effort here, but let's not add this to roslyn-analyzers. The concern here is about style, and it would be better suited to an analyzer suite focused on style, e.g. stylecop.
If the concern is about style, it would be better suited to an analyzer suite focused on style, e.g. stylecop.
This style safeguards users from potential runtime issues (like CA2245 or CA2248, for example), it's not only about code readability. Can we include it as suggestion if all mentioned false positives are excluded?
I would prefer this rule to be in stylecop as it fits more into style.
Besides it warns when people use objects, now there is a chance that for cases like ISomeInterfaceHere
upcasting to a class that implements said interface might not be a bug at all.
However it would be nice if foreach
did have a way to pattern match instead of casting (like from ISomeInterfaceHere
to MyClassIneheritingFromSomeInterfaceHere
) instead of the cast.
However there is another issue:
And yes the only places I can see foreach pattern matching of any use are:
However I do think pattern matching foreach would be great to add to avoid implicit casting inside of them (and makes the intent more clearer if one did not want to use var
).
However in my code even on my own Plugin
Loaders, using AppDomain
or not, uses var
on everything just because I prefer var
for everything unless I explicitly cannot use var
for something.
Also another note one can also do this if upcasting is intended as well too:
foreach (var something in somethings)
{
// use pattern matching to determine the real types of the type of 'something'
// (Interface or whatever) and do things based on whatever the type really is.
}
safeguards users from potential runtime issues
I don't believe it does.
safeguards users from potential runtime issues
This same argument could be made for warning about all explicit casts. The developer may be new to C# and be unaware that the cast will cause runtime errors. My understanding of the problem as it happened is:
start with code like this
List<Student> People {get;}
// some levels of object-oriented layering later...
foreach(Student student in People)
{
...
}
We then add a base type like say Person
that Student
inherits from and update our collection to contain the base type
List<Person> People {get;}
// some levels of object-oriented layering later...
foreach(Student student in People)
{
...
}
the compiler doesn't warn because this cast could succeed. There may even be paths were nothing happens and ones where there is a runtime exception. While I am sympathetic, I do not see a universal way to address this problem today. If I am an analyzer, the developer needs to communicate to me what their intent is. There are millions of lines of code that do this today and I shouldn't warn them just because there exists some situations where this could be non-intentional.
If, in my codebase, this sort of refactoring is happening all the time I would personally use var
in a lot of places. I might even recommend var
be used everywhere. That was one of the reasons var
became so common in the IDE codebase of roslyn, because we kept refactoring different layers and we didn't want to deal with the fallout of updating thousands of variable declaration locations.
However, these sorts of refactors never happen at the compiler layer. For that reason the roslyn compiler layer does not use var
at all.
This is the reason the style rules for https://github.com/dotnet/roslyn are what they are and why I think this analysis is best expressed in something like stylecop.
@stephentoub @jmarolf @AraHaan @GrabYourPitchforks Thank you all for great discussion. I'll redirect this analyze to stylocop.
Thanks.
Is there a guide somewhere to the philosophy of analyzers? I can see correctness-style analyzers falling under three general categories.
- The code you've written is going to fail at runtime. Don't even bother hitting F5. (See existing error code CS0030.)
- The code you've written may succeed at runtime or it mail fail at runtime. Practically speaking, it will always* go down the "success" path or it will always go down the "failure" path, but we can't really tell from static analysis which world we're in. If you hit F5 you'll find out quickly which side you're on.
- The code you've written may succeed or fail at runtime, but it will do so non-deterministically, or the variables which affect success vs. failure may be environment-dependent and very difficult to reason about. Your production server might observe different behaviors than your test server.
Interestingly I do see things a little differently. For cases like [1] I am annoyed as a developer if the tooling could have told me something was wrong sooner but "wasted my time" by assuming I would figure it out eventually. In general, false positives are extremely detrimental to analysis being successful. If warnings are only sometimes correct it's just another cognitive load on the developer. Now in addition to whatever they have on their plate they need to take the time to reason about all these diagnostics and decide if they are false positives. The urge to just silence all warnings goes up for every false positive that it encountered. I see these as being in priority order:
async void
delegates to something that takes an Action
). roslyn#12649 is the most extreme example I can think of. These are the obvious slam dunks and should be on by default. Some of these will even come directly from the compiler instead of analyzers.MemoryManager<T>
type. Applying these principled to the rule proposed in this issue:
foreach
loops will contain actual bugs. By this metric I would expect every single warning that appears in the dotnet/runtime repo to be a false positive. Again, because this issue is likely to appear in newly written or newly changed code and foreach loops are used a lot.var
usage in foreach
is about 50/50foreach (var
is used ~150K on github so lets say ~150K don't use var
in foreach
. Thats a lot of code that is has been running for years without issue and would potentially be flagged. Because this has a potential for a high false positive rate in a large number of codebases I think its a non-starter.
I'm creating an analyzer (https://github.com/dotnet/roslyn-analyzers/issues/4479) that reports if
foreach
will perform explicit cast in runtime (which is likely result in runtime exception):This issue is to ask approval to set
RuleLevel
for that analyzer asBuildWarning
.