Closed tannergooding closed 6 years ago
Talked with @MadsTorgersen briefly and it was determined logging an issue for this would be good.
There is the normal concern of this being a breaking change, since any change to the handling of the bools will cause code reliant on that behavior to go down a different code path (this, at the very least, impacts the &&
code pattern, which treats any number with the first bit set as true
, and any other value as false
).
Also FYI. @jaredpar, @mikedn, @jkotas
The C# language specification is not specific to any particular runtime platform. The only observable consequence of placing something like that in the specification would be the behavior you get when you overlay a bool and a byte in unsafe code. Specifying that would be a peculiar departure from the normal situation in which the specification avoids telling you in detail the way unsafe code works.
The only observable consequence of placing something like that in the specification would be the behavior you get when you overlay a bool and a byte in unsafe code.
Only if you limited such code to C#.
It is entirely possible for C# to be consuming code produced by another language where the bool
value conforms to the CLR definition but doesn't line up with the C# implementation. In which case updating the spec would cause the behavior in that scenario to be well-defined instead of undefined or implementation-specific.
See also #12211 #14262 and https://web.archive.org/web/20160318193105/https://roslyn.codeplex.com/workitem/224
@tannergooding What languages do you have in mind that use a different representation for bool
?
ILASM, in general. C++/CLI for desktop, although it could realistically be any language compiled to IL (such as C or C++). There are several programs for converting between LLVM bytecode and IL bytecode, for example. extern
methods are not unsafe and can also return a bool
value that is not 0 or 1.
As a hypothetical example:
.method public hidebysig static
bool MyTrue () cil managed
{
.maxstack 8
IL_0000: ldc.i4.2
IL_0001: ret
}
The C# language would see this as static bool MyTrue()
and the value returned would be true
, according to the ECMA-335 spec (Common Language Infrastructure specification). However, C# would handle it incorrectly for the case of &&
.
See the last entry in https://web.archive.org/web/20160318193105/https://roslyn.codeplex.com/workitem/224 for how we resolved this in 2014. The resolution was that we would document it (presumably, in https://github.com/dotnet/roslyn/tree/master/docs/compilers) but we do not appear to have done so.
I don't believe the linked resolution is exactly correct.
The last comment is making a statement that the runtime bools have more than two states. However, it could easily be argued that they do in fact only have two states (true
, which is a bit pattern with any one or more bits set; or false
, which is a bit pattern of all zeros).
For the runtime, a boolean
with a bit pattern of 10
is equivalent to a bit pattern of 01
, they represent the same state.
I'm also not sure it will be clear what a "non-standard" bool is without clearly defining the underlying bit representation of true
and false
. Otherwise, the exact definition of a "standard" bool is still dependent on the underlying runtime and the implementation of a given C# compiler.
For example, the current Roslyn implementation is generally inline with the runtime definition, except for a couple of weird scenarios with the short-circuit comparisons where it treats it as true has a bit representation with the least significant big set
and false has a representation with the least significant bit cleared
.
So wouldn't this also mean that any C# assembly exposing a bool
wouldn't be CLS compliant because of violations of the ECMA spec?
The argument that C# is independent of the CLR is a copout. C# exists in an ecosystem with other languages bound by the common contract that is the CLS/CLR. More than once has the argument been made against various C# proposals given that they would not play well across that ecosystem. It's ludicrous to expect that every other compiler that might want to target the CLR will have to know about this backwards behavior in order for it to produce assemblies that will work as expected.
Can the compiler be updated? Given that we are emitting for the CLR, we would make our handling match the expected representations for the two boolean values (0/not 0) of the underlying platform
The compiler is operating as designed here and has been for the entirety of its existence. This issue has been brought up several times throughout the history of the compiler. We considered the arguments but ultimately have decided to continue on with this approach.
This issues should be moved to the C# language repository. The compiler is operating as designed here. If there is a call for further clarity in the spec then csharplang is the right place to drive that.
@jaredpar, I had opened it here, rather than csharplang, at the request of @MadsTorgersen.
This issue has been brought up several times throughout the history of the compiler. We considered the arguments but ultimately have decided to continue on with this approach.
It also continues being brought up as a pain point every few months. I believe it would help if the compiler was at least consistent, but I'm not sure you could call it that.
The current behavior can currently be described as:
value
, !value
, left | right
, and left || right
false is 0
; true is not 0
left == right
and left != right
false is 0
; true depends on the underlying bit values being the same
left & right
, left && right
, and left ^ right
result is dependent on the underlying bit representation of each value
Basically, what I'm seeing is that the C# spec states there are two values, but the implementation is handling it as if there are more.
That is
0
and not 0
0
and 1
Instead, it is handling it as 0
and depends on the underlying bit pattern
, which I would view as inconsistent with both the C# and the runtime specifications.
@jaredpar I'm reopening this and assigning it to myself to document the compiler behavior.
It also continues being brought up as a pain point every few months. I believe it would help if the compiler was at least consistent, but I'm not sure you could call it that.
The job of the compiler is to emit code that will behave correctly in the face of boolean values that are 1 or 0. The behavior of other values is inconsequential.
Instead, it is handling it as 0 and depends on the underlying bit pattern, which I would view as inconsistent with both the C# and the runtime specifications.
Disagree. The compiler correctly handles the cases that it supports: 0 and 1.
The compiler correctly handles the cases that it supports: 0 and 1.
This needs to be documented, if not in the language spec, then in a compiler implementation details spec.
In my opinion, The language spec should explicitly state that handling of values with a bit representation different than true
or false
is implementation defined. It should also either explicitly list the bit representation of true
and false
or also call out that they are implementation defined, but non-identical, values.
Additionally, it would be a nice to have to have something (probably in a compiler implementation details specification
) explicitly stating that:
==
, !=
, &
, &&
, and ^
)This needs to be documented, if not in the language spec, then in a compiler implementation details spec.
This has been documented several times, some in this various repo. Unfortunately searching for terms like "bool" and "true" is meaningless given our issue size. I'm fine with this being documented as behavior of the compiler. Saying our code gen is inconsistent though is incorrect.
Unfortunately searching for terms like "bool" and "true" is meaningless given our issue size.
Would it be reasonable to do the following in the language spec?
Roslyn Implementation Detail
annotation that calls out Roslyn specific behavior
informative
text, rather than normative
textRoslyn Implementation Details
section as an annex to the spec that cross-references the individual implementation detailsSaying our code gen is inconsistent though is incorrect.
I was saying it is inconsistent with the spec, which I still think it is, at least until the spec is updated. The compiler is implementing true
(1), false
(0), and implementation defined handling (2-255). I would view this as explicitly handling three states rather than 2 (which would have to be implemented as true is x
, false is not x
, as anything else results in a third state, even if that state is undefined or implementation defined)
Roslyn documentation belongs in the Roslyn repo. I will add some appropriate documentation for this issue in docs/compilers
.
The only observable consequence of placing something like that in the specification would be the behavior you get when you overlay a bool and a byte in unsafe code.
You wouldn't even need any unsafe
, explicit layout can do that trick just fine already.
@Joe4evr Although LayoutKind.Explicit
does not require unsafe
, it is unsafe. There is no other way to explain the behavior of explicit layout by reference to the C# language specification. It is too late for the C# compiler to require it only be used in an unsafe context.
@Joe4evr @gafter I don't think it would make sense to require the unsafe
keyword anyway. Explicit layouts are completely verifiable IL as long as you're not overlaying references (type safety violation) or private fields of structs (visibility violation). Blitting simple primitives on top of each other doesn't violate either of these conditions.
@GrabYourPitchforks Verifiable perhaps, but such code can undermine the compiler's implementation invariants.
Issue
Today the C# compiler does not emit code handling for
bool
that is consistent with the underlying representation ofbool
in the CLR.The C# specification states:
The spec also indicates that it is 1-byte. Specifically that the result of the
sizeof(bool)
expression is1
.While the ECMA 335 specification states:
This can lead to confusion about the handling and cause various inconsistencies in various edge cases (see https://github.com/dotnet/coreclr/pull/16138#discussion_r165256495, for such a thread).
Proposal
It would be good to determine:
true
/false
so that two implementations don't behave differently0
/not 0
) of the underlying platformCurrent Behavior
The majority of the behaviors below actually match the CLR expectation that a
bool
can be more than just0
or1
. However some of the behaviors (such as&&
) does not match this expectation and can cause issues when interoping with any code that would assume otherwise (generally this is some kind of interop or unsafe code).!value
left == right
left != right
left & right
left | right
left ^ right
left && right
left || right
value ? "true" : "false"