dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
19.05k stars 4.04k forks source link

[Wiki] Null-conditional operator #21765

Closed jods4 closed 7 years ago

jods4 commented 7 years ago

I was reading the "New language features in C# 6" page, specifically the null-coalescing operator.

(Aside: I wanted to check if method arguments with side effects are called or not in x?.Do(SideEffect()), which is not described there but would be a nice addition. Looks like the answer is no.)

The following paragraph made me twitch a little bit:

This is an easy and thread-safe way to check for null before you trigger an event. The reason it’s thread-safe is that the feature evaluates the left-hand side only once, and keeps it in a temporary variable.

Is there any magic trick that you perform to prevent read introduction? If you compile Event?.Invoke() to:

var ev = Event;
if (ev != null) ev.Invoke();

Then it's not thread-safe in the strictest sense. Because the JIT (or even the compiler) is free to remove the local variable and introduce a second read. It's the opposite of read elimination and it's allowed unless we speak about volatiles.

(For the curious: Volatile.Read() is how you can make the code above thread-safe.)

I know that currently MS JIT doesn't do read introduction on heap variables, precisely because they are aware that it would be a breaking change for such code. It doesn't mean they won't do it in the future, that Mono doesn't do it, etc.

You have no guarantee because the specs say such code is NOT thread-safe, so maybe you should remove that remark from the wiki. There is already a lot of confusion on this topic, see https://stackoverflow.com/questions/3668953/raise-event-thread-safely-best-practice.

Also: this kind of discussions create a false sentiment that the code is thread safe. Yes, calling the delegate might be thread safe if done correctly, yet the delegate itself must be prepared to be called after it was unsubscribed from. Most developpers are completely unaware of that fact. Multithreaded code without proper synchronization primitives is very hard.

CyrusNajmabadi commented 7 years ago

Then it's not thread-safe in the strictest sense. Because the JIT (or even the compiler) is free to remove the local variable and introduce a second read. It's the opposite of read elimination and it's allowed unless we speak about volatiles.

This is not true. The jit is not allowed to do anything that would cause a second read of the event.

sharwell commented 7 years ago

Then it's not thread-safe in the strictest sense. Because the JIT (or even the compiler) is free to remove the local variable and introduce a second read. It's the opposite of read elimination and it's allowed unless we speak about volatiles.

This is really fascinating. In the strictest sense (a conforming implementation of the CLI, per ECMA-335 §I.12.6.4), I tend to agree. However, you can feel safe that no concrete implementation of the CLI is going to actually do this, for several reasons:

  1. Stores to temporary local variables are widely used in intricate concurrent code to ensure thread safety, even though this code would not be safe under the situation you described. Such a change would introduce widespread bugs which are extremely difficult to detect.
  2. An implementation of the CLI like you described would undermine new language features, including ?. and flow analysis work related to nullable reference types.
  3. An implementation of the CLI like you described would be slower than the current implementation for several scenarios.

I imagine the most appropriate course of action would be an update to the CLI specification to account for this unwritten requirement we've been operating under this whole time.

jods4 commented 7 years ago

@CyrusNajmabadi

The jit is not allowed to do anything that would cause a second read of the event.

It sure is allowed to. The hardware, JIT, and compiler are free to change everything they want in a program as long as its side-effects are unchanged when running on a single thread. A side effect is defined as a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.

This is not unlike how reads can be reordered or even eliminated.

If you think otherwise, do you have a reference please? 😁

@sharwell

you can feel safe that no concrete implementation of the CLI is going to actually do this, for several reasons [...] Such a change would introduce widespread bugs which are extremely difficult to detect.

Yes and I point it out in my post. I am aware that MS team who's in charge of JIT has said it currently won't introduce reads to heap references on x86 and x64. I also pointed out that it's not standard and not future-proof: I don't have such a statement for .NET 5, .NET on ARM, Mono, etc... You guys should add a line to forbid read introductions in the specs if this is how you feel.

So is the code actually thread-safe, or very likely to be thread-safe because of undocumented implementation details?

An implementation of the CLI like you described would undermine new language features, including ?. and flow analysis work related to nullable reference types.

No it doesn't. All language features are only "true" for single-threaded code. Flow analysis works with if (x != null) { x.Something() } and that's obviously broken with concurrency if x is on the heap. ?. is no different. BTW it's useful syntactic sugar in many situations that don't involve concurrency. My events do x?.Invoke() all the time and I never expected them to be thread safe. It's just convenient since they might be null when no one has subscribed.

Lock-free code is very tricky and very hard to get right. I don't think it's a good idea to promote it through language features such as ?.. In addition to being incorrect strictly speaking, there are other pitfalls that most developers are unaware of, like being called before code is properly initialized or after the event was unsubscribed.

I imagine the most appropriate course of action would be an update to the CLI specification to account for this unwritten requirement we've been operating under this whole time.

Well, yes. If you guys think that we should all operate under the assumption that read introduction is not a thing, it's time to add it to the specs. It will also clarify this situation, which is honestly a mess. When I dug deep into lock-free programming on .NET, it took me a lot of time, various resources (not always correct) and even exchanges with MS employees to figure it all out.

Until then, it would be nice to promote correct code (Virtual.Read()). Fun fact: on x86 all reads are actually acquire barriers so it comes for free. That's not true of other platforms like ARM, which is probably why the statement of the JIT team I've seen mentioned x86 specifically.

CyrusNajmabadi commented 7 years ago

It sure is allowed to. The hardware, JIT, and compiler are free to change everything they want in a program as long as its side-effects are unchanged when running on a single thread.

reads of a local without an intervening write cannot be observed to return different results. And you can't have multiple threads writing to a local (since, by definition, it's local).

That means it's it's non-null, it will remain non-null. If it's null it will remain null.

Now, what you might be thinking about is what data you might read from that non-null value. i.e. if i read from something non-null will i see that data in a consistent state. That's not guaranteed as the data teh local points to might be in any sort of consistent state vis as vis reads and writes.

CyrusNajmabadi commented 7 years ago

No it doesn't. All language features are only "true" for single-threaded code. Flow analysis works with if (x != null) { x.Something() }

If you have a local, it can only be observed by a single thread. So it being null or non-null will be consistent. Now, whether or not it is still safe to call x.Something() is entirely a different manner depending on the threading of your system.

However, the point of hte original complaint is that it's not appropriate to recommend:

SomeEvent?.Invoke();

// vs

if (SomeEvent != null)
    SomeEvent();

it is appropriate to go with teh former as it translates to:

var __temp = SomeEvent;
if (__temp != null)
    __temp.Invoke();

Which is the pattern that is recommended to avoid the null-ref race condition that can happen with the check and invoke pattern.

CyrusNajmabadi commented 7 years ago

Also: this kind of discussions create a false sentiment that the code is thread safe.

The statement says: This is an easy and thread-safe way to check for null before you trigger an event

It is not making a claim about thread safety in a broad sense. Just like if we had a statement saying "entering a monitor is a way to provide exclusive thread access to a region of code" then it would not be making a general thread safety claim beyond that.

jods4 commented 7 years ago

To go back to why I wrote this issue in the first place, since quite a bit of discussion has taken place since.

This is an easy and thread-safe way to check for null before you trigger an event

It is not thread-safe per the current specifications of .NET.

.NET memory model is already confusing with lots of inaccurate information, I don't think MS should make incorrect statements about it.

If you feel like the JIT and compiler will never introduce reads, which is plausible, add it to the specs please. Otherwise it's an undocumented implementation detail.

CyrusNajmabadi commented 7 years ago

It is not thread-safe per the current specifications of .NET.

I'm concerned with the actual state of the .net ecosystem. Where it is safe :). I have no problem with updated documentation explicitly describing it as such.

If you feel like the JIT and compiler will never introduce reads, which is plausible, add it to the specs please. Otherwise it's an undocumented implementation detail.

The compiler needs no such language. The language spec for C# simply does not permit this sort of thing. Indeed, it explicitly says about this construct "except that P is evaluated only once."

Because the evaluations is only allowed to happen once, as per the language spec, it would be a bug in the compiler if it allowed anything to happen at its level to mean that P evaluated out to anything different.

Feel free to file a bug on against the CLR to have explicit wording for this for locals.

CyrusNajmabadi commented 7 years ago

Additional C# language spec to back this up:

a local_variable_initializer that gives the initial value of the variable.

the value of a local variable is modified using an assignment (Assignment operators)

(i'm excluding the terminology in the spec around ref/out).

We have defined the behavior for C# here. As there is no assignment, the value of the local variable cannot change after the first read. If it is null, it will remain null. If it is non-null, it will remain non-null.

jods4 commented 7 years ago

@CyrusNajmabadi I don't feel like entering a long debate about this, I wrote a long answer and wiped it.

If you think that this is actually thread-safe per the specs I would love to see a reference that forbids "memory read introduction".

Otherwise, if you acknowledge that it's thread-safe because of an undocumented implementation detail of the current JIT and you're fine with saying that it's thread-safe... then fine, it's your call. You represent C#.

If you go that way, maybe ask the JIT team to update the .NET memory model to forbid read introduction. Everyone will be much happier with a simpler memory model.


PS: I see nothing in your C# quotes that prevent read introduction. First please note that this was a C# specs quote as well:

A side effect is defined as a read or write of a volatile field, a write to a non-volatile variable, a write to an external resource, and the throwing of an exception.

A normal memory read is not a side-effect. It can be reordered, eliminated or introduced freely by the compiler, JIT and CPU. In fact, aggressive reordering and elimination do happen.

And keeping it concise:

except that P is evaluated only once

refers to evaluating the expression and its side effect, which a memory read isn't. Anyway, look at the IL you emit, it's the same as the code people write for events and it has to play by the .NET memory model rules (safe by accident).

As there is no assignment, the value of the local variable cannot change after the first read. If it is null, it will remain null. If it is non-null, it will remain non-null.

Absolutely, as long as there is no concurrency. All those rules in the C# specs are only applicable from the perspective of a single thread.

Want to do multi-threaded? Volatile is the primitive that gives you some guarantees. Normal memory accesses give practically no guarantee when it comes to concurrency.

CyrusNajmabadi commented 7 years ago

Absolutely, as long as there is no concurrency. All those rules in the C# specs are only applicable from the perspective of a single thread.

There cannot be any concurrency on a local. It's a local. There may be date races for what the local variable points at, but not of hte local itself.

CyrusNajmabadi commented 7 years ago

If you go that way, maybe ask the JIT team to update the .NET memory model to forbid read introduction. Everyone will be much happier with a simpler memory model.

I am fine with you opening an appropriate bug with them to document this. That would be a different repo though. This repo is for the C# compiler, which is behaving properly (AFAICT) as per the C# specification. csharplang would be a place to file and request for language specificaiton changes. However, in this case, i think the spec is correct as is.

CyrusNajmabadi commented 7 years ago

Normal memory accesses give practically no guarantee when it comes to concurrency.

We are not accessing memory here. We're accessing a local. The spec outlines what is allowed to change the value of a local. And it specifies evaluation of a local.

CyrusNajmabadi commented 7 years ago

refers to evaluating the expression and its side effect

It refers to evaluating the expression. Literally "getting it's value". As per spec, this get of its value can only happen once. Because the value is then placed in a local, and because local values can only change from a very specific set of language constructs, the local is guaranteed to be the same value as per the spec.

Again, if this local points to something external, it might be pointing to data that can change concurrently while that's happening. But that's an orthogonal concern.

CyrusNajmabadi commented 7 years ago

PS: I see nothing in your C# quotes that prevent read introduction.

I don't think we need extra terminology to disallow something that is already disallowed by the existing spec :)

A normal memory read is not a side-effect. It can be reordered, eliminated or introduced freely by the compiler, JIT and CPU.

The compiler is not free to eliminate reads of the local variable here. Doing so would violate teh C# language specification. I would be fine with additional language being provided in the JIT that this sort of IL sequence will not be compiled to assembly that violates this either.

jods4 commented 7 years ago

The compiler is not free to eliminate reads of the local variable here.

Do you mean introduce reads? Eliminating reads actually happens all the time, starting with the CPU and the JIT.

As far as introducing reads to eliminate a local variable, I got from Eric Lippert that it was OK with C#, but the JIT doesn't do it on heap references for compatibility reasons. I found nothing in the .NET memory model that contradict this, so ref please?


You seem unconvinced that eliminating a local even makes sense. Why would you do that? For perf. Consider that code:

var x = globalVar;
if (x != null) x.Do();

The JIT will usually turn x into a register. In asm-like pseudo code:

eax = [globalVar]
if (eax) 
{
  push eax
  call [Do]
}

In situations where the JIT has register pressure, it can't keep everything in a limited set of registers and must spill stuff (on the stack).

eax = [globalVar]
if (eax)
{
  move [x], eax // x is a local var on stack
  // more stuff that consumes registers
  move eax, [x]
  push eax
  call [Do]
}

But... Reading [globalVar] directly from main memory rather than putting it on the stack would be more interesting. The value is already in L1 cache and it avoids a write. So this code is better:

eax = [globalVar]
if (eax)
{
  // do stuff that consume registers
  mov eax, [globalVar] // read introduction, from L1 cache
  push eax
  call [Do]
}

But you read heap memory a second time! Yes... it was a regular memory read. Not an acquire/volatile read. A normal read is not meant for synchronizing concurrent code. This would have been forbidden with a volatile read. As long as the observable behavior of a single-threaded program doesn't change, these kinds of optimizations are game.

Then again, the JIT is allowed to emit such code, but it doesn't do it because too many people are convinced that assigning to a local C# variable gives thread safety (according to the specs it does not).

CyrusNajmabadi commented 7 years ago

Do you mean introduce reads? Eliminating reads actually happens all the time, starting with the CPU and the JIT.

I meant that the language specifies how this works, and specifies completely the only sorts of changes that can result in a different value being evaluated from a local variable.

CyrusNajmabadi commented 7 years ago

As long as the observable behavior of a single-threaded program doesn't change, these kinds of optimizations are game.

Speaking from C#'s perspective, this sort of optimization would not be ok with us given how we've spec'ed out how local variables work. If the jit ever did decide such an optimization was acceptable, we would not emit such code. However, there is no way the jit would make such a change as it would be one of the largest breaks potentially imaginable.

As already mentioned, if you'd like the jit to commit to that, i'm totally ok with that. It's just not relevant in this repo :)

sharwell commented 7 years ago

Flow analysis works with if (x != null) { x.Something() } and that's obviously broken with concurrency if x is on the heap. ?. is no different.

The Null Conditional Operator compiles down to a store to a local variable in order to implement the required single-evaluation semantics.

https://github.com/dotnet/csharplang/blob/master/spec/expressions.md#null-conditional-operator

Flow analysis using code sequences like the following are the same:

private Sometype x;

// ...

if (this.x is SomeType y)
  y.Something();

Until then, it would be nice to promote correct code (Virtual.Read()).

The understanding of both the Roslyn team and the CLR team is the sequence currently being used is correct in practice currently, and will remain correct for practicality reasons. It is so unlikely that a deviation from this will even be suggested that updating the specification to account for it specifically is a low priority work item.

This is an easy and thread-safe way to check for null before you trigger an event

It is not thread-safe per the current specifications of .NET.

It is thread-safe per the C# language specification, which is what governs the C# code in question. While there isn't a bug in the recommendation, it could be argued that there is a bug either in the IL sequence produced by the current C# compiler (makes invalid assumptions about CLI memory model) or the CLI specification (fails to restrict optimizations that would introduce reads of shared state that are not expressed in the IL sequence).

Absolutely, as long as there is no concurrency. All those rules in the C# specs are only applicable from the perspective of a single thread.

Read introduction for a memory location accessible to another thread could result in an observable failure to adhere to the data dependence requirement for the execution order of a concurrent C# program.

💭 I'm not convinced that the current CLI specification truly allows the described behavior. Consider the following IL sequence:

ldfld x
stloc.1
ldloc.1
brnull end
ldloc.1
callvirt Invoke
end:

Elimination of the local variable could change the execution order for a thread by introducing a NullReferenceException on the callvirt instruction which cannot occur according to this IL sequence. Even a concurrent execution sequence is not allowed to alias the local variable for a store operation. Synchronous exceptions are considered visible side-effects, and as such cannot be introduced where it would be impossible according to single-threaded execution.

CyrusNajmabadi commented 7 years ago

many people are convinced that assigning to a local C# variable gives thread safety (according to the specs it does not).

It does give thread safety. our spec defines how local variables work in the language, including how reads/writes to them work. It doesn't in totality. i.e. it doesn't carve out a space that says that that might change in the presence of multithreaded code for these types of constructs. it states it for all code.

Now, to support that end, the compiler emits IL ode that it knows the JIT will handle properly. And the JIT appropriately works to compile that IL to machine code that provides the semantics that we feel are a reasonable compromise between perf and correctness. Specifically, if we could not trust local variables in this manner, writing any sort of correct program would be too difficult.

Then again, the JIT is allowed to emit such code, but it doesn't do it

Yes. Which is as i stated originally. Doing so would would be devastating to the .net ecosystem. It would immediately break humongous swaths of code (both in teh BCL and in customer code), and it would make developing for .net at least an order or magnitude harder for anyone dealing with concurrrent code.

Concurrency is already extremely hard to deal with. The bugs and complexity introduced by this would totally outweigh any sort of benefits anyone might ever think might be there.

jods4 commented 7 years ago

Well, that was fascinating so I'm glad I opened this issue, even though it seemed a bit nitpicking when I did.

To sum up, mostly from @sharwell: A local variable can't change its value at runtime even under concurrency, per the C# specs. The IL that C# generates is not correct in that regard under the ECMA memory model. It is correct in practice with MS implementation of .NET, because of implementation details.

Since you both say that actually introducing reads is so devastating it will never happen, it would be nice to update the .NET memory model to say so. That would clear things up once and for all.

Until then, the amount of confusion on this topic is staggering, which makes writing lock-free code on .NET almost impossible.

There are conflicting messages even from MS on this very subject. Stephen Toub, Igor Ostrovsky and Eric Lippert have all said in the past that read introduction replacing locals was a possibility. For a very explicit example, see this answer from Eric Lippert: https://stackoverflow.com/a/14816210/2427990

CyrusNajmabadi commented 7 years ago

it would be nice to update the .NET memory model to say so. That would clear things up once and for all.

"I have no problem with updated documentation explicitly describing it as such." "Feel free to file a bug on against the CLR to have explicit wording for this for locals."

:)

CyrusNajmabadi commented 7 years ago

As for Eric (who i still miss greatly :)), i would disagree with "The C# compiler never introduces reads in this manner. It is permitted to, but in practice it never does."

I don't see how i can reconcile that with:

"If K is zero and the simple_name appears within a block and if the block's (or an enclosing block's) local variable declaration space (Declarations) contains a local variable, parameter or constant with name I, then the simple_name refers to that local variable"

Such a change in the compiler would violate this clause of the specification. simple_name would no longer return to that local variable.

jods4 commented 7 years ago

I think that he derived this from the reasoning that the compiler is allowed any optimization that doesn't change the program behavior in the absence of concurrency (without synchronization primitives such as volatiles read/write or memory barriers).

He kind-of says so in his SO answer.

You know better than I do whether this premise is correct or not?

CyrusNajmabadi commented 7 years ago

That's definitely an interesting interpretation. I'm not sure i agree. But i'll have to think further on it.

Pilchie commented 7 years ago

Closing this, as it seems to be resolved.