Open hamarb123 opened 5 months ago
I don't think it would be a big deal to add different ordering mechanics, it's not there already because there hasn't been sufficient interest. That said, we shouldn't confuse ease of development with design.
And that would be a way forward for those cases where the .NET memory model is too strict.
What about:
void ReadBarrier();
void WriteBarrier();
void PartialReadBarrier(); // or Full based on which we want to be the default / HalfReadBarrier / etc.
void PartialWriteBarrier(); // same as above
I don't have an issue with providing the ones with fewer guarantees that only order the same kind of operation rather than all operations (they would suffice for my code example).
What I have an issue with is not also providing ones that match our current memory model, whether we only provide these, or provide these in addition to others.
@hamarb123 - Regarding the API proposal. A few suggestions:
I am not sure it is worth mentioning possible optimizations.
I'd leave it to implementors to figure out whether and when each optimization is applicable. (I think "fusing" optimizations would not be valid in general, for example, but maybe there are cases where they would work).
By definition optimizations should not be observable via program effects, so we should just specify the desired semantics and JIT developers will do whatever is needed/possible.
One more use of the barriers is to order a batch of multiple reads or writes. That also includes ordering non-atomic reads/writes like with structs. It is worth mentioning.
This is the reason why Interlocked.ReadMemoryBarrier() exists. The fact that we have it in two runtimes (CoreCLR and NativeAOT) and on all supported platforms is good evidence that someone else might use the barriers for similar purposes.
Not sure we need to go into detail about what we are no longer proposing, like volatile. ldobj, cpblk, and initblk. Not at the beginning of the proposal anyway. This is a complicated topic - imagine the reader is losing focus a little with every word they read. We do not want to spend that attention on cpblk. :-)
Maybe worth mentioning in alternative designs, but not in the actual proposal.
Marked as ready-for-review, we can discuss further there.
@VSadov I will update it when I'm able later today :) Thanks
I do think the semantics of the operations need to be clearly defined. For instance, though it would be unfortunate, the difference indicated here would need to be clearly specified.
It may be a matter of documentation, but we should have a clear understanding of what the aim is for now from the OP.
For the desired semantics, I think we should start with:
Volatile.ReadBarrier()
Provides a Read-ReadWrite barrier.
All reads preceding the barrier will need to complete before any subsequent memory operation. Volatile.ReadBarrier() matches the semantics of Volatile.Read in terms of ordering reads relative to all subsequent (in program order) operations.
The important difference from Volatile.Read(ref x) is that Volatile.ReadBarrier() has an effect on all preceding reads and not just a particular single read of x.
Volatile.WriteBarrier()
Provides a ReadWrite-Write barrier.
All memory operations preceding the barrier will need to complete before any subsequent write. Volatile.WriteBarrier() matches the semantics of Volatile.Write in terms of ordering writes relative to all preceding (in program order) operations.
The important difference from Volatile.Write(ref x) is that Volatile.WriteBarrier() has an effect on all subsequent writes and not just a particular single write of x.
The actual implementation will depend on the underlying platform.
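To make the semantics above concrete, here is a minimal sketch of one pattern that needs to order a batch of non-atomic reads and writes: a single-writer sequence lock around a multi-word struct. The names Snapshot and SeqLockBox are hypothetical and are not taken from the proposal. The private ReadBarrier and WriteBarrier helpers are stand-ins implemented with Interlocked.MemoryBarrier(), a full fence that is stronger than either proposed barrier; the assumption is that they could be replaced by Volatile.ReadBarrier() and Volatile.WriteBarrier() where those are available.

```csharp
using System;
using System.Threading;

// A plain struct that is too large to be read or written atomically.
public struct Snapshot
{
    public long X, Y, Z;
    public Snapshot(long x, long y, long z) { X = x; Y = y; Z = z; }
}

// Hypothetical sequence-lock wrapper: one writer thread, any number of readers.
public sealed class SeqLockBox
{
    private int _version;     // even = stable, odd = write in progress
    private Snapshot _value;

    // Stand-ins for the proposed barriers (assumption). A full fence is
    // stronger than Read-ReadWrite and ReadWrite-Write, so it is a safe substitute.
    private static void ReadBarrier() => Interlocked.MemoryBarrier();
    private static void WriteBarrier() => Interlocked.MemoryBarrier();

    // Must only be called from a single writer thread.
    public void Write(Snapshot value)
    {
        _version++;           // becomes odd: readers will retry
        WriteBarrier();       // version write is ordered before the field writes
        _value = value;       // non-atomic, multi-word write
        WriteBarrier();       // field writes are ordered before the final version write
        _version++;           // becomes even again: value is stable
    }

    public Snapshot Read()
    {
        var spin = new SpinWait();
        while (true)
        {
            int before = Volatile.Read(ref _version); // later reads cannot move above this
            Snapshot copy = _value;                   // may be torn; validated below
            ReadBarrier();                            // all reads above complete before the re-check
            int after = _version;
            if (before == after && (before & 1) == 0)
                return copy;                          // no write overlapped the copy
            spin.SpinOnce();
        }
    }
}
```

A reader never blocks the writer here; it simply retries when the version changed or was odd while it was copying. The barriers order the whole struct copy, which a single Volatile.Read or Volatile.Write of one field cannot do.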
> It may be a matter of documentation, but we should have a clear understanding of what the aim is for now from the OP.
My main aim is to enable the API usage I have as an example. It would also be nice if we could fix volatile. prefixes, but this can be done separately if desired.
Notably, I wouldn't actually need ReadWrite-Write or Read-ReadWrite barriers I think; I believe Write-Write and Read-Read should be enough for this.
@VSadov I've updated it, can you double check that it's fine?
@hamarb123 Looks very good! Thanks!
I will add the expected semantics to the proposed entry points. But we will see where we will land with those after reviews.
Btw @VSadov, both my example and the runtime's usages only seem to need Read-Read/Write-Write, so I think it'd be good to get overloads for those if we also keep the Read-ReadWrite/ReadWrite-Write ones that match our current memory model, since they should have lower overhead and seem to be all that would be required most of the time. It's in the open questions section, but just thought I'd mention it so you're aware if you hadn't seen it.
An alternative may be to overload Interlocked.MemoryBarrier with a MemoryConstraint enum or something like that, somewhat like in C++. The enum values could be something like ReadWrite (similar to full), Acquire, Release, and perhaps in the future if needed, Read and Write, which would be Read-Read and Write-Write respectively. Another enum value that may be useful for CAS operations could be None (similar to relaxed), if we were to expand those APIs with similar overloads. The APIs being on the Volatile class may imply that they have volatile semantics, which are very specific, and overloading them with options of different semantics may appear odd.
For instance, there are already use cases in Lock that could benefit from acquire/release/relaxed semantics for CAS operations. Enabling more granular barriers has also been proposed before.
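As a rough illustration of the shape such an overload could take, here is a minimal sketch. Everything in it is hypothetical: the MemoryConstraint enum, the InterlockedSketch class, and the overload itself do not exist in .NET. The body conservatively maps every constraint other than None to the existing Interlocked.MemoryBarrier() full fence, which is always sufficient but gives none of the potential performance benefit of a weaker barrier.

```csharp
using System.Threading;

// Hypothetical enum, modeled on the values suggested in the comment above.
public enum MemoryConstraint
{
    None,       // similar to "relaxed": no ordering
    Acquire,
    Release,
    ReadWrite,  // similar to a full barrier
}

// Hypothetical holder class; the suggestion is an overload on Interlocked itself.
public static class InterlockedSketch
{
    public static void MemoryBarrier(MemoryConstraint constraint)
    {
        if (constraint == MemoryConstraint.None)
            return; // no ordering requested

        // Conservative fallback (assumption): a full fence satisfies Acquire,
        // Release and ReadWrite. A real implementation would presumably emit
        // a weaker instruction where the platform offers one.
        Interlocked.MemoryBarrier();
    }
}
```

A call site would then read as InterlockedSketch.MemoryBarrier(MemoryConstraint.Release), which keeps the ordering intent visible without tying the API to the Volatile class.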
> Btw @VSadov, both my example and the runtime's usages only seem to need Read-Read/Write-Write, so I think it'd be good to get overloads for those if we also keep the Read-ReadWrite/ReadWrite-Write ones that match our current memory model, since they should have lower overhead and seem to be all that would be required most of the time. It's in the open questions section, but just thought I'd mention it so you're aware if you hadn't seen it.
Yes, I noticed. It is a common thing with volatile. While volatile orders relative to all accesses, some cases, typically involving a chain of several volatile accesses where you have just writes or just reads in a row, could use a weaker fence. This is the case in both scenarios that you mention.
The main impact of a fence is forbidding optimizations at the hardware level. A fence would not necessarily make the memory accesses cost more. The level of cache that is being used is likely a lot more impactful than forcing a particular order of accesses. Intuitively, with everything else the same, a weaker barrier would be cheaper, but I am not sure by how much in reality - 10%? 1%?
Figuring out the minimum strength required would be an even more difficult and error-prone task than figuring out when Volatile is needed. Honestly - sometimes people just put volatile on everything accessed from different threads - because it is not that expensive compared to bugs that could happen once a week and a year after something shipped, just because there is a new chip on the market and it does something different from where the code was originally tested.
I think going all the way of std::memory_order is possible, but being possible might not be enough reason to do it.
I think one datapoint that could be useful for the ReadWrite-Write vs. Write-Write discussion could be the performance difference of dmb ish vs. dmb ishst on a few arm64 implementations - just to have a practical perspective on potential wins.
The perf differences may be more apparent in memory-intensive situations where the extra ordering constraints would disable some optimizations and impose extra work on the processor / cache. It may be difficult to measure the difference in typical microbenchmarks, though perhaps it would become more apparent by somehow folding in some memory pressure and measuring maybe not just the operation in question but also latency of other memory operations.
> I think going all the way of std::memory_order is possible, but being possible might not be enough reason to do it.
I agree. I think we should start with simple barriers that are aligned with .NET memory model, and wait for evidence that we need more.
It is a non-goal for .NET programs to express everything that is possible. We strike a balance between simplicity and what may be possible in theory.
> > I think going all the way of std::memory_order is possible, but being possible might not be enough reason to do it.
> I agree. I think we should start with simple barriers that are aligned with .NET memory model, and wait for evidence that we need more.
It is worth mentioning that my use case and the 2 uses of ReadBarrier in the runtime both only require Read-Read/Write-Write barriers, whereas the ones matching our memory model would be Read-ReadWrite/ReadWrite-Write. This would result in throwing away some performance on arm for no reason other than lack of APIs (although I do not know precisely how much). I do think this is evidence that the full Read-ReadWrite/ReadWrite-Write barriers are probably less commonly needed than just the Read-Read/Write-Write barriers.
Edit: I'd still be happy if we just ended up with the ones that matched our memory model, but I'd obviously be happier if we got the Read-Read/Write-Write ones, since they'd perform slightly better and be all I require.
Looks good as proposed.
There was a very long discussion about memory models, what the barrier semantics are, and whether we want to do something more generalized in this release. In the end, we accepted the original proposal.
namespace System.Threading;
public static class Volatile
{
    public static void ReadBarrier();
    public static void WriteBarrier();
}
Background and motivation
This API proposal exposes methods to perform non-atomic volatile memory operations. Our volatile semantics are explained in our memory model, but I will outline the tl;dr of the relevant parts here:
unaligned. is used), and either 1) the size of the type is at most the size of the pointer, or 2) a method on Volatile or Interlocked such as Volatile.Write(double&, double) has been called
Currently, we expose APIs on Volatile. for the atomic memory accesses, but there is no way to perform the equivalent operations for non-atomic types. If we have Volatile barrier APIs, they will be easy to write, and it should make it clear which memory operations can move past the barrier in which ways.
API Proposal
=== Desired semantics:
Volatile.ReadBarrier()
Provides a Read-ReadWrite barrier. All reads preceding the barrier will need to complete before any subsequent memory operation.
Volatile.ReadBarrier() matches the semantics of Volatile.Read in terms of ordering reads relative to all subsequent (in program order) operations.
The important difference from Volatile.Read(ref x) is that Volatile.ReadBarrier() has an effect on all preceding reads and not just a particular single read of x.
Volatile.WriteBarrier()
Provides a ReadWrite-Write barrier. All memory operations preceding the barrier will need to complete before any subsequent write.
Volatile.WriteBarrier() matches the semantics of Volatile.Write in terms of ordering writes relative to all preceding (in program order) operations.
The important difference from Volatile.Write(ref x) is that Volatile.WriteBarrier() has an effect on all subsequent writes and not just a particular single write of x.
The actual implementation will depend on the underlying platform.
API Usage
The runtime uses an internal API Interlocked.ReadMemoryBarrier() in 2 places (here and here) to batch multiple reads on both CoreCLR and NativeAOT, and it is supported on all platforms. This ability is also useful to third-party developers (such as me, in my example below), but is currently not possible to write efficiently.
An example where non-atomic volatile operations would be useful is as follows. Consider a game which wants to save its state, ideally while continuing to run; these are the most obvious options:
But there is actually another option which utilises non-atomic volatile semantics:
Alternative Designs
public static class Volatile
{
    public static T ReadNonAtomic<T>(ref readonly T location) where T : allows ref struct
    {
        //ldarg.0
        //volatile.
        //ldobj !!T
    }
}
Unsafe instead:
public static class Unsafe
{
    public static T ReadVolatile<T>(ref readonly T location) where T : allows ref struct;
    public static void WriteVolatile<T>(ref T location, T value) where T : allows ref struct;
}
volatile.-initblk and cpblk, people may have use for these also:
public static class Unsafe
{
    public static void CopyBlockVolatile(ref byte destination, ref readonly byte source, uint byteCount);
    public static void CopyBlockVolatileUnaligned(ref byte destination, ref readonly byte source, uint byteCount);
    public static void InitBlockVolatile(ref byte startAddress, byte value, uint byteCount);
    public static void InitBlockVolatileUnaligned(ref byte startAddress, byte value, uint byteCount);
}