Closed Anthony-Dong closed 1 month ago
Why do different chips behave inconsistently?
That is strange. I can reproduce.
If it is a memory race condition, it should report a memory access violation rather than a map rw error.
I'm not sure what you mean here by a "memory access violation". Do you mean, similar to what the race detector reports?
Without `-race`, the detection of racy map accesses is best effort: such racy accesses may or may not be detected. This may be part of the explanation for the differences between architectures. Regardless, if the map implementation does detect a race, it is intentional that it reports a map rw error rather than the kind of report the race detector produces when `-race` is provided. Mostly this is because we don't keep track of the stack trace of the first of the two racy accesses, so a report like the race detector provides is impossible.
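To illustrate why detection is best effort, here is a toy single-flag model of the check (an illustrative assumption, far simpler than the real runtime code; `toyMap`, `set`, and `get` are hypothetical names): the writer sets a flag bit only for the duration of the write, and a reader throws only if it happens to observe that bit, so racy accesses that do not overlap in time go unnoticed.

```go
package main

import "fmt"

const hashWriting = 1 // hypothetical stand-in for the runtime's flag bit

// toyMap loosely models how the runtime flags in-progress writes.
type toyMap struct {
	flags uint8
	data  map[string]int
}

func (m *toyMap) set(k string, v int) {
	if m.flags&hashWriting != 0 {
		panic("concurrent map writes")
	}
	m.flags |= hashWriting // mark write in progress
	m.data[k] = v
	m.flags &^= hashWriting // clear the flag when done
}

func (m *toyMap) get(k string) int {
	// A race is only detected if this read happens to overlap a write.
	if m.flags&hashWriting != 0 {
		panic("concurrent map read and map write")
	}
	return m.data[k]
}

func main() {
	m := &toyMap{data: map[string]int{}}
	m.set("a", 1)
	fmt.Println(m.get("a")) // prints 1; single-goroutine use never trips the check
}
```

Because the flag is only observable during the write itself, the check can miss real races entirely, which matches the inconsistent behavior across runs and architectures.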
It's not easy to simulate memory access violations in Go, but it can be done using C++. Below are two pieces of C++ code, analogous to Go's unsafe map replacement. The first produces a segmentation fault, but adding a read/write lock solves the problem! @randall77
1. wrong code
```cpp
#include <map>
#include <string>
#include <thread>
#include <vector>

int main() {
    std::vector<std::thread> threads{11};
    std::map<std::string, int> kvs{{"1", 1}};
    for (int x = 0; x < 10; x++) {
        threads[x] = std::thread([&]() {
            for (int i = 0; i < 100000; i++) {
                auto _ = kvs["1"]; // unsynchronized read; racy
            }
        });
    }
    threads[10] = std::thread([&]() {
        for (int i = 0; i < 100000; i++) {
            std::map<std::string, int> clone_kvs{};
            // copy (unsynchronized read of kvs; racy)
            clone_kvs.insert(kvs.begin(), kvs.end());
            // set
            clone_kvs["2"] = 2;
            // replace (unsynchronized write of kvs; racy)
            kvs = std::move(clone_kvs);
        }
    });
    for (auto& thread : threads) {
        thread.join();
    }
}
```
2. right code
```cpp
#include <map>
#include <mutex>
#include <shared_mutex>
#include <string>
#include <thread>
#include <vector>

int main() {
    std::shared_mutex rw_lock{};
    std::vector<std::thread> threads{11};
    std::map<std::string, int> kvs{{"1", 1}};
    for (int x = 0; x < 10; x++) {
        threads[x] = std::thread([&]() {
            for (int i = 0; i < 100000; i++) {
                {
                    std::shared_lock<std::shared_mutex> rlock{rw_lock};
                    auto _ = kvs.at("1"); // at() never inserts, so a shared lock suffices
                }
            }
        });
    }
    threads[10] = std::thread([&]() {
        for (int i = 0; i < 100000; i++) {
            std::map<std::string, int> clone_kvs{};
            // safe read
            {
                std::shared_lock<std::shared_mutex> rlock{rw_lock};
                clone_kvs.insert(kvs.begin(), kvs.end());
            }
            // set clone kvs
            clone_kvs["2"] = 2;
            // safe replace
            {
                std::unique_lock<std::shared_mutex> wlock{rw_lock};
                kvs = std::move(clone_kvs);
            }
        }
    });
    for (auto& thread : threads) {
        thread.join();
    }
}
```
I'm not sure what you are proposing then.
We could of course cause a segfault instead of reporting a "rw error". But I don't think that's very helpful. Why is the former better when the latter provides a more accurate description of the user's error?
Because from the perspective of the code, it is very difficult to see that the problem is caused by concurrent access to the map; a "map rw error" doesn't make sense as an explanation. I hope for more explicit error information.
As written, this code doesn't directly race on the internals of the map. It does explicitly race on `m.kv` (the pointer to the map).
That is, in program order, the replacement map is fully written before being assigned to `m.kv`. After assignment the map is not modified.
On a system with total store order (TSO, where all stores become visible in program order, e.g., amd64), the `concurrent map read and map write` error will never occur because the reader will never see the `hashWriting` bit.
On a system without TSO (arm64), I suspect something like this occurs:
- G1 in Set(), finishing up writing the last element to the new map, `clone`
- G1 clears the writing flag: `clone.flags &^= hashWriting` (`clone.flags == 0` now)
- G1 publishes the new map: `m.kv = clone`
- G2 reads `m.kv`, receiving the new map `clone`
- G2 starts accessing the map and reads `clone.flag == hashWriting`
- G2 triggers the concurrent read/write map error

What happens here is that without TSO, one CPU clears `hashWriting` and then writes to `m.kv`, but the other CPU observes the write to `m.kv` but not the write to `clone.flags`. So this explains why only arm64 sees the throw.
As for why not segfault, I agree with @randall77 in not really understanding why we should do so. The fact that your C++ code segfaults is a coincidence of implementation; C++ certainly does not guarantee that racy accesses will segfault. Our code could theoretically segfault on arm64 if the reader observed `clone.flags` as 0 but saw some other internal state as stale in a way that confused the reading side.
In practice, on x86 I'm almost certain that 8-byte stores never tear, so readers will always see a valid pointer, though this is not guaranteed. I believe arm64 is the same, though maybe not for unaligned stores?
Given these semantics, I don't even think it would be possible to guarantee a fault on racy access without something like the full race detector in every program.
Thanks, I was able to solve this issue using `atomic.Value` or `sync.RWMutex`. However, with the proliferation of Apple's M-series chips, such cases are indeed hard to understand, because code that previously had no issues becomes problematic after switching to M-series chips. @prattmic
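As a sketch of the atomic-publication fix (the `Store` type and its method names are hypothetical, not from the original code): publishing the cloned map through `atomic.Pointer` gives the release/acquire ordering that a plain `m.kv = clone` assignment lacks, so a reader can never observe the new pointer before the map's contents.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// Store publishes an immutable map snapshot through an atomic pointer.
// atomic.Pointer.Store has release semantics and Load has acquire
// semantics, so readers never see a half-initialized snapshot.
type Store struct {
	kv atomic.Pointer[map[string]int]
}

// Get reads from the current snapshot without any lock.
func (s *Store) Get(k string) (int, bool) {
	m := s.kv.Load()
	if m == nil {
		return 0, false
	}
	v, ok := (*m)[k]
	return v, ok
}

// Set copies the current snapshot, updates the copy, and publishes it.
func (s *Store) Set(k string, v int) {
	for {
		old := s.kv.Load()
		clone := make(map[string]int)
		if old != nil {
			for key, val := range *old {
				clone[key] = val
			}
		}
		clone[k] = v
		// CompareAndSwap retries if a concurrent Set won the race.
		if s.kv.CompareAndSwap(old, &clone) {
			return
		}
	}
}

func main() {
	s := &Store{}
	s.Set("1", 1)
	v, ok := s.Get("1")
	fmt.Println(v, ok) // prints: 1 true
}
```

`atomic.Pointer` requires Go 1.19 or later; on earlier versions `atomic.Value` holding the map works the same way.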
Indeed, this is the issue. You can use the following program to reproduce it:
```go
package main

import "time"

type Data struct {
	flag int
}

func main() {
	var data = &Data{flag: 1}
	go func() {
		for {
			if data.flag != 1 {
				panic(`error`)
			}
		}
	}()
	go func() {
		for {
			clone := &Data{flag: 2}
			clone.flag = 1
			// Without a release barrier, the store to clone.flag may become
			// visible after the store to data on a non-TSO machine.
			data = clone
		}
	}()
	time.Sleep(time.Second * 5)
}
```
> However, with the proliferation of Apple's M-series chips, such cases are indeed hard to understand because the code that previously had no issues is now problematic after switching to the M-series chips.
Indeed, this is the classic pain point when porting applications from amd64 to arm64 (or anything else, really: amd64 has the strictest memory ordering semantics; see the table at https://en.wikipedia.org/wiki/Memory_ordering#In_symmetric_multiprocessing_(SMP)_microprocessor_systems).
I'm going to close this, as I don't think there is anything we can do here. Note that the race detector should catch these cases (if it does not, please reopen). If you have suggestions on how we can improve the situation, we're open to hearing them.
Go version
```
go version go1.20 linux/amd64
go version go1.20 darwin/arm64
```
Output of `go env` in your module/workspace:
What did you do?
I have a piece of code that behaves inconsistently on a Mac (Apple M3 Pro) and on Linux amd64 when `-race` is not enabled. On the Apple M3 Pro it directly reports a map rw error, but on Linux amd64 it doesn't.
What did you see happen?
Linux
Mac
What did you expect to see?