apple / foundationdb

FoundationDB - the open source, distributed, transactional key-value store
https://apple.github.io/foundationdb/
Apache License 2.0

Simulator stops working after installing FDB 6 #1074

Open mpilman opened 5 years ago

mpilman commented 5 years ago

When fdbserver is running, the simulator stops working. This is problematic for anyone who wants to develop and test on OS X. I stumbled on this while working on #1058.

This is how this problem can be reproduced:

  1. Download and install FDB 6 from the web page.
  2. Run a simulation test.

The output that I got looked roughly like this:

/Users/mpilman/Projects/builddirs/foundationdb/bin/fdbserver -r simulation -s 0x2582dd31 -f fast/AtomicBackupCorrectness.txt
Random seed is 629333297...
ERROR: Could not locate shared memory - 'machineId'

Even stopping foundationdb does not make the error go away. The only way I could figure out to make the error go away was:

  1. Uninstall foundationdb with the script.
  2. Restart the machine.

As mentioned in #1058, cmake will try to find an installed foundationdb and warn the user if it can't find one (for running upgrade restart tests with an older binary). But if an installed fdb results in make test not working, this is generally problematic.

mpilman commented 5 years ago

One hacky short-term fix would be to skip the machine id lookup in shared memory when fdbserver is started in simulation mode. However, if someone knows boost::interprocess better than I do, I would strongly prefer a real solution.
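
A minimal sketch of that short-term workaround, to make the idea concrete. Every name here (`Role`, `SharedMemoryLookup`, `resolveMachineId`, the `simulatedId` parameter) is hypothetical and not taken from the fdbserver source; the real shared memory lookup is stood in for by a callback so the sketch does not depend on boost::interprocess. It assumes C++17.

```cpp
#include <functional>
#include <optional>
#include <string>

// Hypothetical illustration only: none of these names come from fdbserver.
enum class Role { Server, Simulation };

// Stand-in for whatever code reads the machine id from shared memory.
// It returns std::nullopt when the shared memory segment cannot be located.
using SharedMemoryLookup = std::function<std::optional<std::string>()>;

// Returns a machine id, or std::nullopt if none could be determined.
// In simulation mode the shared memory lookup is never invoked, so a stale
// or unreadable segment left behind by an installed fdbserver cannot
// make the simulation run fail.
std::optional<std::string> resolveMachineId(Role role,
                                            const SharedMemoryLookup& lookupSharedMemory,
                                            const std::string& simulatedId) {
    if (role == Role::Simulation) {
        return simulatedId;  // assumed: simulation can use any placeholder id
    }
    return lookupSharedMemory();
}
```

Whether simulation can safely use a placeholder id is an assumption of the sketch, not something confirmed here.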

alexmiller-apple commented 5 years ago

I ended up installing an FDB6 package, so I tried to reproduce, and my local simulation run passed. Restarting the machine probably cleared the shared memory file, which is why it fixed the issue. It appears that there's no way to list or inspect open POSIX shared memory files on macOS, which makes debugging this rather more difficult, but I'm going to guess something odd happened with your machineID file.

That said, I'm not actually super clear on why simulation needs to use the machineID file, as I'm not clear that it'd matter to simulation if different runs have different machineIDs ...?