dotnet / roslyn

The Roslyn .NET compiler provides C# and Visual Basic languages with rich code analysis APIs.
https://docs.microsoft.com/dotnet/csharp/roslyn-sdk/
MIT License
18.95k stars 4.02k forks source link

Use of interprocess named mutexes makes compiler server incompatible with mono #31146

Open kg opened 5 years ago

kg commented 5 years ago

Roslyn currently uses an interprocess named mutex to figure out whether the compiler server is running. Mono does not support these and we don't plan to support them because of the complexity and risk involved in implementing them on posix (it looks like corefx uses a shared memory segment with a table in it, which is pretty complex)

Is there some way for us to use another mechanism to detect the server's presence? Why don't we just attempt the pipe connection and go from there? If I disable that mutex check things work okay because it just tries to open the pipe.

jaredpar commented 5 years ago

One of the big problems we faced with the compiler server was how to ensure we only started up a single instance. Our early attempts at deploying the server were plagued with problems around having more than one, and sometimes dozens, spawning off for a single build.

We tried a number of approaches to fixing this problem. Including approaches similar to the one I believe you're suggesting (please correct me if i'm wrong):

  1. Attempt to connect
  2. If that fails then spawn off VBCSCompiler
  3. Attempt to connect again

The problem with this approach is that you will quickly end up with a number of VBCSCompiler instances that equals the number of processors on the machine. MSBuild will, for sufficently big projects, get to (1) at essentially the same time for all of the leaf projects in the graph. Hence every leaf project will fail (1), then start the server and suddenly there are N instances of the server running.

That causes a couple of problems:

  1. Spawning extra processes defeats one of the optimizations the server provides. That optimization is more relevant for Windows though than Unix.
  2. VBCSCompiler needs to get a bit more complicated. Have to add logic to detect this state and promptly shut down the server when this happens. That was fairly hard to detect reliably IIRC.
  3. Customers complain about us spawning too many unnecessary processes :smile:

We tried a variety of tricks around this and none worked to our satisfaction. Eventually we went with the interprocess mutex because it produced reliably results. The handshake is a bit complex but we also haven't had to substantially change it since we originally implemented it.

kg commented 5 years ago

I experimented with using a file lock as a replacement and got that to work, but there appears to be a dual-purpose thing going on where creating and locking the mutex are used to signal separate pieces of information - if I'm not mistaken, it seems like clients Lock the mutex when attempting to talk to the server, and the existence/non-existence of the mutex is an indicator that the server is running. Is that right?

jaredpar commented 5 years ago

it seems like clients Lock the mutex when attempting to talk to the server, and the existence/non-existence of the mutex is an indicator that the server is running. Is that right?

That's correct. If the mutex exists then we assume the server is running.

jaredpar commented 5 years ago

CC @agocke

kg commented 5 years ago

Can we eliminate the client lock or split it out? "Detect whether the server is running" can be achieved with .Lock(0, 0) on a file on all platforms, as far as I can tell, and abandoned .Lock()s are released on all platforms as well. Abandoned mutexes are a big problem on posix. We'd need a solution for the lock file itself being abandoned, but that seems easier to fix - maybe just putting it in tmp is enough? I've been trying to find a clean way to integrate for-mono/for-other lock handling into the roslyn codebase and it's not especially enormous but the way the mutex serves two purposes makes it tough.

agocke commented 5 years ago

I think we could get away with just a single sync lock to detect if a server is running. One thing @jaredpar forgot to mention: without the lock, the default parallel behavior for MSBuild will require spinning up N msbuild nodes for N threads on your machine -- they will all then race to start the compiler server and a lot of them will succeed and start using that server for compilation requests. This isn't just polluting the system with N extra processes; if all of them start processing requests simultaneously, each compiler server will try to spin up N thread pool threads to compile that request. This leads to massive oversubscription and the compile can often take longer than just not using the compiler server at all.

kg commented 5 years ago

I would be fine with two separate locks (one for 'server is running' and one for 'pipe is in use') if that would be sufficient. It's pretty easy to split that out such that the single Mutex instance is dual-purpose on non-Mono targets and we use two file locks on Mono targets, as long as I understand how the current one is used correctly

tmat commented 5 years ago

@kg My understanding is that the goal for Mono is to provide the same APIs as .NET Core. Is that not the case here? It seems better if the runtime takes the complexity hit rather than passing it onto customers (Roslyn in this case).

kg commented 5 years ago

We previously had interprocess named mutexes and removed them, IIRC for stability and performance reasons. They're very infrequently used and are a primitive that literally doesn't exist in unix, the options are what netcore does (create a single shared shm segment that leaks and contains all your named mutexes in a table) or what mono previously did (create a small shm segment per mutex that leaks, iirc?)

I think if an equivalent primitive existed on unix we'd probably implement them. The netcore approach certainly works and isn't gross but it's a lot of complexity to pick up just for roslyn.

If we knew of other users for this primitive I might be able to make a case for us to implement it.