Ycmd periodically fails, multiple Ycmd servers running for a single Atom project/folder

joeroback commented 8 years ago

Editing for a while, occasionally I will receive an Ycmd error like connection refused. Things continue though. When I look at all the Ycmd processes running, the number of them just keeps increasing...

$ ps ax | grep ycmd
67419   ??  Ss     0:01.98 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python /Users/jroback/Library/bin/ycmd/ycmd --port=53778 --options_file=/var/folders/1x/m5w5ghsj7cjgt13m2ss03rl40000gn/T/AtomYcmOptions-1463573610032 --idle_suicide_seconds=600
67622   ??  Ss     0:00.16 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python /Users/jroback/Library/bin/ycmd/ycmd --port=54538 --options_file=/var/folders/1x/m5w5ghsj7cjgt13m2ss03rl40000gn/T/AtomYcmOptions-1463574214989 --idle_suicide_seconds=600
67624   ??  Ss     0:00.18 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python /Users/jroback/Library/bin/ycmd/ycmd --port=54542 --options_file=/var/folders/1x/m5w5ghsj7cjgt13m2ss03rl40000gn/T/AtomYcmOptions-1463574215929 --idle_suicide_seconds=600
67625   ??  Ss     0:00.19 /opt/local/Library/Frameworks/Python.framework/Versions/2.7/Resources/Python.app/Contents/MacOS/Python /Users/jroback/Library/bin/ycmd/ycmd --port=54551 --options_file=/var/folders/1x/m5w5ghsj7cjgt13m2ss03rl40000gn/T/AtomYcmOptions-1463574218559 --idle_suicide_seconds=600
67864 s001  S+     0:00.00 grep --color=auto ycmd

For small projects this is trivial but for large projects, Ycmd memory usage gets high and with the suicide timeout being 10 minutes, its not really viable if one keeps using the editor continuously..

Qusic commented 8 years ago

I just updated the package with a fix. Please see if it works for you.

joeroback commented 8 years ago

closer, but in handler.reset on Mac at least, the process parameter is a promise object, not a bufferedprocess object...

reset = ->
  realReset = (process) ->
    alert ycmdProcess
    process?.kill?()
    ycmdProcess = null
    port = null
    hmacSecret = null
  Promise.resolve ycmdProcess
    .then realReset, realReset

joeroback commented 8 years ago

so the call the kill?() is never executed and the ycmd process remains... this doesn't seem to happen to me on Linux... Atom, 1.7.3 on both...

joeroback commented 8 years ago

If I do this on Mac OSX

reset = ->
  realReset = (process) ->
    alert process                                                                                                                                                                                                  
    process?.kill?()
    ycmdProcess = null
    port = null
    hmacSecret = null
  Promise.resolve ycmdProcess
    .then realReset, realReset

Just add an 'alert process' into reset, it Alert's [Object object] and ycmd process always is killed on Atom exit. If I remove the alert, the ycmd process always, 100%, remains... weird

joeroback commented 8 years ago

ok, actually happens to me on linux as well. It seems if I re-open and it restores a previously open C++ file on startup, the ycmd process remains. If I close all tabs, then restart atom. I see no ycmd process. If I open the C++, ycmd comes online. Quiting Atom at this point kills the ycmd process.

If I now reopen Atom again, it will restore that window with the C++ file open on startup, ycmd server will start. Then I exit Atom and ycmd remains...

Something to do with how the process is initialized if Atom was a C++ file loaded on startup?

joeroback commented 8 years ago

the problem is with handler.prepare() and handler.reset()

prepare() is called as part of the chain to handler.request(), but handler.reset() is also called in the event observer

observeConfig = ->                                                                                                                                                                  
  atom.config.observe 'you-complete-me', (value) ->
    handler.reset()

When Atom is restoring C++ files on startup, there is a race and prepare() is called first. While waiting for the setTimeout to trigger, the observeConfig event is fired, calling reset(). This obviously causes issue.

As a test, I just commented out the handler.reset() in the observeConfig event and both platforms start and stop ycmd correctly, 100% of the time. Of course, what is lost is the ability to reset the server if the config changes...

joeroback commented 8 years ago

Doing this seems to be working...

diff -Naur atom-youcompleteme/lib/handler.coffee /home/jroback/.atom/packages/you-complete-me/lib/handler.coffee
--- atom-youcompleteme/lib/handler.coffee   2016-05-18 15:15:50.540490063 -0600
+++ /home/jroback/.atom/packages/you-complete-me/lib/handler.coffee 2016-05-18 15:15:11.413401716 -0600
@@ -83,16 +83,14 @@
     .then startServer

 prepare = ->
-  ycmdProcess ?= launch -> ycmdProcess = null
+  if not ycmdProcess?
+    launch(-> reset).then (process) -> ycmdProcess ?= process

 reset = ->
-  realReset = (process) ->
-    process?.kill?()
-    ycmdProcess = null
-    port = null
-    hmacSecret = null
-  Promise.resolve ycmdProcess
-    .then realReset, realReset
+  ycmdProcess?.kill()
+  ycmdProcess = null
+  port = null
+  hmacSecret = null

 request = (method, endpoint, parameters = null) -> prepare().then ->
   generateHmac = (data, encoding) ->

joeroback commented 8 years ago

ok if its not obvious, that doesn't work either. the real problem is the async'ness of the setTimeout in startServer. seems like you either want a singleton like pattern with the handler, or some indication that start is pending, queue promises to fulfill once the startServer promise is fulfilled from setTimeout...

even setTimeout() should probably loop using the /healthy endpoint of ycmd.

joeroback commented 8 years ago

Here is the solution I came up with. Flag to indicate there is a pending launch. If pending, put all requests (deferred promises) into an array that will resolved as part of the final fulfillment of the launch.

This situation happens when Atom is launched and is restoring a window with multiple C++ files open. It will attempt to "launch" as many ycmd processes as it can within the window of "setTimeout".. The following patch should prevent that.

diff --git a/lib/handler.coffee b/lib/handler.coffee
index 1b7f5bc..2ccff27 100644
--- a/lib/handler.coffee
+++ b/lib/handler.coffee
@@ -11,6 +11,8 @@ url = require 'url'
 utility = require './utility'

 ycmdProcess = null
+ycmdProcessPending = false
+ycmdProcessPendingPromises = []
 port = null
 hmacSecret = null

@@ -57,7 +59,7 @@ launch = (exit) ->
         reject error

   startServer = (optionsFile) -> new Promise (fulfill, reject) ->
-    process = new BufferedProcess
+    ycmdProcess = new BufferedProcess
       command: atom.config.get 'you-complete-me.pythonExecutable'
       args: [
         path.resolve atom.config.get('you-complete-me.ycmdPath'), 'ycmd'
@@ -76,23 +78,36 @@ launch = (exit) ->
           when 5 then reject new Error 'YCM core library compiled for Python 3 but loaded in Python 2. Set the Python Executable config to a Python 3 interpreter path.'
           when 6 then reject new Error 'YCM core library compiled for Python 2 but loaded in Python 3. Set the Python Executable config to a Python 2 interpreter path.'
           when 7 then reject new Error 'YCM core library too old; PLEASE RECOMPILE by running the install.py script. See the documentation for more details.'
-    setTimeout (-> fulfill process), 1000
+
+    setTimeout (->
+      ycmdProcessPending = false
+      pendingPromise.resolve() for pendingPromise in ycmdProcessPendingPromises
+      ycmdProcessPendingPromises = []
+      fulfill()), 1000

   Promise.all [findUnusedPort, generateRandomSecret, readDefaultOptions]
     .then processData
     .then startServer

 prepare = ->
-  ycmdProcess ?= launch -> ycmdProcess = null
+  if ycmdProcess is null and not ycmdProcessPending
+    ycmdProcessPending = true
+    launch reset
+  else if ycmdProcessPending
+    pendingPromise = Promise.defer()
+    ycmdProcessPendingPromises.push pendingPromise
+    pendingPromise.promise
+  else
+    Promise.resolve()

 reset = ->
-  realReset = (process) ->
-    process?.kill?()
-    ycmdProcess = null
-    port = null
-    hmacSecret = null
-  Promise.resolve ycmdProcess
-    .then realReset, realReset
+  ycmdProcessPending = false
+  pendingPromise.reject new Error 'YCM reset.' for pendingPromise in ycmdProcessPendingPromises
+  ycmdProcessPendingPromises = []
+  ycmdProcess?.kill()
+  ycmdProcess = null
+  port = null
+  hmacSecret = null

 request = (method, endpoint, parameters = null) -> prepare().then ->
   generateHmac = (data, encoding) ->

Qusic commented 8 years ago

what problem are you trying to fix? currently the promise from launch() is reused and all pending requests should be handled correctly.

Qusic commented 8 years ago

https://github.com/atom/atom/issues/7252 https://github.com/atom/atom/issues/8294 Found two interesting issues.

joeroback commented 8 years ago

if i open atom and it restores multiple C++ files, multiple ycmdProcess attempt to launch, best case as it is without my changes, are multiple ycmd exist where only the last launch one is used. also exiting Atom, ycmd doesn't exit without my changes.

joeroback commented 8 years ago

with my last patch, I've used Atom all day today at work with multiple C++ projects, one really large one, and not a single problem. ycmd correctly stops and starts without any ECONNREFUSED errors, etc, on startup. But if you have a large C++ project and open Atom with say 3-10 tabs open, Atom YouCompleteMe currently blows up pretty hard on both OSX and Linux.

joeroback commented 8 years ago

atom/atom#7252 is not interesting here, because there are no child processes. The python process launched is the only process involved. The problems come from the Atom YouCompleteMe code, when launch is called multiple times at startup, the ycmdProcess variable gets in a corrupt state, so when handler.reset() is called in deactivate(), its pointing to a Promise (sometimes), instead of a BufferedProcess, so the call to process?.kill?() does nothing because kill() isn't a valid Promise method.. nor should it be.. and the ycmd process is not terminated on deactivate.

joeroback commented 8 years ago

and these are all race conditions that are increased because launch uses setTimeout.. which should be obvious why, since many launch attempts can happen in the 1 second setTimeout is waiting for.

joeroback commented 8 years ago

My changes will add a "pending" state, and queue up all prepare() calls while waiting for launch to finish, then resolve all promises when setTimeout in launch fires. From that point on, ycmdProcess != null and ycmdProcessPending == false, so prepare() just falls through with Promise.resolve()

joeroback commented 8 years ago

And as an example on a large project, multiple ycmds hanging around is not good..

$ ps ax | grep ycmd     
24129 ?        Ssl    0:12 /usr/bin/python /home/jroback/Downloads/ycmd/ycmd --port=45008 --options_file=/tmp/AtomYcmOptions-1463688989385 --idle_suicide_seconds=600
$ cat /proc/24129/status | grep '^VmRSS'
VmRSS:    341060 kB

That one instance of ycmd uses 341MiB of resident memory. If you get 3,4,5 of them because Atom restored a bunch of files, you have to wait 10 minutes for suicide to get 1+ GiB of memory back...

joeroback commented 8 years ago

Also,

prepare = ->
  ycmdProcess ?= launch -> ycmdProcess = null

If multiple prepare()'s are called while the first launch is in setTimeout(), they all don't use the same launch promise. ie. its not reused. I am unsure of your statement there really.

Since ycmdProcess is not set until setTimeout() invokes fulfill, they all launch ycmd processes separately.

Qusic commented 8 years ago

you can see it compiles to

prepare = function() {
  return ycmdProcess != null ? ycmdProcess : ycmdProcess = launch(function() {
    return ycmdProcess = null;
  });
};

joeroback commented 8 years ago

ok

prepare = function() {
  return ycmdProcess != null ? ycmdProcess : ycmdProcess = launch(function() {
    return ycmdProcess = null;
  });
};

I still think that is wrong. as the when multiple prepare()s are called, ycmdProcess will be null until the first launch() returns (after setTimeout). launch() will be called multiple times. its 100% reproducible for me. All I have to do is re-open a project with multiple CPP files open and it will launch multiple YCMD processes...

Qusic commented 8 years ago

startServer is actually called only once here. I have no idea what's going wrong.

joeroback commented 8 years ago

I would agree. Sorry about that. I think I may have been looking at the git repo in another directory, while running the older version in Atom, thinking I was running the latest version. This seems to work now, even with larger C++ files and projects.

Qusic / atom-youcompleteme

Ycmd periodically fails, multiple Ycmd servers running for a single Atom project/folder #59