alexscheelmeyer / node-phantom

bridge to PhantomJS from Node.js
317 stars 117 forks source link

Many phantom child processes never close #85

Open matt2000 opened 10 years ago

matt2000 commented 10 years ago

Great module, thanks for sharing it!

I'm using it in an app that spawns many phantom tasks, i.e., multiple calls to phantom.create(), but I'm seeing that many/most/perhaps-all of the underlying phantomjs processes are not ending, so of course the machine eventually runs out of memory and crashes.

Each phantom task does a page.open and page.evaluate, and I call phantom.exit() in the callback to the evaluate().

Any idea what might cause the processes to hang around? Any suggestions for dealing with this besides occasionally counting the number of running phantomjs processes and killing the oldest ones?

Here's my coffeescript code:

    callPhantom = () ->
        phClient = (err,ph) ->
          if err
            ph.exit()
            sender(err, 500, 'Error creating Phantom Client.')
            return
          return ph.createPage (err,page) ->
            if err
                ph.exit()
                sender(err, 500, 'Error creating Phantom Page.')
                return
            page.open "http://" + host + ":"+ port + url, (err, status) ->
              if err
                ph.exit()
                sender(err, status, "Error opening Phantom Page: #{url}\n[Got HTTP #{status}]")
                return

              getHTML = () ->
                return document.documentElement.innerHTML

              respond = () ->
                  result = page.evaluate getHTML, (err, data) ->
                    ph.exit()
                    my.setHeaders(res, 'Phantom')
                    code = if err then 500 else 200
                    sender(err, code, data)

              setTimeout respond, 1000 # @todo - Reconsider if we can detect completion, so we dont have to use the Timeout.

        phantom.create phClient, {
            parameters: {'load-images':'no'},
        }
joegoldbeck commented 10 years ago

I just came across this article, which suggested killing the phantom process after sending the exit command. So I wrote a little something for my own project to do just that. Not tested in prod, but seems to work on my dev machine just fine.

ensureExit = (ph) ->
    pid = ph._phantom?.pid
    ph.exit()
    if pid
        setTimeout -> # wait and then kill it in case it failed to close
            try # would otherwise error if exit had succeeded
                process.kill pid
        , 500

Let me know how it works for you. If it seems to be working out, it's probably worth a PR.

edit: noticed that process.kill(null) kills the current entire node process, so added a check for pid.

joegoldbeck commented 10 years ago

If that still leaves phantom processed hanging around, you could also try process.kill pid, 'SIGKILL'. For now, I'm just planning on sending the default 'SIGINT', but I'm not up at scale yet, so please let me know if you find that's necessary!

matt2000 commented 10 years ago

Hmm. This may be paranoia, but I'm mildly concerned about the very remote possibility of the pid being reused by another process that started in the last 500ms, so that we are killing another process, rather than the phantom process that used to have the same pid. But child_process.kill() has this same risk according to the NodeJS docs, so maybe the risk can't be avoided.

matt2000 commented 10 years ago

In fact, is there actually any need to use the timeout? Why not just SIGINT as soon as exit() returns?

joegoldbeck commented 10 years ago

That thought occurred to me too. Certainly you could make a shorter timeout to lower the likelihood of that happening. 500 ms was an arbitrary choice.

joegoldbeck commented 10 years ago

I can tell you that if you just call ph.exit() and then process.kill, then the SIGINT gets there before the process is closed "naturally" by ph.exit(). I have no idea whether ph.exit() causes a more graceful exit than SIGINT, but it seemed possible.

matt2000 commented 10 years ago

Looking at the node-phantom code, it seems exit() accepts a callback, so my next try will probably be
ph.exit(-> process.kill(pid))