coopernurse / node-pool

Generic resource pooling for node.js

A way to handle unexpected resource failure in available object pool #155

Open mikeys opened 7 years ago

mikeys commented 7 years ago

I'm wondering whether there's currently a best practice for destroying a resource that dies unexpectedly (for example, when its process is killed with kill <pid>).

I've tried the following:

const errorHandlers = new Map() // keyed by client instance; a plain object would coerce every client to the same string key

let factory = {
  create: () => {
    return new Promise((resolve, reject) => {
      log.info('Spawning a new client...')

      let client = new Client()

      errorHandlers.set(client, () => {
        log.error(`Client ${client.pid} quit unexpectedly...`)
        pool.destroy(client) // pool is the generic-pool instance built from this factory
      })

      client.once('fatal-error', errorHandlers.get(client))

      client.init()
        .then(client => {
          log.info(`Spawned a new client (pid: ${client.pid})`)
          resolve(client)
        }).catch(err => {
          log.info('Failed to spawn a new client.')
          reject(err)
        })
    })
  }
}

But this yields: UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Resource not currently part of this pool.

The reason validate and testOnBorrow won't work for me is that spawning a new resource is very expensive (approx. 10 seconds), so I would like to start spawning a replacement as soon as possible.
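
(For context, a minimal sketch of the testOnBorrow approach being ruled out here - the option names are generic-pool v3's, while Client and its kill() are the hypothetical resource class from the snippet above:)

const genericPool = require('generic-pool')

const pool = genericPool.createPool({
  create: () => new Client().init(),                 // ~10 seconds per resource
  destroy: client => Promise.resolve(client.kill()), // kill() assumed on Client
  validate: client => Promise.resolve(client.isActive)
}, {
  max: 5,
  testOnBorrow: true // validate only runs at acquire time, so a dead idle
                     // resource is replaced lazily, on demand
})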

Thanks and keep up the great work! Miki

sandfox commented 7 years ago

When you say the resource dies, when/where exactly are you thinking of? During the factory.create call, or whilst it's sitting idle in the pool waiting for something to borrow it (or somewhere else altogether)?

Whilst I don't have anything concrete, I do have a vague plan to add some flag/option like testWhileIdle which would allow the idle object evictor to also validate each object as it checks it. Would this in any way help?

Also - do you have some example code which uses the above pool/factory?

mikeys commented 7 years ago

@sandfox Thanks for responding. I'm talking about a situation where the resource dies unexpectedly while it's sitting idle. For example - my resource is a spawned child process (which I create as part of the factory.create method). The process itself has a long initialisation phase (approx. 10-15 seconds); only after initialisation completes is it considered ready (and the promise from factory.create resolves).

Obviously, a child process can die unexpectedly for various reasons. I'm using child.on('error') and child.on('exit') to figure out whether a child died unexpectedly or had a fatal error. The fatal-error event I'm listening to above is just for that, and was supposed to destroy the resource and remove it from the pool as soon as an error occurs.

Additionally, I have a validate function as part of the factory which looks like this: validate: client => Promise.resolve(client.isActive). client.isActive is set (in the client class implementation) to false when one of the above error events occurs.
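
(A minimal sketch of that flagging pattern - the Client class and the worker command are hypothetical, and a real init would wait for a readiness signal from the child before resolving:)

const { spawn } = require('child_process')
const EventEmitter = require('events')

class Client extends EventEmitter {
  constructor () {
    super()
    this.isActive = false
  }

  init () {
    this.child = spawn('worker') // hypothetical slow-starting process
    this.pid = this.child.pid
    this.isActive = true

    // Any unexpected death flips the flag; with testOnBorrow the pool's
    // validate() only notices it lazily, on the next acquire.
    const markDead = () => {
      this.isActive = false
      this.emit('fatal-error')
    }
    this.child.once('error', markDead)
    this.child.once('exit', markDead)

    return Promise.resolve(this)
  }
}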

When the pool considers the resource idle, I don't want to wait until a pool.acquire (and the invalidation that follows) before a replacement for the resource that just died starts being created, since creating one takes so long.

Hope this makes more sense.

sandfox commented 7 years ago

Ah-ha, got it now! Thanks for the extra info - I'll try and have a better look at this tonight and write out a more concrete plan for testWhileIdle, which I think might help your situation.

In the meantime I can tell you that pool.destroy in your code above fails because the pool explicitly tracks resource borrowing and allows resources to be destroyed/released only if they were acquired (although the error message here could be better!)

Unfortunately there isn't anything in the way of a private API you could misuse for this either (they all expect/use PooledResource, which is an internal wrapper around the user-created resource). There is also the problem that the queue of idle resources is not designed for random access, so even given a known bad resource there isn't a quick way to pull it out.
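
(In other words, destruction is only valid inside a borrow window. A minimal sketch of the pattern that does work, using the isActive flag from earlier in the thread:)

async function acquireHealthy (pool) {
  const client = await pool.acquire()
  if (!client.isActive) {
    // Legal here: the client is currently borrowed, so the pool accepts it.
    await pool.destroy(client)
    return acquireHealthy(pool) // retry with the next resource
  }
  return client
}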

mikeys commented 7 years ago

@sandfox Care to share some implementation details for testWhileIdle? I think that, by design, it should be possible to destroy a resource on demand. It's more performant to destroy a resource once an event is triggered than to poll the resource and check whether it's valid every X seconds - much less load on the event loop. What do you think?

sandfox commented 7 years ago

The current plan for testWhileIdle is that it would trigger a validation check every time the evictor visits a resource. I held off from implementing it this time round because the current evictor is synchronous, whereas validators are not; it also avoided adding more (complex) states to resources and making decisions around various behaviours. (These still need to be done, but I'm holding out till they are asked for.)

Whilst adding testWhileIdle is probably the correct-ish solution to this, you are right that it would add more load doing constant eviction checks (although I think this would often be negligible for many use cases)

Another approach (which I sort of favour more) could be using proper dependency injection for the Pool class so it's easier for users to inject their own queues/etc. The original intention was to still expose the Pool constructor, as was done in v2 of this library. I had a last-minute change of mind to export a factory function instead, but didn't get round to modifying the Pool constructor.

Off the top of my head, I think you might be able to do something similar to lib/Queue.js, where it attaches a promise handler to every request and, if the promise rejects, removes the request from itself. If a similar mechanism (some kind of event listener) were implemented on something that extended lib/DLLArray.js, it could easily be passed into an improved Pool constructor.
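
(A self-contained illustration of that idea - this is not generic-pool's actual internals, and a real injectable store would also need to tell the pool to start creating a replacement, which this glosses over:)

// An idle store that listens for failure on each stored resource and
// drops it immediately, instead of waiting for an acquire-time validate
// or an eviction pass.
class EventedIdleStore {
  constructor () {
    this._items = new Set()
  }

  get length () {
    return this._items.size
  }

  push (resource) {
    this._items.add(resource)
    resource.once('fatal-error', () => this._items.delete(resource))
  }

  shift () {
    const first = this._items.values().next().value
    this._items.delete(first)
    return first
  }
}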

You'll probably notice that Queue and DLLArray have ended up being very similar and could probably be merged, or at least one could extend the other. I suspect I want them both to be double-ended queues, but they evolved at different times.

I'll open another issue to modify the pool constructor and use the opportunity to consolidate the duplicate functionality in Queue and DLLArray...

sandfox commented 7 years ago

Ok, I've created v3.1.0, which exposes the Pool constructor. It's a little ugly and clunky to work with...

const genericPool = require('generic-pool')
const Pool = genericPool.Pool
const DefaultEvictor = genericPool.DefaultEvictor
const Deque = genericPool.Deque
const PriorityQueue = genericPool.PriorityQueue

const myPool = new Pool(DefaultEvictor, Deque, PriorityQueue, factory, config)

mikeys commented 7 years ago

@sandfox Thanks, that's awesome for the time being. Appreciate the great work.

bazineta commented 7 years ago

We have a somewhat similar situation in that our pooled objects can (well, are expected to, really) die as a result of inactivity timeout on the remote end.

As such, we're also dependent on the validate function, which in our case just returns tlsSocket.writable, to determine that the other end went away since we last talked to it and that the resource should be destroyed.

The issue we just ran into is that the validate method won't be called unless testOnBorrow is true. That's obvious from reading the new documentation, but TBH the migration guide would benefit from it being made more obvious there as well.
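
(For anyone skimming, the option in question is a single config flag - genericPool and factory as elsewhere in this thread:)

const pool = genericPool.createPool(factory, {
  testOnBorrow: true // defaults to false; without it, factory.validate is never called
})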

sandfox commented 7 years ago

I have no idea what the best way to do it is, but I'd like to make some generalised solution around resources that can "fail" (and notify something about it) whilst in the available resource pool. EventEmitters seem a bit heavy, and promises seem the wrong fit too, but I'll think harder about it some time later.

Yeah, the migration guide has lagged a little behind actual development and needs some more tidying up. Maybe I should bring it into its own repo rather than it being a gist.

I'll try to make the defaults (and their effects) more explicit

bazineta commented 7 years ago

I'm not sure what a generalized solution would look like either, other than using mark-and-sweep as we do in the example below. Basically, should the TLS socket fail at some point after connection, we call end() on it to close our end of it. The validate() routine can then check the writable state and know that the socket needs to be destroyed.

This was the best I was able to come up with. Given the constraints, it's not obvious to me what other solutions there might be beyond what this does, which is basically flagging the resource as dead for the future.

# assumed requires for this snippet; the './config' path is app-specific
_ = require 'lodash'
tls = require 'tls'
generic = require 'generic-pool'
config = require './config'

makePool = (host) ->

  pool = generic.createPool

    create: ->

      return new Promise (resolve, reject) ->

        broker = tls.connect _.extend
          host: host
        ,
          config.broker.tls

        done = (err) ->

          broker
          .removeListener 'secureConnect', done
          .removeListener 'error',         done

          if err
            reject err
          else
            broker.on 'error', -> broker.end()
            resolve broker

          return

        broker
        .once 'secureConnect', done
        .once 'error',         done

        return

    destroy: (broker) ->

      return new Promise (resolve) ->
        broker.end()
        resolve()
        return

    validate: (broker) ->

      return new Promise (resolve) ->
        resolve broker.writable
        return
  ,
config.broker.pool

mikeys commented 7 years ago

@bazineta That's more or less my solution right now, but still, it's not ideal.

bazineta commented 7 years ago

It ends up, I think, being essentially a state machine, since in general we have:

  1. Not started
  2. Start failed
  3. Running
  4. Failed
  5. Doomed

It's that fourth state that, absent setting a flag and validating, there seems to be no inherent way to address at present.
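
(Hypothetical labels for those states; the fourth can only be handled by flagging the resource as doomed so that the next validate destroys it:)

const State = Object.freeze({
  NOT_STARTED: 'not-started',
  START_FAILED: 'start-failed', // create() rejected
  RUNNING: 'running',
  FAILED: 'failed',             // died while idle in the pool
  DOOMED: 'doomed'              // flagged; destroyed on the next validate
})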

mikeys commented 7 years ago

@sandfox EventEmitters are heavy in what sense?

dustinbolton commented 7 years ago

Could 'create' just provide a parameter that could be called if the resource dies, instructing node-pool that the resource instance is no longer valid? I'm making FTP connections, and when a timeout is emitted because the server went away, it would be good to destroy that instance if I'm trying to keep a connection held open.
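
(A sketch of the kind of API being proposed - the markDead callback is hypothetical and not part of node-pool, and FtpClient stands in for whatever FTP library is in use:)

const factory = {
  create: (markDead) => { // hypothetical callback supplied by the pool
    const conn = new FtpClient()
    conn.on('timeout', () => markDead(conn)) // server went away while idle
    return conn.connect().then(() => conn)
  },
  destroy: conn => Promise.resolve(conn.end())
}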

motleycrew commented 7 years ago

@sandfox did you find a solution to this problem? I am using generic-pool to create phantomjs instances, but they often hang forever, so I have to manually call pool.destroy. I am also getting this error:

(node:26831) UnhandledPromiseRejectionWarning: Unhandled promise rejection (rejection id: 1): Error: Resource not currently part of this pool
(node:26831) DeprecationWarning: Unhandled promise rejections are deprecated. In the future, promise rejections that are not handled will terminate the Node.js process with a non-zero exit code.

Here is the code:

var pageLoadTimeout = setTimeout(function () {
  var pid = instance.process.pid
  console.log(timeout + 's timeout occurred. killing phantom pid: ' + pid + '  Url: ', srcUrl)
  instance.kill()
  self.pool.destroy(instance) // this is the call that rejects with "Resource not currently part of this pool"
  return reject(results)
}, timeout * 1000)

ghost commented 7 years ago

Kind of facing the same issue. I'm using sequelize, which takes a dependency on generic-pool to manage connections. However, I'm only able to set the pool options that are passed through. @sandfox what happens if pool.destroy gets called with a dead connection? Will the pool invalidate the object and simply create a new one?

sandfox commented 7 years ago

@chrwend as long as the object passed to pool.destroy came from a pool.acquire, the pool will accept it and "destroy" it (where "destroy" is whatever the destroy function passed to the pool's constructor does). The pool itself doesn't care what internal state the resource is in; it only concerns itself with things like tracking the existence and borrowing of the objects/resources.

ghost commented 7 years ago

@sandfox That's odd. My min pool size is set to 10. However, I can verify that there are sometimes fewer than 5 connections. Once a connection is borrowed, the current connection number increases to 11. Any idea?

sandfox commented 7 years ago

Where are your connection counts coming from? Metrics from the database itself? The pool's reported size?

Without any code or a way to reproduce this I'm flying a little blind, but I can throw some guesses about:

  1. We definitely shouldn't rule out there being a bug in generic-pool!
  2. Because resource creation/destruction is async, there can be periods where a resource is being destroyed but no new resource has yet been created to take its place. The pool does track create/destroy operations, but I can't remember if it exposes them as a metric at all.

At the moment the observability of the pool is not great and I'm contemplating adding a raft of events for things like resource created/destroyed/tested/borrowed/returned so library users can optionally hook into them to create their own logs etc, but I also need to check this doesn't do horrible things to performance etc.
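
(For what it's worth, a small sketch using the size-related getters a generic-pool v3 instance already exposes; whether they are enough for this case is another matter:)

// A destroy-then-recreate window shows up as size dipping while
// pending acquires stack up.
setInterval(() => {
  console.log({
    size: pool.size,           // resources that exist or are being created
    available: pool.available, // idle in the pool
    borrowed: pool.borrowed,   // currently acquired
    pending: pool.pending      // acquire() calls still waiting
  })
}, 1000)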

t3hmrman commented 5 years ago

Is this still an issue? It seems like destroy got added, and if a minimum is set, a destroy should trigger a new creation whenever the minimum is breached... There's another issue about minimums not being respected, but I'm not sure if any of the things in this issue are still present.