coopernurse / node-pool

Generic resource pooling for node.js
2.38k stars 259 forks source link

Performance issues in 3.1.5 vs 3.1.4 #172

Closed sqrtofsaturn closed 7 years ago

sqrtofsaturn commented 7 years ago

Sorry for the vague issue, but we are experiencing major performance slow downs in 3.1.5 where they didn't exist in 3.1.4. They are causing some of our time sensitive tests to fail with this new version of generic-pool.

Is anyone else experiencing this problem?

sandfox commented 7 years ago

This is most likely caused by this commit https://github.com/coopernurse/node-pool/commit/a4370b64b449990d993db4f6da46ddd2125c1958 which means that "shutting down" pool will probably take longer than before, because it's now waiting for borrowed resources to be returned (which is the what it should have done all along).

Would this explain what your seeing? Out of interest what sort of times differences are you seeing?

sqrtofsaturn commented 7 years ago

That commit does appear to be the source of the slow-down, we are seeing 25.62s for tests using .values() compared to 4.82s using .values with https://github.com/octoblu/meshblu-core-job-manager

sandfox commented 7 years ago

The code in version 3.1.4 was faulty (hence the speed), when shutting down the pool it would not wait for resources to be returned, so the promise from drain or clear would resolve prematurely. This may not cause any problems in your tests but it could result in borrowed resources "hanging around" longer than expected in production. 3.1.5 is definitely the version you should be using for "correct" behaviour, but that said I'm interested to know what I could do to make it faster for you. At a guess there are probably some config settings for the pool which would help.

sandfox commented 7 years ago

ok, I've dug into this further and I see what is happening in your tests.

After each test you are draining the pool when you close the JobManagerResponder / JobManagerRequester.

As mentioned previously, when using version 3.1.4, the drain command incorrectly doesn't wait around for outstanding resources and stops nearly immediately, so your entire test suite runs in a few seconds. Once you use version 3.1.5 the pool now waits for outstanding resources to be returned (which in this case takes around a second).

It turns out that after most of your tests, the JobManagerRequester has an outstanding connection to redis that waits for queueTimeoutSeconds before being returned to the pool. As that setting is by default 1 second and your tests take about an extra 20 seconds to run and you have about 20 tests, it would seem to account for the test slow down.

If you want to make the tests fast again you can either not bother waiting for the pool to drain after each test (which is sort of what was happening in 3.1.4, or you can add someway in your code to cancel that outstanding queueTimeoutSeconds.

Either way - there isn't anything I can do here to help in generic-pool so I'm going to close this issues. Hopefully what I've written above is helpful. Let me know if anything isn't clear in it.