IBM / node-odbc

ODBC bindings for node
MIT License
146 stars 77 forks source link

[BUG] Pool creates two connections when one is closed in calls to .query #122

Closed markdirish closed 2 years ago

markdirish commented 4 years ago

Describe your system

Describe the bug When a pool connection is closed, 2 connections are created in its place.

Expected behavior When a pool connection closes, only one should be created.

To Reproduce N/A

Code When overwriting the close() function of a connection in a Pool. close calls incrementSize(1): https://github.com/markdirish/node-odbc/blob/master/lib/Pool.js#L70

But in pool.query(), incrementSize(1) is called as well: https://github.com/markdirish/node-odbc/blob/master/lib/Pool.js#L143

Need to remove from query.

Additional context Fixing

dromonaco commented 4 years ago

I think this issue is incorrect. I think when pool.query() calls close it calls the close method of the Connection class---not the overwritten close. I think close is only overridden if the pool.connect() method is called.

markdirish commented 4 years ago

You appear to be right, which makes the code even worse in my opinion. The overriding of close should be done in increasePoolSize, which should be the one-stop shop for populating a Pool with connections. Will try to detangle this weekend.

markdirish commented 4 years ago

@dromonaco , I heard through the grapevine you wanted to talk about pool performance. I worked on pooling a bit tonight. I ironed some thing out, like centralizing the overwriting of close logic and, more importantly, reusing connections (unless reuseConnections: false is passed on the configuration object).

The changes should be on the branch pool-fixes, could you give them a whirl and let me know what you think? I haven't tested them robustly, but they pass all of the existing pool tests.

dromonaco commented 4 years ago

@markdirish Great! I'll try it out tomorrow.

dromonaco commented 4 years ago

Here are my timing results of calling a JavaScript function (running in node on an IBM i partition) that makes 14 calls via node-odbc to the XMLSERVICE stored procedure iPLUGR512K which is used to call QSNDDTAQ and QRCVDTAQ programs.

Branch Description Average time (ms)
pool-fixes Without pooling 150
pool-fixes With pooling and reuseConnections = true 110
pool-fixes With pooling and reuseConnection = false 110
master Without pooling 150
master With pooling 3,000

So the results are much faster than the previous version, 110 ms vs 3000 ms. But I'm not sure what is going on with reuseConnections = false. I would expect that to have the same timings as the previous code on the master branch.

markdirish commented 4 years ago

Certainly seems to be better, but yeah reuseConnections doesn't seem to be doing what I expect. I will test it a bit and write some tests to ensure that reuseConnections = false is doing what I think it should be doing.

markdirish commented 4 years ago

Ok, I think I have it solved in this branch. reuseConnections (and actually every configuration option to the pool) wasn't being passed correctly. @dromonaco , would you mind testing it out again? Your table was really helpful, and you have a much better "real world" test that I do.

I also rewrote large portions of the pool. The worst performance problem in the pool I was still noticing was when multiple calls would try to query on an empty pool. The first would see no connections and fire off 10 new connections. The second would come and also see no connections since none of the 10 had connected, and it would also fire off 10 new connections. And so on.

There is now an event-based system in place, and the system knows how many outstanding connections there are and so it only creates them when there are: no connections available. In fact, all connection distribution now takes place through this event function.

There is one big issue left (though in practice you might not hit it often): I really wish that queries could preempt connections connecting. If firing off a query right after initialization, I can't get it to:

  1. Create 10 connections
  2. When one connection comes back, return from the promise/callback
  3. Fire off a query, let the other 9 connections connect in their time
  4. As soon as the query comes back, handle its callback <------------ Problem step
  5. Some connections may continue trickling in here.

As far as I can tell, the issue is actually node-addon-api (the C++ API for writing the native module), so I will see if I can figure how to do the preempting correctly.

dromonaco commented 4 years ago

@markdirish Great! I will try this tomorrow. I love that you put in code to handle outstanding connections😊. I was seeing the same thing happening and I would end up with way more connections then necessary. Can't wait to try it out!

dromonaco commented 4 years ago

@markdirish I am not seeing the reuseConnections branch.

markdirish commented 4 years ago

Sorry, should be in the pool-fixes branch. My note was phrased poorly.

dromonaco commented 4 years ago

Here are my timing results using the same partition and function that makes 14 calls via node-odbc to the XMLSERVICE stored procedure iPLUGR512K which is used to call QSNDDTAQ and QRCVDTAQ programs.

Branch Description Ave. time (ms)
pool-fixes Without pooling 155
pool-fixes With pooling and reuseConnections = true 110
pool-fixes With pooling and reuseConnections = false 3280
master Without pooling 150
master With pooling 7200

These results are what I would expect to see and look really good to me. Your switch to an event based system is really showing as well. With pooling on and reuseConnections = false on the pool-fixes branch the timings were pretty stable at around 3000 ms each time. The number of jobs in the pool increased to 18 jobs and remained there after subsequent timings. This is MUCH different than the behavior I see on the master branch. On the master branch with pooling on, the number of jobs rose to overs 100 and decreases by about 15 every call until it needs to create all the jobs again. This causes the timings to be very erratic with a high standard deviation. When there are enough existing jobs (more than 15) the timings are about 1500 ms. But when it needs to start creating new jobs the timings are around 25000 ms. The 7200 ms in the table is the average.

markdirish commented 4 years ago

Ok, I have been testing, and it seems like the final issue is really just a constraint of N-API:

All of that to say, this appears to be mostly baked. I will push a beta branch tonight to get these changes on npm for people to use (and test)

Leo137 commented 3 years ago

Hey, First of all, thanks for the effort.

Is the beta branch up to test?

markdirish commented 3 years ago

There is now an alpha branch up for testing. If you want to download it, you can run npm install odbc@alpha and it should download the latest and greatest version with all of these pooling fixes. @dromonaco do you want to test that version out and make sure that you still see all the timing improvements?

dromonaco commented 3 years ago

Thank you Mark! I will be able to begin testing in about 2 days.

dromonaco commented 3 years ago

Here are my timing results of odbc@2.4.0-alpha.0 using the same partition and same function that makes 14 calls via node-odbc to the XMLSERVICE stored procedure iPLUGR512K which is used to call QSNDDTAQ and QRCVDTAQ programs.

Branch Description Ave. time (ms)
pool-fixes Without pooling 173
pool-fixes With pooling and reuseConnections = true 113
pool-fixes With pooling and reuseConnections = false 2754
master Without pooling 170
master With pooling 7852

This looks pretty consistent with the tests I ran on the pool-fixes branch. I'm going to start testing with other stored procedures and SQL queries. I'll let you know if I run into anything.

dromonaco commented 3 years ago

I have now run into the maxSize not implemented problem referenced in #52 and #119. I will work on a fix for this and submit a pull request, unless @markdirish you already have a fix.

markdirish commented 3 years ago

@dromonaco I think I have a fix sitting around. Should be able to push tonight

dromonaco commented 3 years ago

@markdirish In case it helps I submitted a pull request #142 with my changes. I've been testing it for a while today and so far so good.

MadimetjaShika commented 3 years ago

Hi there @markdirish. Firstly, thanks for this lib, great work!

Any chance of getting @dromonaco's PR merged in? Would be really great if we could get the pool connection count "under control" :)

markdirish commented 3 years ago

@dromonaco , could you install the latest beta tag and just double check that the pool still appears to be working as we fixed?

npm install odbc@beta

dromonaco commented 3 years ago

@markdirish , I should be able to get to this either Friday (2021-03-05) or Monday

dromonaco commented 3 years ago

@markdirish We're running into an issue with the beta that we haven't pinpointed yet. If we have 2 connections with the same user profile and one of them hits an error (for example table doesn't exist) the other connection seems to lock up. Have you run into any issues similar to this?

markdirish commented 3 years ago

I haven't. Does the same error exist in 2.3.x download, or does it seem to be specifically on 2.4.0 beta?

dromonaco commented 3 years ago

I don't know about the 2.3.x download but I have confirmed that the same error does not exist in 2.4.0-alpha.0

DavidRusso commented 3 years ago

I'm testing with 2.4.0-beta.1 and having really significant performance issues relating to the pool. I'm using the latest IBM i ACS ODBC driver for both Windows and IBM i.

Take this program for example:

const odbc = require("odbc");

const QUERY = "select table_name from qsys2.systables limit 1";

(async () => {
  try {
    const pool = await odbc.pool({
      connectionString: "Driver=IBM i Access ODBC Driver;System=MY_IBMI;uid=ME;password=MINE;",
      initialSize: 10
    });
    let timer = process.hrtime();
    await pool.query(QUERY);
    timer = process.hrtime(timer);
    console.log(`Query executed in ${timer[0] * 1e3 + timer[1] / 1e6}ms`);
    timer = process.hrtime();
    await pool.query(QUERY);
    timer = process.hrtime(timer);
    console.log(`Query executed in ${timer[0] * 1e3 + timer[1] / 1e6}ms`);
  }
  catch (error) {
    console.error(error);
  }
})();

When I run this program on my Windows PC (connecting to IBM i over WAN) I get a result like this:

Query executed in 7017.1864ms
Query executed in 161.3793ms

So, HUGE delay running the first query. But, if I simply change the pool's initialSize from 10 to 1 and re-run the program I get this result:

Query executed in 513.0996ms
Query executed in 300.28ms

That's a really odd/surprising result, but it happens consistently -- reducing the pool size from 10 to 1 gets rid of that initial long delay. However, the execution times are still a bit slow when compared to running the same query from ACS Run SQL Scripts on the same PC.

The problem seems to happen when I run the program on IBM i and connect to 'localhost', too. In that case I get a result like this for pool size 10:

Query executed in 386.71102ms
Query executed in 3.170812ms

And this with pool size 1:

Query executed in 20.803281ms
Query executed in 3.967185ms

The huge delays are not limited to just the first query, either. This is just a simple example to reproduce the problem. In a more realistic program, I'm seeing the huge delays recur intermittently as the program runs several queries sequentially.

What's really strange about this is that I have some test programs that run many larger queries concurrently (using Promise.all()) and I actually get much better results!

DavidRusso commented 3 years ago

It seems that the behavior I'm seeing is what @markdirish described above about queries getting queued up behind the initial connections. Once the initial connections are established, queries run without delay.

dromonaco commented 3 years ago

@markdirish Sorry for the delay.

Here are my timing results of the old pool-fixes branch version of the code vs the new 2.4.0-beta.1 published version. Unfortunately I was not able to get the function I previously tested with working again but I did these timings with a similar function that make 9 calls, in succession, via node-odbc to the XMLSERVICE stored procedure iPLUGR512K which is used to call QSNDDTAQ program.

Branch/Version Description Ave. time (ms)
pool-fixes Without pooling 8
pool-fixes With pooling and reuseConnections = true 10
pool-fixes With pooling and reuseConnections = false 146
2.4.0-beta.1 Without pooling 12
2.4.0-beta.1 With pooling and reuseConnections = true 10
2.4.0-beta.1 With pooling and reuseConnections = false 142

Everything appears to be working the same as on the pool-fixes branch. The number of QZDASOINIT jobs remains constant once the pool has grown large enough so that fix is there as well.

markdirish commented 3 years ago

@DavidRusso , I think this should finally be fixed in https://github.com/markdirish/node-odbc/pull/191

Turns out my strategy of using Promise.race wasn't really doing anything, and all of the connections are queued up immediately. So the queue of async work to do was filled up with connection creation, and the query (which is encountered in the JavaScript code immediately after the first connection returns) got queued up behind all of that.

NOW, you can use a configuration keyword connectingQueueMax to dictate the maximum number of threads you want connections to take up. The rest can be free for running queries. It defaults to 1, which I have found to be fast enough (its only really an issue on pool creation). Tested it, and it will spin up 10 connections but will return the query sometime between the 2nd and 3rd connecting, resulting in a much greater performance for getting queries run on start up. Will try to merge in ASAP

DavidRusso commented 3 years ago

Nice! Any chance for a new 2.4.0 beta release, if not a normal release? You've fixed a few issues now that I was having some problems with.

markdirish commented 3 years ago

Frantically trying to get a release out sometime tonight :smile:

tomboolean commented 3 years ago

w00t!

On Tue, Jun 1, 2021 at 2:02 PM Mark Irish @.***> wrote:

Frantically trying to get a release out sometime tonight 😄

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/markdirish/node-odbc/issues/122#issuecomment-852442944, or unsubscribe https://github.com/notifications/unsubscribe-auth/ABU6Y2M3HAQOU4JAQQOK4NDTQVDIBANCNFSM4RL6PKNA .

--

Ana Tomboulian

*CTO | *peerlogic

480 | 561.1554

@.*** | https://www.linkedin.com/in/anatomboulian/

markdirish commented 3 years ago

There should be a new beta release available now. npm install odbc@beta, the new version should be 2.4.0-beta.2!

DavidRusso commented 3 years ago

I re-tested query performance while the pool is initializing in 2.4.0-beta.2. Works great now, thanks!

markdirish commented 2 years ago

This was a while ago, haven't had many pool issues lately so going to close!