TimelordUK / node-sqlserver-v8

branched from node-sqlserver, SQL server driver compatible with all versions of Node
Other
140 stars 43 forks source link

Sleeping connection blocking and entire node hangs #241

Closed lroal closed 11 months ago

lroal commented 2 years ago

We have a node server receiving a lot of requests every 5th minute. On average every 24th hour this ends up in a situation where the entire event loop is blocked. The node process is still running but unable to do anything. (It is not even capable to serve plain requests that has nothing to do with the database.) The cpu is zero and memory usage low. When checking with sp_who2 I can see that a sleeping Spid is blocking for others. If i kill it, everything turns out normal again. Se attached picture below. I am using transactions with commit / rollback. I am also using a variant of generic-pool with a size of 10. The clients are not immedietely closed, but kept open by generic-pool for a while.

This is very strange. I am using the library https://github.com/alfateam/rdb (I am the author) . I have been using it for 10 years against postgres without any problems. image

We are using node 16. But this also happens on node 12.

lroal commented 2 years ago

It could be related to winston logger blocking the process. I have deactivated the winston logger just to eliminate. Waiting for what is gonna happen tonight. https://github.com/winstonjs/winston/issues?q=is%3Aissue+hang+is%3Aopen https://github.com/winstonjs/winston/issues/1364

lroal commented 2 years ago

It was still blocking tonight with winston logger deactivated.

lroal commented 2 years ago

I tried replacing generic-pool with https://github.com/vincit/tarn.js/ , but blocking still occurs. It could be related to my own code, but I just can't see how...

lroal commented 2 years ago

The sleeping/blocking connection is in the middle of a transaction and is supposed to receive a commit. But it never happens. It seems to be some kind of (dead)lock in node. After 20 seconds this code hangs. It uses a pool of size 10:

import { v4 } from 'uuid';
import rdb from 'rdb';

rdb.log(console.log)
let db = rdb.mssql('server=localhost;Database=rems2;Trusted_Connection=Yes;Driver={ODBC Driver 17 for SQL Server}', {size: 10});

async function doQuery() {
    return db.transaction(async () => {
        let query = {
            sql: "INSERT INTO chargingpoint_status (chargingpoint_status_guid,chargingpoint_guid,actor_cp_id,clock_offset_ms,ups_active,cabinet_closed,temperaturealarm,statusfilename,tariffilename,aitfilename,system_restarts,percentage_image_space_used,percentage_tr_space_used,last_tr_file_transfer,last_pic_file_transfer,pic_tr_missing_picture,createtime)  OUTPUT INSERTED.chargingpoint_status_guid,INSERTED.chargingpoint_guid,INSERTED.actor_cp_id,INSERTED.clock_offset_ms,INSERTED.ups_active,INSERTED.cabinet_closed,INSERTED.temperaturealarm,INSERTED.statusfilename,INSERTED.tariffilename,INSERTED.aitfilename,INSERTED.system_restarts,INSERTED.percentage_image_space_used,INSERTED.percentage_tr_space_used,INSERTED.last_tr_file_transfer,INSERTED.last_pic_file_transfer,INSERTED.pic_tr_missing_picture,INSERTED.createtime VALUES (?, null,?,8,0,1,0,?,?,?,0,1,0,'2022-04-05T09:59:27.973','2022-04-05T10:11:48.07',0,'2022-04-05T10:20:56.140')",
            parameters: [v4(), 100140203, 'obustatusfile_100140_20220405_77.dat', 'A_tariffile_100007_20190625_01.dat', 'AIT2000002022010401_000000_130001']
        };
        let query2 = { sql: 'select top 10 * from chargingpoint_status order by chargingpoint_status_guid DESC' };

        await rdb.query(query);
        await rdb.query(query2);
    });
}

async function iterate() {
    for (let i = 0; i < 50000; i++) {
        doQuery();
    }
}

iterate();
lroal commented 2 years ago

This simple example will reproduce the hang after seconds. Run sp_who2 and you will see a sleeping connection blocking for the others.

Table: CREATE TABLE _customer (id UNIQUEIDENTIFIER PRIMARY KEY default NEWID(), name VARCHAR(256))

stress.mjs:

import mssql from 'msnodesqlv8';

const connStr = 'server=localhost;Database=rdbDemo;Trusted_Connection=Yes;Driver={ODBC Driver 17 for SQL Server}';

function run() {
    mssql.open(connStr, function (err, conn) {
        if (err)
            throw err;
        else {
            iterate(conn);
        }
    });
}

async function iterate(conn) {
    console.log('start')
    await runQuery(conn, 'BEGIN TRANSACTION');
    await runQuery(conn, 'INSERT INTO _customer (name) OUTPUT INSERTED.id,INSERTED.name VALUES (?)', ['foo']);
    await runQuery(conn, 'select top 10 * from _customer order by id DESC');
    await runQuery(conn, 'COMMIT');
    console.log('done')

    setTimeout(() => iterate(conn), 100);
}

async function runQuery(conn, query, params = []) {
    return new Promise((resolve, reject) => {
        conn.query(query, params, (err, res) => {
            if (err)
                reject(err);
            else
                resolve(res);
        });
    });
}

for (let i = 0; i < 10; i++) {
    run();        
}
TimelordUK commented 2 years ago

thanks for above example. The first step will be to see if I observe the same behavior.

These kinds of problems can be extremely hard to find- the cpp is running on worker threads which marshal back onto the main Node JS thread.

I can make no promises on progress but will hope to report back having run this example -what happens for example if only 1 connection is open and we iterate on that.

anyway i will make a start shortly

lroal commented 2 years ago

If you run the sample with only one connection, everyting works fine. (And if you run multiple node programs with one connection each - everything works fine.)

Thank's for your effort so far. I am already using your lib at my employer - but currently I can only use one connection in the pool due to this problem. I think your library is great, actually superior to others, and this is the only obstacle left (for me).

TimelordUK commented 2 years ago

interesting first observation

using select outside the transaction then I do not see a hang up - at least up to 7000 iterations over 10 connections It will hang if select is inside transaction.

start [1] counter = 7054
done [2] counter = 7054
done [0] counter = 7054
done [6] counter = 7054
done [5] counter = 7054
done [7] counter = 7054
done [4] counter = 7054
done [3] counter = 7054
done [8] counter = 7054
done [1] counter = 7054
const mssql = require('msnodesqlv8')
const { GetConnection } = require('./get-connection')

const connectionString = new GetConnection().getConnection('linux')
console.log(`connectionString = ${connectionString}`)

async function run (id) {
  const conn = await mssql.promises.open(connectionString)
  iterate(id, conn)
}

let counter = 0
async function iterate (id, conn) {
  console.log(`start [${id}] counter = ${counter}`)
  try {
    await conn.promises.query('BEGIN TRANSACTION')
    await conn.promises.query('INSERT INTO _customer (name) OUTPUT INSERTED.id,INSERTED.name VALUES (?)', ['foo'])
    await conn.promises.query('COMMIT')
    await conn.promises.query('select top 10 * from _customer order by id DESC')
  } catch (e) {
    console.log(`[${id}] error = ${e}`)
  }
  console.log(`done [${id}] counter = ${counter}`)

  setTimeout(() => {
    ++counter
    iterate(id, conn)
  }, 100)
}

for (let i = 0; i < 10; i++) {
  run(i)
}
lroal commented 2 years ago

Yes, that is the problem. It will hang because the select query is waiting for the INSERT to be commited. But the code is not able to commit. Maybe the code expects some kind of special sequence of the results from the database ? A kind of dead lock on the client side ? The database itself is not in a dead lock situation. Just run sp_who2 and you will see.

TimelordUK commented 2 years ago

using test code as follows

const mssql = require('msnodesqlv8')
const { GetConnection } = require('./get-connection')

const connectionString = new GetConnection().getConnection('linux')
const saString = new GetConnection().getConnection('sa')

console.log(`connectionString = ${connectionString}`)

const table = '_customer2'

async function create () {
  const theConnection = await mssql.promises.open(connectionString)

  const mgr = theConnection.tableMgr()
  const res = await theConnection.promises.query('SELECT db_NAME() as [db]')
  const db = res.first[0].db || 'node'
  console.log(`db = ${db}`)
  const builder = mgr.makeBuilder(table, db)
  builder.addColumn('id').asInt().isIdentity(true, 1, 1).isPrimaryKey(1)
  builder.addColumn('name').asVarChar(256)
  builder.toTable()

  console.log(builder.createTableSql)
  console.log(builder.clusteredSql)
  console.log(builder.nonClusteredSql)

  await builder.drop()
  await builder.create()
  await builder.index()

  await theConnection.promises.close()
}

async function dbLocks (id, connection) {
  const sql = `SELECT resource_type, 
  resource_database_id, 
  request_mode, request_status FROM sys.dm_tran_locks
  WHERE resource_database_id = DB_ID()
  AND resource_associated_entity_id = OBJECT_ID(N'${table}')`
  const res = await connection.promises.query(sql)
  const modes = res.first ? res.first.map(x => `${x.request_mode}.${x.resource_type}`) : []
  console.log(`[${id}] modes = ${modes.join(', ')}`)
  return res.first ? res.first?.length || 0 : 0
}

async function locks (id, saConnection) {
  const lockCount = await dbLocks(id, saConnection)
  console.log(`[${id}] lock count ${lockCount}`)
  return lockCount
}

// with read uncomitted clause will run but dangerous dirty reads
// const isolation = 'SNAPSHOT'
// default this induces  Bookmark lookup deadlock

const isolation = 'READ COMMITTED '

async function run (id, saConnection) {
  const conn = await mssql.promises.open(connectionString)
  await conn.promises.query(`SET TRANSACTION ISOLATION LEVEL ${isolation}`)
  iterate(id, conn, saConnection)
}

let counter = 0
async function iterate (id, conn, saConnection) {
  console.log(`start [${id}] counter = ${counter}`)
  try {
    console.log(`[${id}] step 1`)
    const modes = await locks(id, saConnection)
    console.log(`[${id}] step 2 mode count ${modes}`)
    const promises = conn.promises
    console.log(`[${id}] step 3`)
    await promises.query('BEGIN TRANSACTION')
    console.log(`[${id}] step 4`)
    await promises.query(`INSERT INTO ${table} (name) VALUES (?)`, ['foo'])
    console.log(`[${id}] step 5`)
    await locks(id, saConnection)
    console.log(`[${id}] step 6`)
    await promises.query(`select top 10 id, name from ${table} order by id DESC`)
    console.log(`[${id}] step 7`)
    await promises.query('COMMIT')
    console.log(`[${id}] step 8`)
    // get a row count
    const total = await promises.query(`select count(*) as rc from ${table}`)
    console.log(`[${id}] row count = ${total.first[0].rc}`)
    await locks(id, saConnection)
  } catch (e) {
    console.log(`[${id}] error = ${e}`)
  }
  console.log(`done [${id}] counter = ${counter}`)

  setTimeout(() => {
    ++counter
    iterate(id, conn, saConnection)
  }, 100)
}

async function runner () {
  await create()
  const saConnection = await mssql.promises.open(saString)
  for (let i = 0; i < 3; i++) {
    run(i, saConnection)
  }
}

runner().then(() => {
  console.log('done')
})

and 3 connections I do quicly see deadlock

connectionString = Driver={ODBC Driver 17 for SQL Server}; Server=192.168.1.102;UID=linux; PWD=linux; Database=node
db = node
CREATE TABLE node.dbo._customer2 ([id] int IDENTITY(1,1), [name] varchar (256) )
CREATE CLUSTERED INDEX IX_node_dbo__customer2_c ON node.dbo._customer2(id)
CREATE NONCLUSTERED INDEX IX_node_dbo__customer2_nc ON node.dbo._customer2([id], [name])
done
start [0] counter = 0
[0] step 1
start [1] counter = 0
[1] step 1
start [2] counter = 0
[2] step 1
[0] modes =
[0] lock count 0
[0] step 2 mode count 0
[0] step 3
[1] modes =
[1] lock count 0
[1] step 2 mode count 0
[1] step 3
[2] modes =
[2] lock count 0
[2] step 2 mode count 0
[2] step 3
[0] step 4
[1] step 4
[2] step 4
[0] step 5
[1] step 5
[2] step 5
[0] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[0] lock count 3
[0] step 6
[1] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[1] lock count 3
[1] step 6
[2] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[2] lock count 3
[2] step 6
[2] error = Error: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
done [2] counter = 0
start [2] counter = 1
[2] step 1
[2] modes = IX.OBJECT, IX.OBJECT
[2] lock count 2
[2] step 2 mode count 2
[2] step 3
[2] step 4
[2] step 5
[2] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[2] lock count 3
[2] step 6

and enabling server monitoring I see the graph

image

If I run with isolation level snapshot then we run perfectly over a long time period. This imho is the cleanest solution but requires enabling on table and if you are working with vendors where DDL changes can not be made to support snapshot isolation then running with concurrency like this is going to lead to locks where we have the select and insert on open transaction- I have no idea other than this.

I can run 10 connections with SNAPSHOT isolation and all works fine.

If I run 10 connections with default isolation I do not reliably see the deadlock raised but this is what is happening and server is reporting it in event viewer. There is very little chance this will be sorted out and is indeed likely due to internal resource exhaustion preventing event making it back to the javascript.

lroal commented 2 years ago

I appreciate your hard effort. Thank's a lot. I assume you get a deadlock because of the non clustered index. So this is a slightly different case than the one above. In my reproducable case there were no deadlocks and no clustered indexes. It was node / node-sqlserver-v8 itself that was hanging.

I am have not deep knowledge of ms sql, but here is a thread about non clustered indexes creating deadlocks. https://dba.stackexchange.com/questions/279099/facing-deadlock-after-introducing-a-new-non-clustered-index-on-heap

Maybe I can create a workaround by running the connections in separate worker threads. That will perhaps fix the symptoms, but not the real cause of the problem.