Closed lroal closed 11 months ago
It could be related to winston logger blocking the process. I have deactivated the winston logger just to eliminate. Waiting for what is gonna happen tonight. https://github.com/winstonjs/winston/issues?q=is%3Aissue+hang+is%3Aopen https://github.com/winstonjs/winston/issues/1364
It was still blocking tonight with winston logger deactivated.
I tried replacing generic-pool with https://github.com/vincit/tarn.js/ , but blocking still occurs. It could be related to my own code, but I just can't see how...
The sleeping/blocking connection is in the middle of a transaction and is supposed to receive a commit. But it never happens. It seems to be some kind of (dead)lock in node. After 20 seconds this code hangs. It uses a pool of size 10:
import { v4 } from 'uuid';
import rdb from 'rdb';
rdb.log(console.log)
let db = rdb.mssql('server=localhost;Database=rems2;Trusted_Connection=Yes;Driver={ODBC Driver 17 for SQL Server}', {size: 10});
async function doQuery() {
return db.transaction(async () => {
let query = {
sql: "INSERT INTO chargingpoint_status (chargingpoint_status_guid,chargingpoint_guid,actor_cp_id,clock_offset_ms,ups_active,cabinet_closed,temperaturealarm,statusfilename,tariffilename,aitfilename,system_restarts,percentage_image_space_used,percentage_tr_space_used,last_tr_file_transfer,last_pic_file_transfer,pic_tr_missing_picture,createtime) OUTPUT INSERTED.chargingpoint_status_guid,INSERTED.chargingpoint_guid,INSERTED.actor_cp_id,INSERTED.clock_offset_ms,INSERTED.ups_active,INSERTED.cabinet_closed,INSERTED.temperaturealarm,INSERTED.statusfilename,INSERTED.tariffilename,INSERTED.aitfilename,INSERTED.system_restarts,INSERTED.percentage_image_space_used,INSERTED.percentage_tr_space_used,INSERTED.last_tr_file_transfer,INSERTED.last_pic_file_transfer,INSERTED.pic_tr_missing_picture,INSERTED.createtime VALUES (?, null,?,8,0,1,0,?,?,?,0,1,0,'2022-04-05T09:59:27.973','2022-04-05T10:11:48.07',0,'2022-04-05T10:20:56.140')",
parameters: [v4(), 100140203, 'obustatusfile_100140_20220405_77.dat', 'A_tariffile_100007_20190625_01.dat', 'AIT2000002022010401_000000_130001']
};
let query2 = { sql: 'select top 10 * from chargingpoint_status order by chargingpoint_status_guid DESC' };
await rdb.query(query);
await rdb.query(query2);
});
}
async function iterate() {
for (let i = 0; i < 50000; i++) {
doQuery();
}
}
iterate();
This simple example will reproduce the hang after seconds. Run sp_who2 and you will see a sleeping connection blocking for the others.
Table:
CREATE TABLE _customer (id UNIQUEIDENTIFIER PRIMARY KEY default NEWID(), name VARCHAR(256))
stress.mjs:
import mssql from 'msnodesqlv8';
const connStr = 'server=localhost;Database=rdbDemo;Trusted_Connection=Yes;Driver={ODBC Driver 17 for SQL Server}';
function run() {
mssql.open(connStr, function (err, conn) {
if (err)
throw err;
else {
iterate(conn);
}
});
}
async function iterate(conn) {
console.log('start')
await runQuery(conn, 'BEGIN TRANSACTION');
await runQuery(conn, 'INSERT INTO _customer (name) OUTPUT INSERTED.id,INSERTED.name VALUES (?)', ['foo']);
await runQuery(conn, 'select top 10 * from _customer order by id DESC');
await runQuery(conn, 'COMMIT');
console.log('done')
setTimeout(() => iterate(conn), 100);
}
async function runQuery(conn, query, params = []) {
return new Promise((resolve, reject) => {
conn.query(query, params, (err, res) => {
if (err)
reject(err);
else
resolve(res);
});
});
}
for (let i = 0; i < 10; i++) {
run();
}
thanks for above example. The first step will be to see if I observe the same behavior.
These kinds of problems can be extremely hard to find- the cpp is running on worker threads which marshal back onto the main Node JS thread.
I can make no promises on progress but will hope to report back having run this example -what happens for example if only 1 connection is open and we iterate on that.
anyway i will make a start shortly
If you run the sample with only one connection, everyting works fine. (And if you run multiple node programs with one connection each - everything works fine.)
Thank's for your effort so far. I am already using your lib at my employer - but currently I can only use one connection in the pool due to this problem. I think your library is great, actually superior to others, and this is the only obstacle left (for me).
interesting first observation
using select outside the transaction then I do not see a hang up - at least up to 7000 iterations over 10 connections It will hang if select is inside transaction.
start [1] counter = 7054
done [2] counter = 7054
done [0] counter = 7054
done [6] counter = 7054
done [5] counter = 7054
done [7] counter = 7054
done [4] counter = 7054
done [3] counter = 7054
done [8] counter = 7054
done [1] counter = 7054
const mssql = require('msnodesqlv8')
const { GetConnection } = require('./get-connection')
const connectionString = new GetConnection().getConnection('linux')
console.log(`connectionString = ${connectionString}`)
async function run (id) {
const conn = await mssql.promises.open(connectionString)
iterate(id, conn)
}
let counter = 0
async function iterate (id, conn) {
console.log(`start [${id}] counter = ${counter}`)
try {
await conn.promises.query('BEGIN TRANSACTION')
await conn.promises.query('INSERT INTO _customer (name) OUTPUT INSERTED.id,INSERTED.name VALUES (?)', ['foo'])
await conn.promises.query('COMMIT')
await conn.promises.query('select top 10 * from _customer order by id DESC')
} catch (e) {
console.log(`[${id}] error = ${e}`)
}
console.log(`done [${id}] counter = ${counter}`)
setTimeout(() => {
++counter
iterate(id, conn)
}, 100)
}
for (let i = 0; i < 10; i++) {
run(i)
}
Yes, that is the problem. It will hang because the select query is waiting for the INSERT to be commited. But the code is not able to commit. Maybe the code expects some kind of special sequence of the results from the database ? A kind of dead lock on the client side ? The database itself is not in a dead lock situation. Just run sp_who2 and you will see.
using test code as follows
const mssql = require('msnodesqlv8')
const { GetConnection } = require('./get-connection')
const connectionString = new GetConnection().getConnection('linux')
const saString = new GetConnection().getConnection('sa')
console.log(`connectionString = ${connectionString}`)
const table = '_customer2'
async function create () {
const theConnection = await mssql.promises.open(connectionString)
const mgr = theConnection.tableMgr()
const res = await theConnection.promises.query('SELECT db_NAME() as [db]')
const db = res.first[0].db || 'node'
console.log(`db = ${db}`)
const builder = mgr.makeBuilder(table, db)
builder.addColumn('id').asInt().isIdentity(true, 1, 1).isPrimaryKey(1)
builder.addColumn('name').asVarChar(256)
builder.toTable()
console.log(builder.createTableSql)
console.log(builder.clusteredSql)
console.log(builder.nonClusteredSql)
await builder.drop()
await builder.create()
await builder.index()
await theConnection.promises.close()
}
async function dbLocks (id, connection) {
const sql = `SELECT resource_type,
resource_database_id,
request_mode, request_status FROM sys.dm_tran_locks
WHERE resource_database_id = DB_ID()
AND resource_associated_entity_id = OBJECT_ID(N'${table}')`
const res = await connection.promises.query(sql)
const modes = res.first ? res.first.map(x => `${x.request_mode}.${x.resource_type}`) : []
console.log(`[${id}] modes = ${modes.join(', ')}`)
return res.first ? res.first?.length || 0 : 0
}
async function locks (id, saConnection) {
const lockCount = await dbLocks(id, saConnection)
console.log(`[${id}] lock count ${lockCount}`)
return lockCount
}
// with read uncomitted clause will run but dangerous dirty reads
// const isolation = 'SNAPSHOT'
// default this induces Bookmark lookup deadlock
const isolation = 'READ COMMITTED '
async function run (id, saConnection) {
const conn = await mssql.promises.open(connectionString)
await conn.promises.query(`SET TRANSACTION ISOLATION LEVEL ${isolation}`)
iterate(id, conn, saConnection)
}
let counter = 0
async function iterate (id, conn, saConnection) {
console.log(`start [${id}] counter = ${counter}`)
try {
console.log(`[${id}] step 1`)
const modes = await locks(id, saConnection)
console.log(`[${id}] step 2 mode count ${modes}`)
const promises = conn.promises
console.log(`[${id}] step 3`)
await promises.query('BEGIN TRANSACTION')
console.log(`[${id}] step 4`)
await promises.query(`INSERT INTO ${table} (name) VALUES (?)`, ['foo'])
console.log(`[${id}] step 5`)
await locks(id, saConnection)
console.log(`[${id}] step 6`)
await promises.query(`select top 10 id, name from ${table} order by id DESC`)
console.log(`[${id}] step 7`)
await promises.query('COMMIT')
console.log(`[${id}] step 8`)
// get a row count
const total = await promises.query(`select count(*) as rc from ${table}`)
console.log(`[${id}] row count = ${total.first[0].rc}`)
await locks(id, saConnection)
} catch (e) {
console.log(`[${id}] error = ${e}`)
}
console.log(`done [${id}] counter = ${counter}`)
setTimeout(() => {
++counter
iterate(id, conn, saConnection)
}, 100)
}
async function runner () {
await create()
const saConnection = await mssql.promises.open(saString)
for (let i = 0; i < 3; i++) {
run(i, saConnection)
}
}
runner().then(() => {
console.log('done')
})
and 3 connections I do quicly see deadlock
connectionString = Driver={ODBC Driver 17 for SQL Server}; Server=192.168.1.102;UID=linux; PWD=linux; Database=node
db = node
CREATE TABLE node.dbo._customer2 ([id] int IDENTITY(1,1), [name] varchar (256) )
CREATE CLUSTERED INDEX IX_node_dbo__customer2_c ON node.dbo._customer2(id)
CREATE NONCLUSTERED INDEX IX_node_dbo__customer2_nc ON node.dbo._customer2([id], [name])
done
start [0] counter = 0
[0] step 1
start [1] counter = 0
[1] step 1
start [2] counter = 0
[2] step 1
[0] modes =
[0] lock count 0
[0] step 2 mode count 0
[0] step 3
[1] modes =
[1] lock count 0
[1] step 2 mode count 0
[1] step 3
[2] modes =
[2] lock count 0
[2] step 2 mode count 0
[2] step 3
[0] step 4
[1] step 4
[2] step 4
[0] step 5
[1] step 5
[2] step 5
[0] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[0] lock count 3
[0] step 6
[1] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[1] lock count 3
[1] step 6
[2] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[2] lock count 3
[2] step 6
[2] error = Error: [Microsoft][ODBC Driver 17 for SQL Server][SQL Server]Transaction (Process ID 60) was deadlocked on lock resources with another process and has been chosen as the deadlock victim. Rerun the transaction.
done [2] counter = 0
start [2] counter = 1
[2] step 1
[2] modes = IX.OBJECT, IX.OBJECT
[2] lock count 2
[2] step 2 mode count 2
[2] step 3
[2] step 4
[2] step 5
[2] modes = IX.OBJECT, IX.OBJECT, IX.OBJECT
[2] lock count 3
[2] step 6
and enabling server monitoring I see the graph
If I run with isolation level snapshot then we run perfectly over a long time period. This imho is the cleanest solution but requires enabling on table and if you are working with vendors where DDL changes can not be made to support snapshot isolation then running with concurrency like this is going to lead to locks where we have the select and insert on open transaction- I have no idea other than this.
I can run 10 connections with SNAPSHOT isolation and all works fine.
If I run 10 connections with default isolation I do not reliably see the deadlock raised but this is what is happening and server is reporting it in event viewer. There is very little chance this will be sorted out and is indeed likely due to internal resource exhaustion preventing event making it back to the javascript.
I appreciate your hard effort. Thank's a lot. I assume you get a deadlock because of the non clustered index. So this is a slightly different case than the one above. In my reproducable case there were no deadlocks and no clustered indexes. It was node / node-sqlserver-v8 itself that was hanging.
I am have not deep knowledge of ms sql, but here is a thread about non clustered indexes creating deadlocks. https://dba.stackexchange.com/questions/279099/facing-deadlock-after-introducing-a-new-non-clustered-index-on-heap
Maybe I can create a workaround by running the connections in separate worker threads. That will perhaps fix the symptoms, but not the real cause of the problem.
We have a node server receiving a lot of requests every 5th minute. On average every 24th hour this ends up in a situation where the entire event loop is blocked. The node process is still running but unable to do anything. (It is not even capable to serve plain requests that has nothing to do with the database.) The cpu is zero and memory usage low. When checking with sp_who2 I can see that a sleeping Spid is blocking for others. If i kill it, everything turns out normal again. Se attached picture below. I am using transactions with commit / rollback. I am also using a variant of generic-pool with a size of 10. The clients are not immedietely closed, but kept open by generic-pool for a while.
This is very strange. I am using the library https://github.com/alfateam/rdb (I am the author) . I have been using it for 10 years against postgres without any problems.
We are using node 16. But this also happens on node 12.