There is a stored procedure which is executed every 5 seconds. The stored procedure returns the result of the join of two tables. Data is stored in one database on a single server. It is NOT a distributed query. It is a read only query.
Once in a blue moon(once a day) something bad happen, and my program break. After that on each request I see the recurrent error:
"error": "mssql: New request is not allowed to start because it should come with valid transaction descriptor."
The thing is, that I get this error only on Commit, which mean that the connection is established, I even can scan the results, but on commit I receive this error. So it looks like the bad connection is stuck in the pool, and the driver does not recognize that the connection is broken.
Digging into the problem
I added sql hooks, increased the logging level, and filled my code with bunch of log statements. Here is what I found:
1) after the transaction is open and before it is closed driver prints the error (note: ip addresses is removed)
"The transaction active in this session has been committed or aborted by another session."
"ROLLBACK TRANSACTION 5100000d66\n""The transaction active in this session has been committed or aborted by another session."
"ROLLBACK TRANSACTION 5100000d66\n"
4) then after 5 seconds the procedure is called again, and now it fails on commit
"msg": "detected mssql error",
"message": "The server failed to resume the transaction. Desc:3a0000005d.",
"number": 3971,
"state": 1,
"class": 16,
"server_name": "MY_SERVER",
"line_no": 1,
5) Then I see the recurrent error
"New request is not allowed to start because it should come with valid transaction descriptor."v
Conclusions
I reviewed the code of the driver and I think that there is no proper error handling in the row scan logic. Errors are mostly ignored.
Also there is no way to tell the driver which errors are unrecoverable. The only way to do this is to patch the driver.
I was able to provide the patch for the error (5), but ideally it would be great to find the source of error (1) and remove the broken connection from the pool.
Describe the bug
There is a stored procedure which is executed every 5 seconds. The stored procedure returns the result of the join of two tables. Data is stored in one database on a single server. It is NOT a distributed query. It is a read only query.
Once in a blue moon(once a day) something bad happen, and my program break. After that on each request I see the recurrent error:
The thing is, that I get this error only on Commit, which mean that the connection is established, I even can scan the results, but on commit I receive this error. So it looks like the bad connection is stuck in the pool, and the driver does not recognize that the connection is broken.
Digging into the problem
I added sql hooks, increased the logging level, and filled my code with bunch of log statements. Here is what I found:
1) after the transaction is open and before it is closed driver prints the error (note: ip addresses is removed)
2) after that the row scan returns the error
3) the transaction is cancelled
4) then after 5 seconds the procedure is called again, and now it fails on commit
5) Then I see the recurrent error
Conclusions
I reviewed the code of the driver and I think that there is no proper error handling in the row scan logic. Errors are mostly ignored. Also there is no way to tell the driver which errors are unrecoverable. The only way to do this is to patch the driver.
I was able to provide the patch for the error (5), but ideally it would be great to find the source of error (1) and remove the broken connection from the pool.
Further technical details
Microsoft SQL Server 2012 - 11.0.2100.60 (X64)