Closed gsbelarus closed 8 months ago
Maybe the same problem: https://github.com/FirebirdSQL/firebird/issues/7480
What OS?
To begin with please provide fb_lock_print output when connections are rejected.
Windows Server 2016 Standard
file.txt The connection was made to a test database that nobody was working with. But the server stops for any database.
- executes `SELECT MON$DATABASE_NAME FROM MON$DATABASE` statement.
- calls `isc_service_attach` function.
So it is either monitoring snapshot collection or authentication. Can you remove one of these calls from the form to find out which? I guess your application already has database name in attachment properties so it doesn't need to read it from monitoring tables.
This is an English-speaking place; please do not use Russian.
If all databases are affected by this problem, it is definitely not related to the lock manager, which works on a per-database basis.
Then the monitoring snapshot can be ruled out, and only the security DB remains suspect. A lock print from there would perhaps be useful.
Should we provide any additional info?
firebird.conf is set to use Legacy_Auth and Legacy_UserManager.
> Should we provide any additional info?
A full memory dump of the frozen Firebird process, please. Of course, with the binaries in use and their .pdb files.
As this is Classic Server (CS), there are dozens of FB processes...
It is not that hard to detect new server process when new client is attaching.
If it helps, connections are made through fbembed.dll from FB2.5.
Memory dump could help.
After running gbak -r d:\Data_base\gdbase2023_6_20_22_0.bk localhost:e:\Data_base\gdbase2023_6_20.fdb -user SYSDBA -password xxxxxxxx: firebird (32).zip (it's 7z)
Please, read carefully:
> A full memory dump of the frozen Firebird process, please.
> Of course, with the binaries in use and their .pdb files.
Please provide the full set of .pdb files. No need to guess exactly which file is required; just provide the whole archive of the Firebird build used.
This is not what I asked for
Would the whole Firebird server directory be enough?
If it contains corresponding .pdb files - yes. Absolutely required is engine12.pdb.
> If it contains corresponding .pdb files - yes. Absolutely required is engine12.pdb.
There is no engine12.pdb file in the nightly built archive:
https://web.firebirdsql.org/download/snapshot_builds/win/3.0/
Look for it in the plugins folder, next to engine12.dll.
FB3.zip It's 7z
@hvlad
So, to finalize things.
gbak: ERROR:Shared memory area is probably already created by another engine instance in another Windows session
gbak: ERROR:failed to create database e:\Data_base\gdbase2023_6_20.fdb
gbak:Exiting before completion due to errors
Do you run several instances of Firebird\HQBird servers at the same time? What versions?
> Do you run several instances of Firebird\HQBird servers at the same time? What versions?
Just one. On the default port 3050.
Are there embedded connections? Could you avoid them?
As far as I know, there are none. Is it possible to detect a connection made through fbembed.dll in the MON$ATTACHMENTS table?
No. There are no embedded connections. We just checked MON$ATTACHMENTS table. All connections are made with single EXE located in the shared network folder. This EXE connects to the server through fbclient.dll v3 located nearby, using host:database_path connection string.
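For readers wondering what such a check might look like, here is a minimal sketch. It assumes that embedded (local) attachments appear in MON$ATTACHMENTS with a NULL MON$REMOTE_PROTOCOL, while TCP clients report a value such as TCPv4. The helper name `split_attachments` is hypothetical, and the query would be run with whatever Firebird driver or isql session is at hand.

```python
# Query to run against the affected database (column names from MON$ATTACHMENTS).
QUERY = """
SELECT MON$ATTACHMENT_ID, MON$USER, MON$REMOTE_PROTOCOL,
       MON$REMOTE_ADDRESS, MON$REMOTE_PID, MON$REMOTE_PROCESS
FROM MON$ATTACHMENTS
"""


def split_attachments(rows):
    """Split fetched rows (tuples in QUERY column order) into (remote, local).

    Assumption: a row with an empty or NULL MON$REMOTE_PROTOCOL is a local
    (possibly embedded) attachment; anything else came over the network.
    """
    remote, local = [], []
    for row in rows:
        protocol = row[2]
        if protocol is not None and str(protocol).strip():
            remote.append(row)
        else:
            local.append(row)
    return remote, local
```

Any row that lands in the `local` list would be worth a closer look, for example by checking which process owns it.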
Now, having dozens of active connections, we run gbak -b localhost:... and get a frozen process again.
@gsbelarus, you can't block embedded connections usage. We had similar issues: https://github.com/FirebirdSQL/firebird/issues/7443
Now we observe the following situation: the server works and new connections are accepted. We run gbak on the same machine using the command gbak -b localhost:... (i.e. using a TCP connection), and the server freezes and stops accepting new connections. After killing the process, the server unfreezes and starts working as expected.
It looks like a major flaw to me. A single unexpected connection can now bring down a server that the whole enterprise, with hundreds of connections, depends on.
Oh, an important piece of information was missing. My bad. The server freezes and stops accepting new connections to one database when we try to back up or restore another database on the server. In all cases, connections are made over TCP, using either a localhost:... connection string or a host_name:... connection string for network clients.
All network clients run the application from a common shared folder which resides on the same server as the database. Every client has full access to that folder. fbclient.dll resides near the application's EXE module.
When we observe the server freezing, new Firebird processes are being created for new connections, but they just do nothing. Once we find and kill a certain process, all those processes start working at once.
> you can't block embedded connections usage.
In addition to the suggestions in that topic, you can revoke file system permissions on the database files from everyone but the Firebird service user. In this case an Embedded engine run by an interactive user won't be able to access the file and will fail.
I have put a clarification in the previous post. The server freezes while serving one database if we start gbak -r for another database.
You have already been asked for the fb_lock_print output from security.db and a full memory dump of the corresponding firebird process. The process can be found using netstat -o. According to Vlad's comment, he found something and would like you to verify that embedded is not in use.
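As an illustration of the netstat step, here is a minimal sketch that maps connections on the Firebird port to their owning PIDs. It assumes the Windows `netstat -ano` column layout (Proto, Local Address, Foreign Address, State, PID); the `-a` and `-n` switches are an addition to the `-o` mentioned above, used here so that ports stay numeric. The function name `pids_for_port` is hypothetical, and 3050 is the default port named earlier in this thread.

```python
def pids_for_port(netstat_output, port=3050):
    """Return {pid: [(foreign_address, state), ...]} for TCP rows on a local port.

    Assumes rows shaped like the output of `netstat -ano` on Windows:
    Proto, Local Address, Foreign Address, State, PID.
    """
    result = {}
    suffix = ":" + str(port)
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers, UDP rows (no State column) and anything malformed.
        if len(fields) != 5 or fields[0].upper() != "TCP":
            continue
        _proto, local, foreign, state, pid = fields
        if local.endswith(suffix) and pid.isdigit():
            result.setdefault(int(pid), []).append((foreign, state))
    return result
```

The text to parse could be captured with `subprocess.run(["netstat", "-ano"], capture_output=True, text=True).stdout`. With Classic Server, the PID that appears when a new client attaches is the candidate process for the memory dump.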
> When we observe the server freezing, new Firebird processes are being created for new connections, but they just do nothing. Once we find and kill a certain process, all those processes start working at once.
Is it possible to take a memory dump of this particular process? I.e. before killing a frozen process, take a memory dump of it; if killing that process unblocks the others, this is the dump we need, otherwise you may discard the dump.
Also, are there any unusual records in firebird.log?
> The server freezes while serving one database if we start gbak -r for another database.
I can confirm that the "dangerous" operation happens when a database is created, but not only in this case. As a temporary workaround, I can suggest restoring on a different machine.
Embedded is not a suspect; that was a false assumption.
> After running gbak -r d:\Data_base\gdbase2023_6_20_22_0.bk localhost:e:\Data_base\gdbase2023_6_20.fdb -user SYSDBA -password xxxxxxxx: firebird (32).zip (it's 7z)
This is a dump from the frozen process.
The logs don't contain anything of interest, neither firebird.log nor the OS logs.
@gsbelarus, it seems I have an idea for a special build that will collect more info about the issue. Are you ready to run such a special build?
sure. send me the file.
This build will produce an error instead of waiting and put a detailed record into firebird.log.
error: Timeout when waiting callback from other process. See firebird.log for details.
firebird.log: clearMapping: pid 51004 get no callback from pid 39908 (...\firebird.exe). event count 1, expected 1
In this case you should read the message in firebird.log and, if the stalled process (39908 in the case above) is firebird.exe, take a memory dump of it and then kill it to unblock further actions.
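The log check described above could be automated along these lines. This is a sketch only: the pattern is derived from the single sample record quoted in this thread, so the actual format written by the special build may differ, and the function name `find_stalled` is hypothetical.

```python
import re

# Pattern modelled on the sample record:
# clearMapping: pid 51004 get no callback from pid 39908 (...\firebird.exe). event count 1, expected 1
STALL_RE = re.compile(
    r"clearMapping: pid (?P<waiter>\d+) get no callback from pid (?P<stalled>\d+) "
    r"\((?P<image>.*?)\)\. event count (?P<got>\d+), expected (?P<expected>\d+)"
)


def find_stalled(log_text):
    """Return one dict per matching record, flagging firebird.exe processes."""
    hits = []
    for match in STALL_RE.finditer(log_text):
        image = match.group("image")
        hits.append({
            "waiter": int(match.group("waiter")),
            "stalled": int(match.group("stalled")),
            "image": image,
            # Only firebird.exe processes are candidates for dump-then-kill.
            "is_firebird": image.lower().endswith("firebird.exe"),
        })
    return hits
```

Each hit with `is_firebird` set names the PID to dump before killing it.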
Could you compile a 64-bit build?
Yes, I can. But... do you run both 32-bit and 64-bit Firebird processes at the same time? If yes, this is the old CORE-5515.
No. There is exactly one folder with the FB3 64-bit server. That is why I want a 64-bit build.
OMG, why does the dump name suggest 32-bit??? https://github.com/FirebirdSQL/firebird/issues/7661#issuecomment-1628898522
x64 build Firebird-3.0.11.33695_x64
Just a coincidence. It was the 32nd FB process that was killed in order to unfreeze the system.
@hvlad this special build slows down the restore process significantly. Usually it takes 12–13 hours to restore a 300 GB database. It has been running for 35 hours now and has done just 80 GB. At least there are no freezes or errors.
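To put the slowdown in numbers, here is a back-of-the-envelope calculation using only the figures quoted above. It assumes the restore proceeds at a constant rate and takes 12.5 hours as the midpoint of the usual 12–13 hours; both are simplifications, since a real restore does not progress linearly.

```python
def restore_rate(gb_done, hours):
    """Average restore throughput in GB per hour."""
    return gb_done / hours


# Figures reported in the thread.
normal_rate = restore_rate(300, 12.5)   # usual run, midpoint of 12-13 hours
special_rate = restore_rate(80, 35)     # special build so far

# How many times slower the special build is, and the projected total
# duration for the full 300 GB if the current rate holds.
slowdown = normal_rate / special_rate
projected_hours = 300 / special_rate

print(f"normal:  {normal_rate:.1f} GB/h")
print(f"special: {special_rate:.2f} GB/h")
print(f"slowdown: about {slowdown:.1f}x, projected total: about {projected_hours:.0f} hours")
```

Under these assumptions the special build is roughly ten times slower, which would put the full restore at more than five days.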
The situation looks peculiar, as the database and the software worked for the last 20 years without significant problems. The size of the database file is around 350GB. There are over 350 active connections at peak times during the workday. FB3 CS is used.
The problem started nearly a month ago, and we link it to the server upgrade to the latest nightly build 3.0.11.
At some point, the server just stops accepting new connections while older connections (processes) continue to work well. New connections just freeze.
If you examine the list of processes and begin to terminate them one by one, the server becomes responsive again after a certain process is terminated.
The application in question wasn't updated for the last ten years and worked perfectly until now.
Further investigation suggested that there is a correlation between the server becoming unresponsive and one particular window being opened in the application. This window does two things at its creation:
- executes `SELECT MON$DATABASE_NAME FROM MON$DATABASE` statement.
- calls `isc_service_attach` function.

Not every instance of opening the window causes a connection to become unresponsive, but the more active connections there are on the server, the higher the likelihood that the server will become unresponsive.