FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/

FB3 CS rejects new connections #7661

Closed gsbelarus closed 8 months ago

gsbelarus commented 11 months ago

The situation looks peculiar, as the database and the software worked for the last 20 years without significant problems. The size of the database file is around 350GB. There are over 350 active connections at peak times during the workday. FB3 CS is used.

The problem started nearly a month ago, and we link it to the server upgrade to the latest nightly build 3.0.11.

At some point, the server just stops accepting new connections while older connections (processes) continue to work well. New connections just freeze.

If you examine the list of processes and begin to terminate them one by one, the server becomes responsive again after a certain process is terminated.

The application in question wasn't updated for the last ten years and worked perfectly until now.

Further investigation suggested a correlation between the server becoming unresponsive and one particular window being opened in the application. This window does two things at its creation:

  1. executes SELECT MON$DATABASE_NAME FROM MON$DATABASE statement.
  2. calls isc_service_attach function.

Not every instance of opening the window causes a connection to become unresponsive, but the more active connections there are on the server, the higher the likelihood that the server will become unresponsive.

omachtandras commented 11 months ago

Maybe the same problem: https://github.com/FirebirdSQL/firebird/issues/7480

AlexPeshkoff commented 11 months ago

What OS?

AlexPeshkoff commented 11 months ago

To begin with please provide fb_lock_print output when connections are rejected.

MIchaelShoihet commented 11 months ago

Windows Server 2016 Standard

MIchaelShoihet commented 11 months ago

file.txt The connection was made to a test database that nobody was working with. But the server stops accepting connections to any database.

aafemt commented 11 months ago
  • executes SELECT MON$DATABASE_NAME FROM MON$DATABASE statement.
  • calls isc_service_attach function.

So it is either monitoring snapshot collection or authentication. Can you remove one of these calls from the form to find out which? I guess your application already has database name in attachment properties so it doesn't need to read it from monitoring tables.

AlexPeshkoff commented 11 months ago

Here is English speaking place - please do not use Russian.

If all databases are affected by this problem, that's definitely not related with lock manager - it works on per database basis.

aafemt commented 11 months ago

Then monitoring snapshot can be ruled out and only security DB is left suspicious. Lock print from there perhaps would be useful.

gsbelarus commented 11 months ago

Should we provide any additional info?

gsbelarus commented 11 months ago

firebird.conf is set for using Legacy_Auth and Legacy_UserManager.

hvlad commented 11 months ago

Should we provide any additional info?

Full memory dump of freeze Firebird process, please. Of course, with used binaries and .pdb files.

gsbelarus commented 11 months ago

as this is CS there are tens of FB processes...

hvlad commented 11 months ago

It is not that hard to detect new server process when new client is attaching.

gsbelarus commented 11 months ago

If it helps, connections are made through fbembed.dll from FB2.5.

hvlad commented 11 months ago

Memory dump could help.

MIchaelShoihet commented 10 months ago

After running gbak -r d:\Data_base\gdbase2023_6_20_22_0.bk localhost:e:\Data_base\gdbase2023_6_20.fdb -user SYSDBA -password xxxxxxxx: firebird (32).zip (it's actually a 7z archive)

hvlad commented 10 months ago

Please, read carefully:

> Full memory dump of freeze Firebird process, please. Of course, with used binaries and .pdb files.

MIchaelShoihet commented 10 months ago

firebird.zip

hvlad commented 10 months ago

Please, provide full set of .pdb files. No need to guess what exactly file required, just provide whole archive of Firebird build used.

MIchaelShoihet commented 10 months ago

firebird_exe.zip

hvlad commented 10 months ago

This is not what I asked for

gsbelarus commented 10 months ago

would whole firebird server directory be enough?

hvlad commented 10 months ago

If it contains corresponding .pdb files - yes. Absolutely required is engine12.pdb.

gsbelarus commented 10 months ago

If it contains corresponding .pdb files - yes. Absolutely required is engine12.pdb.

There is no engine12.pdb file in the nightly built archive:

https://web.firebirdsql.org/download/snapshot_builds/win/3.0/

hvlad commented 10 months ago

Look for it at the plugins folder, near engine12.dll.

MIchaelShoihet commented 10 months ago

FB3.zip (it's actually a 7z archive)

gsbelarus commented 10 months ago

@hvlad

So, to finalize things.

  1. If gbak utility is called with localhost: specified, then the process freezes. Corresponding dump is attached above.
  2. If gbak utility is called with just full path to the database file, then the following error appears:

     gbak: ERROR:Shared memory area is probably already created by another engine instance in another Windows session
     gbak: ERROR:failed to create database e:\Data_base\gdbase2023_6_20.fdb
     gbak:Exiting before completion due to errors

  3. By finding and killing the process which is frozen, the server returns into the workable responsive state.

hvlad commented 10 months ago

Do you run few instances of Firebird\HQBird servers at the same time ? What versions ?

gsbelarus commented 10 months ago

Do you run few instances of Firebird\HQBird servers at the same time ? What versions ?

Just one. On the default port 3050.

hvlad commented 10 months ago

Are there any embedded connections? Could you avoid them?

gsbelarus commented 10 months ago

As far as I know, there are none. Is it possible to detect connection through fbembed.dll in the MON$ATTACHMENTS table?

gsbelarus commented 10 months ago

No. There are no embedded connections. We just checked MON$ATTACHMENTS table. All connections are made with single EXE located in the shared network folder. This EXE connects to the server through fbclient.dll v3 located nearby, using host:database_path connection string.

Now, having dozens of active connections we run gbak -b localhost:... and get frozen process again.
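The MON$ATTACHMENTS check described above could be sketched roughly as follows. This is only an illustration, not what was actually run: it assumes that MON$REMOTE_PROTOCOL is NULL for attachments that bypass the network layer (embedded) and non-NULL for TCP clients, and the helper name `split_attachments` and the sample rows are hypothetical. The rows would come from whatever driver executes the query.

```python
# Query to run against the database being inspected (columns exist in MON$ATTACHMENTS).
QUERY = """
SELECT MON$ATTACHMENT_ID, MON$REMOTE_PROTOCOL, MON$REMOTE_ADDRESS, MON$REMOTE_PROCESS
FROM MON$ATTACHMENTS
"""

def split_attachments(rows):
    """Split attachment ids into (remote, local) lists.

    rows: iterable of (attachment_id, protocol, address, process) tuples.
    Assumption: a NULL/empty protocol means the attachment did not come
    through the network layer, i.e. it is a candidate embedded connection.
    """
    remote, local = [], []
    for att_id, protocol, _address, _process in rows:
        (remote if protocol else local).append(att_id)
    return remote, local

# Hypothetical rows, shaped like the query result.
sample_rows = [
    (101, "TCPv4", "10.0.0.21", r"\\server\share\app.exe"),
    (102, "TCPv4", "10.0.0.22", r"\\server\share\app.exe"),
    (103, None, None, None),
]
remote_ids, local_ids = split_attachments(sample_rows)
```

An empty `local_ids` list would be consistent with "no embedded connections", under the assumption stated above.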

EPluribusUnum commented 10 months ago

@gsbelarus , you can't block embedded connections usage. We had similar issues : https://github.com/FirebirdSQL/firebird/issues/7443

gsbelarus commented 10 months ago

Now we observe the following situation: the server works and new connections are accepted. We run the gbak on the same machine using the command gbak -b localhost:... (i.e. using TCP connection) and the server freezes and stops accepting newer connections. After killing the process, the server unfreezes and starts working as expected.

It looks like a major flaw to me. Now one unexpected connection could bring down a server that the whole enterprise, with hundreds of connections, depends on.

gsbelarus commented 10 months ago

Oh, an important piece of information was missed. My bad. The server freezes and stops accepting new connections to one database when we try to back up or restore another database on the server. In all cases, connections are made through TCP, using either a localhost:... connection string or a host_name:... connection string for network clients.

All network clients run the application from a common shared folder which resides on the same server as the database. Every client has full access to that folder. fbclient.dll resides near the application's EXE module.

When we observe server freezing, new processes of Firebird are being created for new connections, but they just do nothing. Once we find and kill a certain process, all those processes start working at once.

aafemt commented 10 months ago

you can't block embedded connections usage.

In addition to suggestions in that topic you can revoke file system permissions on databases file from anyone but Firebird service user. In this case Embedded engine run by interactive user won't be able to access the file and fail.

gsbelarus commented 10 months ago

you can't block embedded connections usage.

In addition to suggestions in that topic you can revoke file system permissions on databases file from anyone but Firebird service user. In this case Embedded engine run by interactive user won't be able to access the file and fail.

I have put a clarification in the previous post. The server freezes while serving one database if we start gbak -r for another database.

aafemt commented 10 months ago

You have been already asked for fb_lock_print result from security.db and full memory dump of corresponding firebird process. The process can be found using netstat -o. According to Vlad's comment he found something and would like you to check absence of embedded usage.
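A rough sketch of the netstat -o step: it groups established connections on the Firebird port by owning PID, so the process serving a given client can be picked out. The sample text below is an invented approximation of Windows `netstat -ano -p TCP` output, the addresses and PIDs are made up, and `firebird_pids` is a hypothetical helper name.

```python
# Invented sample resembling `netstat -ano -p TCP` output on Windows.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:3050           0.0.0.0:0              LISTENING       4120
  TCP    10.0.0.5:3050          10.0.0.21:50112        ESTABLISHED     39908
  TCP    10.0.0.5:3050          10.0.0.22:50340        ESTABLISHED     51004
  TCP    10.0.0.5:445           10.0.0.22:50341        ESTABLISHED     4
"""

def firebird_pids(netstat_text, port=3050):
    """Map each owning PID to the remote endpoints it serves on `port`."""
    result = {}
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expect exactly: proto, local address, foreign address, state, pid.
        if len(fields) != 5 or fields[0] != "TCP":
            continue
        _, local, remote, state, pid = fields
        if state != "ESTABLISHED" or not local.endswith(f":{port}"):
            continue
        result.setdefault(int(pid), []).append(remote)
    return result

by_pid = firebird_pids(SAMPLE)
```

In practice the text would come from running netstat on the server; the port defaults to 3050 because that is the port mentioned earlier in this thread.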

hvlad commented 10 months ago

When we observe server freezing, new processes of Firebird are being created for new connections, but they just do nothing. Until we find and kill a certain process, then all that processes start working at once.

Is it possible to take memory dump of this particular process ? I.e. before killing frozen process - take memory dump, if that process unblock others - this is the dump we need, else you may drop the dump.

Also, is there some unusual records in firebird.log ?

The server freezes while serving one database if we start gbak -r for another database.

I can confirm that "dangerous" operation happens when database is created, but not only in this case. As a temporary workaround I can suggest to restore on different machine.

Embedded is not suspicious, it was false assumption.

gsbelarus commented 10 months ago

after run gbak -r d:\Data_base\gdbase2023_6_20_22_0.bk localhost:e:\Data_base\gdbase2023_6_20.fdb -user SYSDBA -password xxxxxxxx firebird (32).zip It's 7z

this is a dump from the frozen process

gsbelarus commented 10 months ago

logs don't contain anything of interest. neither firebird.log, nor logs of the OS.

hvlad commented 10 months ago

@gsbelarus, seems I have an idea of special build that will collect more info about the issue. Are you ready to run such special build ?

gsbelarus commented 10 months ago

sure. send me the file.

hvlad commented 10 months ago

Firebird-3.0.11.33695_Win32

This build will produce error instead of waiting and put detailed record into firebird.log.

error: Timeout when waiting callback from other process. See firebird.log for details.

firebird.log: clearMapping: pid 51004 get no callback from pid 39908 (...\firebird.exe). event count 1, expected 1

In this case you should read the message in firebird.log and, if the stalled process (39908 in the case above) is firebird.exe, take a memory dump of it and then kill it to unblock further actions.
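To pick the stalled PID out of firebird.log automatically, something like the following could work. It is a sketch that assumes the special build writes the record exactly in the form quoted above; the function name `stalled_processes` is hypothetical, and taking the dump and killing the process are left to the operator.

```python
import re

# Matches the record format quoted in this comment; any other format is an unknown.
PATTERN = re.compile(
    r"clearMapping: pid (?P<waiter>\d+) get no callback from pid (?P<stalled>\d+) "
    r"\((?P<image>[^)]*)\)\. event count (?P<count>\d+), expected (?P<expected>\d+)"
)

def stalled_processes(log_text):
    """Return one dict per matching record: waiting PID, stalled PID, and
    whether the stalled process image looks like firebird.exe."""
    found = []
    for m in PATTERN.finditer(log_text):
        found.append({
            "waiter": int(m.group("waiter")),
            "stalled": int(m.group("stalled")),
            "is_firebird": m.group("image").lower().endswith("firebird.exe"),
        })
    return found

# The example record from this comment (raw string: the path contains a backslash).
record = r"clearMapping: pid 51004 get no callback from pid 39908 (...\firebird.exe). event count 1, expected 1"
hits = stalled_processes(record)
```

Only entries with `is_firebird` set would be candidates for the dump-then-kill procedure described above.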

gsbelarus commented 10 months ago

could you compile 64bit build?

hvlad commented 10 months ago

Yes, I can. But... do you run at the same time both 32-bit and 64-bit Firebird processes ? If yes, this is old CORE-5515

gsbelarus commented 10 months ago

No. There is exactly one folder with the FB3 64 server. That is why I want 64bit build.

hvlad commented 10 months ago

OMG, why the dump name points to the 32-bitness ??? https://github.com/FirebirdSQL/firebird/issues/7661#issuecomment-1628898522

x64 build Firebird-3.0.11.33695_x64

gsbelarus commented 10 months ago

just coincidence. 32nd FB process that was killed in order to unfreeze the system

gsbelarus commented 10 months ago

@hvlad this special build slows down the restore process significantly. Usually it takes 12–13 hours to restore a 300 GB database. It has been running for 35 hours now and has done just 80 GB. At least there are no freezes and errors.
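For scale, the figures above work out to roughly a tenfold slowdown. The arithmetic below is only a back-of-the-envelope sketch: it assumes the restore proceeds at a constant rate, which a real gbak restore need not do.

```python
def rate_gb_per_hour(gigabytes, hours):
    """Average throughput in GB per hour."""
    return gigabytes / hours

# Usual restore: 300 GB in 12 to 13 hours.
normal_slow = rate_gb_per_hour(300, 13)   # about 23.1 GB/h
normal_fast = rate_gb_per_hour(300, 12)   # 25.0 GB/h

# Special build so far: 80 GB in 35 hours.
special = rate_gb_per_hour(80, 35)        # about 2.29 GB/h

slowdown_low = normal_slow / special      # about 10.1x
slowdown_high = normal_fast / special     # about 10.9x

# Projected total duration if the current rate held for all 300 GB.
projected_total_hours = 300 / special     # 131.25 hours
```

At the observed rate the full restore would take on the order of five and a half days instead of about half a day.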