Closed gsbelarus closed 7 months ago
XNET error: XNET server initialization failed. Probably, another instance of the server is already running.
To simultaneously run several servers using XNET, you need to configure the IpcName (firebird.conf) in the same way as you configure ports for different FB instances to work via INET.
I used the connection string localhost/3056:path_to_database
. Through all my life, I believed this is network connection. Am I wrong?
Moreover, I checked the firebird.conf
. There is no section dedicated to XNET settings exists. Where and what should I assign?
It's IPCNAME parameter that makes it possible to run multiple servers, listening xnet, in parallel. But it's related only to message in the log (XNET error: XNET server initialization failed. Probably, another instance of the server is already running.). What you mention first looks like server crash at the first look. Does it happen for all versions or only one particular?
It's IPCNAME parameter that makes it possible to run multiple servers, listening xnet, in parallel. But it's related only to message in the log (XNET error: XNET server initialization failed. Probably, another instance of the server is already running.). What you mention first looks like server crash at the first look. Does it happen for all versions or only one particular?
I got another backup and now doing restoration. Will come with the result in a few hours.
As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5?
On 1/2/24 19:57, Andrej Kirejeŭ wrote:
As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5?
Enough to avoid mentioned message in the log.
On 1/2/24 19:57, Andrej Kirejeŭ wrote: As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5? Enough to avoid mentioned message in the log.
as it is a common scenario when multiple versions of Firebird co-exists on the same server, it would be useful to set the default IPCNAME to FIREBIRD5 for Firebird v.5, FIREBIRD6 for the next version etc
sorry, bug confirmed. another backup -- the same result:
Unable to complete network request to host "localhost". Error reading data from the connection.
the message during the restore process and then:
Number of record level errors : 18722
during the gfix check.
There are multiple Firebird instances on the server: FB25, FB3, and FB5. Each instance is assigned a dedicated port and an appropriate service name. When using a third-party application to perform a restore, it connects to the server via a TCP connection using the connection string
localhost/3056:some_path_to_database
.
What server version is supposed to perform restore ? Did that application run restore using Services API ? If, yes, how it is attached to the services manager and why it is used "localhost... " in target database name ? What happens if perform restore with gbak (or fbsvcmgr) using the same set of params ?
What server version is supposed to perform restore ?
FB5 5.0.0.1304
Did that application run restore using Services API ?
yes
If, yes, how it is attached to the services manager
through IBX component:
https://github.com/GoldenSoftwareLtd/gedemin/blob/master/Gedemin/IBX/IBServices.pas
it has been doing this way for last 25 years ))
and why it is used "localhost... " in target database name ?
The Presence of localhost in the address string forced IBX to use TCP protocol flag in its components. We definitely needed it in the times of Yaffil, Firebird 0.9 or even earlier.
BTW, localhost/3056
allows us to call exactly FB5 server. As we have FB3, FB25 also running on this machine
we could distinguish between them only by port number.
What happens if perform restore with gbak (or fbsvcmgr) using the same set of params ?
Need to check.
When run restore with Services API, one should use server name in connection string, not in the target database name.
When speak about IBX, it means that server (remote host) name should be specified in ServerName
property, not in DatabaseName
property.
In your case localhost/3056:service_mgr
is connection string and some_path_to_database
(without server host name!) is target database name. If one use plain 'service_mgr' (empty string at ServerName
property) it means connection uses XNET or, if it was not successful, fbclient will try 'localhost' without port number. As we have no idea what server version was started first and run XNET listener, we could assume it was not FB5 (message in firebird.log confirms it), I.e. another server instance run restore service and connects to the FB5 due to remote part in database name argument (DatabaseName
property). Of course, it doesn't make whole restore process as fast as it should be.
In short: you should check and correct values of ServerName
and DatabaseName
properties.
BTW, what is last message before error ?
This is how our UI organized. we specify the target server and path to the database in one string, then divide it programmatically and set the corresponding IBX component's properties. The connection definitely goes to the FB5 server because only FB5 process shows activity during the restore process.
Before the error message, there are a couple of warnings of unknown UDF functions. These are our functions like BIN_AND and BIN_OR. They duplicate built-in functions. There is no need for them right now, just leftovers in the database we didn't clean up yet. Nevertheless, these warnings were present for years and never resulted into database corruption.
Could you be more specific and show service properties used ? There is some confusion that better to be cleared up. Also, did you investigated if server was crashed (as Alex suppose) ?
IBConfigService.ServerName := edServer.Text;
if IBConfigService.ServerName > '' then
IBConfigService.Protocol := TCP
else
IBConfigService.Protocol := Local;
in our case edServer.text contains localhost/3056
It would be easier to check whether the server crashed or not if appropriate records of server starting and server properly shutting down were put into the firebird.log file.
On 02/01/2024 19:24, Andrej Kirejeŭ wrote:
On 1/2/24 19:57, Andrej Kirejeŭ wrote: As for XNET, is it enough if I set IPCNAME to FIREBIRD5 for Firebird 5? Enough to avoid mentioned message in the log.
as it is a common scenario when multiple versions of Firebird co-exists on the same server, it would be useful to set the default IPCNAME to FIREBIRD5 for Firebird v.5, FIREBIRD6 for the next version etc
No, it wouldn't be, because that would mean an older Firebird fbclient.dll wouldn't be able to connect using XNET to a newer Firebird server (or newer client to older server).
Mark -- Mark Rotteveel
No, it wouldn't be, because that would mean an older Firebird fbclient.dll wouldn't be able to connect using XNET to a newer Firebird server (or newer client to older server). Mark -- Mark Rotteveel
Oh, no. I just changed names to FIREBIRD25, FIREBIRD3, FIREBIRD5... Let us see where it leads now.
But, it won't affect network connections, right?
And regarding failed gfix. The firebird.log now contains thousands of records like:
XXXX Tue Jan 2 13:16:27 2024 Database: XXXX Error: Record 3 is wrong length in table GD_EMPLOYEE (1394)
All errors related to one table in the database.
Oh, no. I just changed names to FIREBIRD25, FIREBIRD3, FIREBIRD5... Let us see where it leads now.
But, it won't affect network connections, right?
No, only XNET. Keep in mind that for a client to now be able to connect to the server with XNET, the firebird.conf of the client library (i.e. located in the same directory as fbclient.dll used by your application) must also contain the right IpcName
setting.
IBConfigService.ServerName := edServer.Text; if IBConfigService.ServerName > '' then IBConfigService.Protocol := TCP else IBConfigService.Protocol := Local;
in our case edServer.text contains
localhost/3056
Thanks, but...how IBConfigService is related with restore task ? Please, don't add more confusion than we already have.
Also, we need to know value of DatabaseName
property (at least).
It would be easier to check whether the server crashed or not if appropriate records of server starting and server properly shutting down were put into the firebird.log file.
If Firebird run as a service, look into Windows Event Log. Also, it is good idea to always have WER turned ON to collect crash dumps.
And regarding failed gfix. The firebird.log now contains thousands of records like:
XXXX Tue Jan 2 13:16:27 2024 Database: XXXX Error: Record 3 is wrong length in table GD_EMPLOYEE (1394)
All errors related to one table in the database.
If restore was not completed like yours, then it is not surprising that gfix shows errors. Here you should rather look at the errors in firebird.log and the log of restore itself (gbak -v -y <log_file>
).
You are restoring the database, but where are you restoring it from? What ODS was the backup made from?
The database was backed up on FB 3 server and restored on FB 5.
Is restore with gbak ok? Can you share core dumps of crashed server?
There are no suspicious records in the Windows's logs. So, I assume that FB5 service didn't crash.
Do error happen when all servers but v5 are stopped?
Do error happen when all servers but v5 are stopped?
cann't check right now. those servers being used.
well, I'm trying to run from the command prompt and the command just hangs. no CPU activity, no records in the firebird.log file.
C:\Program Files\FB5>gbak -r "K:\Bases\xxx.bk" "g:\Bases\xxx.fdb" -user sysdba -pas xxx
-v -y "G:\Bases\Broiler\restore.log" -z
At this point you can attach debugger to it to see what's happening.
well, I'm trying to run from the command prompt and the command just hangs. no CPU activity, no records in the firebird.log file.
C:\Program Files\FB5>gbak -r "K:\Bases\xxx.bk" "g:\Bases\xxx.fdb" -user sysdba -pas xxx -v -y "G:\Bases\Broiler\restore.log" -z
This command line not uses services and thus not equal to the app case. It should run in embedded mode, did you check CPU usage of whole system or of Firebird process only ? In second case, it is expected to not use CPU.
Could you run quick check with metadata-only restore using services, instead ? Like:
gbak -se localhost/3056:service_mgr -r -m -v "K:\Bases\xxx.bk" "g:\Bases\xxx.fdb"
restoring of metadata goes without any problems:
gbak:gbak version WI-V5.0.0.1304 Firebird 5.0 RC 2
gbak:use up to 8 parallel workers
gbak:transportable backup -- data in XDR format
gbak: backup file is compressed
gbak:backup version is 10
gbak:created database g:\Bases\xxx.fdb, page_size 8192 bytes
gbak:started transaction
...
gbak:adjusting views dbkey length
gbak:updating ownership of packages, procedures and tables
gbak:adding missing privileges
gbak:adjusting system generators
gbak: WARNING:function ABS is not defined
gbak: WARNING: module name or entrypoint could not be found
gbak: WARNING: function BIN_AND is not defined
gbak: WARNING: module name or entrypoint could not be found
gbak:finishing, closing, and going home
gbak:adjusting the ONLINE and FORCED WRITES flags
now, I will start restoring using gbak utility with -se switch.
It should run in embedded mode, did you check CPU usage of whole system or of Firebird process only ? In second case, it is expected to not use CPU.
whole system usage was around 1%. 10 minutes of no activity. I canceled it with Ctrl-C.
It should run in embedded mode, did you check CPU usage of whole system or of Firebird process only ? In second case, it is expected to not use CPU.
whole system usage was around 1%. 10 minutes of no activity. I canceled it with Ctrl-C.
And what was at "restore.log" ?
It should run in embedded mode, did you check CPU usage of whole system or of Firebird process only ? In second case, it is expected to not use CPU.
whole system usage was around 1%. 10 minutes of no activity. I canceled it with Ctrl-C.
And what was at "restore.log" ?
no restore.log, no database file was created on the disk. The utility just hangs and was hanging for all ten minutes.
Does it do the same if you try to restore to a different v5 server?
I downloaded the latest build 1305 with debug symbols, and, run it twice. The first time with procdump
attached to the service and the second one -- without procdump
. In both cases, the process of restoring ended without any error messages and the server didn't crash. But anyway, the subsequent run of gfix
utility shows that the newly restored database file is corrupted:
Summary of validation errors
Number of record level errors : 18722
BTW, gfix significantly pollutes firebird.log file. Im my case, every run adds 18 722 records and makes existing log file practically unusable. I would suggest that gfix
stops adding records from the single run after a reasonable amount. Say, add at max 10 records, and then write into the firebird.log file message "please, use -v -y
The error continues to occur in FB 5.0.1. For us, it is a great blocker as we stopped all works on upgrading customer's servers until resolving the situation.
1:53:33,42 C:\Program Files\FB5>gbak -r g:\XXXXXXX.bk g:\XXXXXXXX.fdb -user sysdba -pas XXXXXXXX -se localhost/3056:service_mgr -v > g:\XXXXXXXXX.txt
gbak: ERROR:Error reading data from the connection.
gbak:Exiting before completion due to errors
Provide reproducible test case, please
BTW, what is the first 10 and last 10 lines at g:\XXXXXXXXX.txt ?
@hvlad
gbak:opened file g:\XXXXXXXXXX.bk
gbak:use up to 8 parallel workers
gbak:transportable backup -- data in XDR format
gbak: backup file is compressed
gbak:backup version is 10
gbak:created database g:\XXXXXXXXXX.fdb, page_size 8192 bytes
gbak:started transaction
gbak:restoring domain DINTKEY
gbak:restoring domain DDBKEY
gbak:restoring domain DFOREIGNKEY
...
gbak: activating and creating deferred index AT_FK_GENERATORS_EDITORKEY
gbak: activating and creating deferred index RDB$FOREIGN89
gbak:adjusting views dbkey length
gbak:updating ownership of packages, procedures and tables
gbak:adding missing privileges
gbak:adjusting system generators
gbak: WARNING:function ABS is not defined
gbak: WARNING: module name or entrypoint could not be found
gbak: WARNING: function BIN_AND is not defined
gbak: WARNING: module name or entrypoint could not be found
gbak:finishing, closing, and going home
I downloaded the latest build 1305 with debug symbols, and, run it twice. The first time with
procdump
attached to the service and the second one -- withoutprocdump
. In both cases, the process of restoring ended without any error messages and the server didn't crash. But anyway, the subsequent run ofgfix
utility shows that the newly restored database file is corrupted:Summary of validation errors Number of record level errors : 18722
Could you show corresponding messages put by validation into firebird.log ?
After the message 'finishing, closing, and going home' there should be 'adjusting the ONLINE and FORCED WRITES flags'. Looks like something bad happens when disconnecting from just restored database. Could you also show gstat -h output for such database (it should show shutdown mode as 'multi-user maintenance') ?
@hvlad
Gstat execution time Mon Mar 18 16:29:38 2024
Database header page information:
Flags 0
Generation 13247
System Change Number 0
Page size 8192
ODS version 13.1
Oldest transaction 3294
Oldest active 3295
Oldest snapshot 3295
Next transaction 13226
Sequence number 0
Next attachment ID 15
Implementation HW=AMD/Intel/x64 little-endian OS=Windows CC=MSVC
Shadow count 0
Page buffers 500
Next header page 0
Database dialect 3
Creation date Mar 17, 2024 22:56:46
Attributes multi-user maintenance
Variable header data:
Database GUID: {5FC2040A-4EC7-4E3C-8732-6FCBA1318CC6}
Sweep interval: 0
*END*
Gstat completion time Mon Mar 18 16:29:38 2024
The error continues to occur in FB 5.0.1. For us, it is a great blocker as we stopped all works on upgrading customer's servers until resolving the situation.
Try to use gbak without -se switch and check if the problem persist.
The error continues to occur in FB 5.0.1. For us, it is a great blocker as we stopped all works on upgrading customer's servers until resolving the situation.
Try to use gbak without -se switch and check if the problem persist.
He means - run embedded (!) restore and use procdump to watch for supposed crash.
Not necessarily embedded, locahost attachment is fine. The key is not to use services: I guess if this problem may be related to Firebird server crash on service disconnect which I've once seen.
I downloaded the latest build 1305 with debug symbols, and, run it twice. The first time with
procdump
attached to the service and the second one -- withoutprocdump
. In both cases, the process of restoring ended without any error messages and the server didn't crash. But anyway, the subsequent run ofgfix
utility shows that the newly restored database file is corrupted:Summary of validation errors Number of record level errors : 18722
If you run procdump
against Windows service process - how do you sure procdump
have enough privileges to watch for the process and create dump file ? Also, it is easy to check Windows Event log if serivce was crashed and restarted.
Not necessarily embedded, locahost attachment is fine. The key is not to use services: I guess if this problem may be related to Firebird server crash on service disconnect which I've once seen.
How it will help us to catch the crash ?
How it will help us to catch the crash ?
It would narrow borders for testcase creation. If problem is related to TCP server, it technically cannot be reproduced in embedded mode.
relevant records from firebird.log:
LEPEL Mon Mar 18 04:30:58 2024
INET/inet_error: read errno = 10054, server host = localhost, address = ::1/3056
LEPEL Mon Mar 18 16:32:12 2024
Database: G:\XXXXX.FDB
Validation started
LEPEL Mon Mar 18 17:18:06 2024
Database: G:\XXXX.FDB
Validation finished: 0 errors, 0 warnings, 0 fixed
There are multiple Firebird instances on the server: FB25, FB3, and FB5. Each instance is assigned a dedicated port and an appropriate service name. When using a third-party application to perform a restore, it connects to the server via a TCP connection using the connection string
localhost/3056:some_path_to_database
.At the end of the restoration process, the following error message appears:
Additionally, there is a record in the firebird.log file:
The resulting database file appears to be corrupted, and subsequent
gfix -v -full ...
shows: