Closed firebird-automations closed 13 years ago
Commented by: @dyemanov
This is the second report of such a problem already. It seems we have a regression.
Commented by: @dyemanov
What v2.5.1 build have you used priorly, that didn't have this issue?
Commented by: vander clock stephane (arkadia)
Hard to say, in the 2.5 original i also have this kind of issue (but much much more often, but not sure it's the same). I try too little time the beta release between to 2.5 original and the 2.5 version 2630 to say with don't have this issue. what i can say is that this issue in the 2630 don't appear so often as in the 2.5 original (but on the other way the FB 2.5 original crash close to every days and the 26230 never crash so hard to know)
i don't know, if you want i can try to attach a debuger on the fb_inet_server.exe to see what's happen ?
Commented by: vander clock stephane (arkadia)
I forget, just say me if i can kill the "frozen" clients or if you need them to do some tests ...
Commented by: vander clock stephane (arkadia)
the same error again ....
today the behavior is little different
one client was running several day with 2 connections on the FB server 1 waiting some EVENT; 1 doing some queries;
yesterday the client become frozen. when i look in the $mon table in the server side i see only the EVENT connection. so it's mean that the client was disconnected (but why? as the event is still connected it's not look like a network problem) but the client not detect it and continue to wait undefinitivelly the answer of the server
this bug become very important because it's block all the time our application :( do you think i can take a newer version than the FB 26230 ?
Commented by: vander clock stephane (arkadia)
today a little different,
i have on the server
Thu Mar 24 11:07:35 2011 INET/inet_error: read errno = 10054
and on the client
Unable to complete network request to host "xxx". Error reading data from the connection. Unsuccessful execution caused by a system error that precludes successful execution of subsequent statements Error Code: 401
BUT i have several other application connecting to the FB server on the same client machine that are still working correctly !
this error (10054 = A connection was forcibly closed by a peer.) look like morea bug than a network problem
Commented by: vander clock stephane (arkadia)
What the status of this bug ? it's appear in average one time a week but it's very strong because when it's happen it's "froze" the client Application :(
more info (probably it's repeat but can not be bad in case i forget something) :
We have one client application that run at full time in background.
This client open 2 connections on the database, one for query and one for select/update
Every second this client application do a query on the database (select + insert/update). so the connection is never "idle" more than 1 second. but after days of work (1, 2, 7, etc..) then the connection freeze waiting the answer of the server that never arrive.
on the server side (when it's freeze), i can only see (in the $MON table) the connection of the EVENT, but not the second connection that do Select/update. The $Mon table say that the connection of the event it's open from the 3/29/2011
in the log of the client application i see that the second connection is frozen from the 04-04-11 14:43:19 in the Server side, in the firebird.log i can see this :
FBSERVER Mon Apr 04 14:43:48 2011 INET/inet_error: read errno = 10054
any idea ? did need to update the current version of the FBServer and FB client DLL ? actually client = legacy 2.5 and server = FB 26230
Commented by: @dyemanov
Different solutions are being tested on the customer site who has it more or less reproducible. I will report here as soon as we have something concrete found.
Commented by: vander clock stephane (arkadia)
thanks dmitry, as soon you have something, you can come also to me to test it ...
Commented by: @hvlad
The fix is committed (into v2.5 source tree), please try the next snapshot build
Commented by: vander clock stephane (arkadia)
thanks Vlad !
but in the meantime Dmitry send me 3 differents versions to test. I actually test them now, and the 1rt version still have the bug, i m actually testing the second... did i meed to stop my test and use instead the next snapshot build ?
stephane
Commented by: @dyemanov
The issue should be resolved in the snapshot build, so it's worth trying it now. If it still locks up, you can return to my builds :-)
Commented by: vander clock stephane (arkadia)
bad new, unfortunatly it's seam to not be resolved, i just have now a client that is waiting for already 2 hours the answer of the server and by this way enter in a "frozen" state :( on the server side i can see one EVENT connection with the client, but no any other connection (i look in the $Mon table).
on the server i just have something like this at the time the client start to wait the server answer : Mon Apr 25 17:04:25 2011 INET/inet_error: read errno = 10054
FB version 26263 ... any idea ?
Commented by: vander clock stephane (arkadia)
i confirm, now 24 hours passed the the client is still frozen :(
Commented by: @dyemanov
What do you mean by the "EVENT connection"? Do you set up a separate connection to wait for events and this connection is not used for any client queries?
Commented by: vander clock stephane (arkadia)
at the begining, when the client is launch 2 connections are open to the server : 1 Event connection to listen to any database server event (mostly not used, not too much event are raised) ; 1 Query connection to select, update, etc... ;
here we see that the query connection is lost (not present anymore in the $Mon table) BUT the Event connection is still show in the $mon table and still aleave ! this mostly show that the probleme look like not a "hardware network" probleme.
it's not really hard to reproduce this bug (on our database), it's appear after 1/2 days of running
if you want i can make a dump of the client to know what he is waiting for ?
Commented by: @dyemanov
I'm not sure the dump would tell us something useful, but if it's possible for you then please proceed and send it to me by email (or make available for download and send me the link). BTW, have you looked what netstat reports on both endpoints of the connection? Are the network connections alive or not?
Commented by: vander clock stephane (arkadia)
I make a little mistake, when i try to launch the debuger to make the dump i close the client by mistake :( doesn't matter, i run it now again and wait for the next mistake and i will see the netstat at this time
Commented by: vander clock stephane (arkadia)
Hello everyone,
Now i have the confirmation that this bug is still present :( FB version 26263
One of my client application if waiting the serverr answer for now 2 days. this time i setup more log in my client application and i can say that
* The client open 2 connections, one for event and one for queries * the client work now for several days (between 1 and 7) without any problem * and The client Start a new transaction (OK) * The client call just after DSQLExecuteImmediate <= here the problem is, server never answer, client wait for unlimited time
the exact SQL Query was [05-07-11 18:47:42][ThreadId: 2136] Insert Into transac_comment (id_transac, Comment_Type, Language, Comment) Values ('VBOQ-T1885',3,'FRA','Ce bel appartement de 2 chambres se trouve sur la première Étage d''un immeuble bien entretenu avec un total de 3 unités d''habitation et la maison idéale pour une seule nouvelle ou une belle paire de tous les âges. <br /><br /> L''appartement ensoleillé est réparti sur 65 mètres carrés et fonctionne grâce à la nouvelle de nombreux ...')
NB: this SQL is valid, just to show to you in case
on the server side i can see in the $Mon table the Event connection, but not the Queries connection
doing netstat on the client computer show this :
Active Connections
Proto Local Address Foreign Address State PID [WinRETranslator.exe] TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 [WinRETranslator.exe] TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 [WinRETranslator.exe] TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132 [WinRETranslator.exe] TCP 94.210.76.118:50791 94.210.76.120:3050 ESTABLISHED 3588
NB: i remove here the connection not related to WinRETranslator.exe (pur client application) in this report
doint netstat on the server , i can found only thes row connected to our client application:
Active Connections
Proto Local Address Foreign Address State TCP 94.210.76.120:49261 94.210.76.118:49166 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:49167 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:50791 ESTABLISHED
so i can not found on the server the line for
TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132
thanks for you help !
Commented by: vander clock stephane (arkadia)
sorry i make a litte error in pasting the netstat result
doing netstat on the client computer show this :
Active Connections
Proto Local Address Foreign Address State PID TCP 94.210.76.118:49164 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132
NB: i remove here the connection not related to WinRETranslator.exe (pur client application) in this report
doint netstat on the server , i can found these row connected to our client application:
Active Connections
Proto Local Address Foreign Address State TCP 94.210.76.120:3050 94.210.76.118:49164 ESTABLISHED TCP 94.210.76.120:49261 94.210.76.118:49166 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:49167 ESTABLISHED
so i can not found on the server the line for
TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132
stephane
Commented by: @hvlad
Stephane, please explain : - you said "The client open 2 connections, one for event and one for queries". It means client should have 3 sockets opened : 2 with remote port 3050 for database connection and one with different port - for events processing - but you show 4 sokets opened by client :
TCP 94.210.76.118:49164 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132
and 3 of them is for regular database connection.
Commented by: vander clock stephane (arkadia)
hmm, i look the code and it's possible that more than 2 connections are open for the queries. in fact another connection can be open for "log" that simply store in a table all the queries executed by the first connection.
Commented by: vander clock stephane (arkadia)
ok, 5 days passed and the client is still waiting the server answer :(
net stat still show this in the client :
Proto Local Address Foreign Address State PID TCP 94.210.76.118:49164 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 TCP 94.210.76.118:49166 94.210.76.120:49261 ESTABLISHED 2132 TCP 94.210.76.118:49167 94.210.76.120:3050 ESTABLISHED 2132
and in the server still the same
Proto Local Address Foreign Address State TCP 94.210.76.120:3050 94.210.76.118:49164 ESTABLISHED TCP 94.210.76.120:49261 94.210.76.118:49166 ESTABLISHED TCP 94.210.76.120:3050 94.210.76.118:49167 ESTABLISHED
so it's really strange that the client show TCP 94.210.76.118:49165 94.210.76.120:3050 ESTABLISHED 2132 as established but on the server side their is nothing about it :(
i was waiting few days hopping a time out or something like this will close the connection but no :(
Commented by: @dyemanov
what about the memory dump for the client app? Also, does number of fb_inet_server processes correspond to select count(*) from mon$attachments?
Commented by: vander clock stephane (arkadia)
dammed ... i just close today the client app before reading your message :( but i thing i have another client application that is in the same situation, i will try to see if i can do a memory dump of it
> Also, does number of fb_inet_server processes correspond to select count(*) from mon$attachments?
i m in Firebird super classic, how to see that ?
Commented by: @dyemanov
Ah, sorry, I thought you were using Classic.
Commented by: vander clock stephane (arkadia)
ok, i have another client application that is frozen for 8 days too in the same situation. so i just create a dump, and put it in the same server i send you the last time in private message. if you don't find it, i can send the login password again to you..
thanks !
Commented by: vander clock stephane (arkadia)
Actually i m testing the new version of FBClient, and it's look fine for now (but need one or two week to know for sure)
but i just want to say also that 4 days ago, i was force to hardly kill some clients (because of server overload). i just today see that their connection is still present in the $MON table but the client is dead for several days now ! Some of this connection have even an active status...
in the $MON table i can see 221 connections, and doing NetStat i can see around 286 established connections
So just to say that the trouble seam to not be only in the fbclient.dll side but also in the server side ...
Commented by: vander clock stephane (arkadia)
The new version of FBclient that dmitry send me look like it's correct the bug ! from the time i install them (more than 2 weeks ago now) i don't have any more any frozen client !
congratulation dmitry and many thanks !
i thing we can close this bug as resolved now ! also when the modification will be commited in the daily snapshot ?
thanks again stéphane
Commented by: @dyemanov
Thanks for the good news. The fix will be committed today and thus will be available in the tomorrow's snapshot build.
Commented by: @dyemanov
While fixing CORE1763, it was decided to not set the SO_KEEPALIVE flag for the client socket. But the lack of keep-alive packets is now proved causing problems in this ticket, supposedly due to some networking (or firewall related) troubles in the production environment. The solution is to turn keep-alive packets on.
Commented by: vander clock stephane (arkadia)
one thing, is the Gbak use the Fbclient.dll or it's own connection protocol ? because i also have one backup that is frozen for several days ... but i guess that gbak not use fbclient.dll but that the bug in gbak is simillar to the bug in the fbclient.dll ?
Commented by: @dyemanov
GBAK is a regular client application, i.e. it uses fbclient.dll. Perhaps this is a different issue.
Commented by: vander clock stephane (arkadia)
sorry for late answer, was in travel.. hmm, that not sound good because it's pretty like the same issue :(
on the client where gtat run net stat show : TCP 94.210.76.118:53154 94.210.76.120:3050 ESTABLISHED [gbak.exe]
on the server nada :( i can not find any connection on the port 53154 :( in the $Mon table also, nothing about the gbak
the gbak is frozen from the 24 may ... when it froze it's was exactly at this state :
gbak:25900000 records written gbak:25920000 records written
i just make a dump of the gbak and put it in the ftp in case you need...
thanks !
Commented by: vander clock stephane (arkadia)
another remark:
on the firebird server i can see 4 EVENT connections alive that was open by client that are gone serveral week ago ! i see theses connections in the $Mon table, i don't know how to recognise them in the net stat table
These connections was not open with client that have the new FBClient.dll (don't know if matter, because here it's on the server side)
the server version is still the 26230
Commented by: vander clock stephane (arkadia)
hello,
I have now often the GBAK that freeze (one time a week). but except the GBAK i have much much (much) lower connection frozen bug than before !! do you thing that using the fb release canditate 1 can resolve this issue ? i still use the the server version is 2.5.1.26263
gbak is still connected now (from several days) if you want i can also do a memory dump (or something else?)
thanks !
Commented by: @dyemanov
It's worth trying v2.5.1 RC1 regardless. A memory dump (ideally, for both frozen GBAK and the server it's supposedly connected to) is appreciated, as well as the network stats (also, for both server and client) for the moment.
Commented by: vander clock stephane (arkadia)
ok, in more details :
on the Gbak computer the netstat show :
TCP 93.122.12.118:52939 93.122.12.120:3050 ESTABLISHED [GBAK.EXE]
on the firebird server computer ... nothing ! the gbak client is not in the $MON table nor in the netstat !
so it's simply look like that GBAK don't detect the deconnection from the server and stay connected forever :(
i put the copy of the Gbak Dump file n the ftp server with i send you the credential by email the last time ...
thanks !
Submitted by: vander clock stephane (arkadia)
Is related to CORE1763
hello,
i have several client that are running for some days now waiting that the server answer to them. so at the end they look like "frozen"!
in the server side, in the $MON table i can see that the client are well attached from the 3/11/2011 (so 5 days) but the server don't answer them ! client are waiting for 5 days now one answer from the server that never arrive ...
the server version is 26230. the client use the original 2.5 client DLL
i still not kill the client (nor the server too) so i can still do some stuff on it if you need (like full dump?)
thank by advance
Commits: FirebirdSQL/firebird@070718bea29c6d6fa8ea4c0145f051a9ffaae17d FirebirdSQL/firebird@e04bf21073893f2e747a7d76d2cfe392fd5d5df9 FirebirdSQL/firebird@9087961c1517a602dc721a4478710575fca74b12