FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/

100% CPU USAGE (endless loop) in the remote protocol code related to events processing [CORE3119] #3497

Closed firebird-automations closed 13 years ago

firebird-automations commented 13 years ago

Submitted by: vander clock stephane (arkadia)

Attachments: ALFBXEvent.zip firebird.rar

Votes: 1

These bugs are really hard to reproduce, and it is hard to understand what makes them happen. I can only describe what we see.

The database server stops answering all clients. In firebird.log we have this:

DATABASESERVER Sun Aug 29 04:53:57 2010 INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:56:59 2010 INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 04:58:53 2010 INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:01:27 2010 INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504

... and so on, for more than 19 GB! firebird.log kept growing, with these lines being appended continuously:

DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504

This continued even after we closed or killed all the clients connected to the server! We were forced to kill the Firebird process...
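The repeating pair of entries is a distinctive signature, so a quick tally of the INET messages can confirm that a huge firebird.log is this loop rather than something else. The following is a minimal Python sketch, not part of Firebird: the name `summarize_inet_errors` and the sample text are made up for illustration, and the regular expressions assume the message wording quoted in this report.

```python
import re
from collections import Counter

# Patterns assume the message wording quoted in this report.
ERRNO_RE = re.compile(r"INET/inet_error: (\w+) errno = (\d+)")
BAD_SOCKET_RE = re.compile(r'INET/select_wait: found "not a socket" socket : (\d+)')

# Winsock error codes that appear in this report.
WINSOCK_NAMES = {"10038": "WSAENOTSOCK", "10054": "WSAECONNRESET"}


def summarize_inet_errors(lines):
    """Count (operation, errno) pairs and bad socket handles, line by line."""
    errors, bad_sockets = Counter(), Counter()
    for line in lines:
        errors.update(ERRNO_RE.findall(line))
        bad_sockets.update(BAD_SOCKET_RE.findall(line))
    return errors, bad_sockets


# Hypothetical sample assembled from the lines quoted above.
SAMPLE = """
DATABASESERVER Sun Aug 29 05:01:27 2010 INET/inet_error: read errno = 10054
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504
"""

errors, bad_sockets = summarize_inet_errors(SAMPLE.splitlines())
for (operation, code), count in errors.most_common():
    print(f"{operation} errno {code} ({WINSOCK_NAMES.get(code, 'unknown')}): {count}")
for handle, count in bad_sockets.most_common():
    print(f"bad socket {handle}: {count}")
```

Because the function consumes an iterable of lines, an open file object can be passed directly, which avoids loading a multi-gigabyte log into memory.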

After running gstat on the database, we saw that many indexes (around 10) were corrupted, in different tables.

Currently it is still impossible to run the Firebird server for more than 2 weeks without hitting a problem that in every case results in a corrupted database...

Commits: FirebirdSQL/firebird@1e35bc97c8cf704900c63480f63d3a1a6048d246 FirebirdSQL/firebird@90b88fdec327a1e58dba086f2c2e89c6a0ea58b5 FirebirdSQL/firebird@b48821ac022eeeaa2c70865255639666bf2db952

firebird-automations commented 13 years ago

Commented by: @hvlad

INET/inet_error: accept errno = 10038

>>>> MSDN WSAENOTSOCK 10038

Socket operation on nonsocket.

An operation was attempted on something that is not a socket. Either the socket handle parameter did not reference a valid socket, or for select, a member of an fd_set was not valid.

>>>> MSDN

Since the error occurred in the call to accept(), we have a bad listener socket. Don't ask me why or how it went bad. Firebird is able to detect such a condition and remove the bad socket from its internal list (correctly closing the connection, of course). Hence the next message:

INET/select_wait: found "not a socket" socket : 504

504 is the numeric value of the bad socket.

But this ability seems not to be ready to deal with a listener socket (all cases known to me were about worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix.
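The failure mode described here can be illustrated with a toy model. This Python sketch is not Firebird's code: `FakeSocket`, `run_select_loop` and the `remove_bad_listener` switch are invented for illustration, and the loop only imitates the idea of scanning a socket list, logging invalid handles and dropping them. With the switch off, an invalid listener is logged but never removed, so every pass finds it again, which matches the endlessly repeating log entries.

```python
from dataclasses import dataclass


@dataclass
class FakeSocket:
    """Stand-in for a socket handle; no real networking is involved."""
    handle: int
    is_listener: bool = False
    valid: bool = True


def run_select_loop(sockets, remove_bad_listener, max_passes=1000):
    """Simulate the server loop; return (passes made, log lines written).

    max_passes is a safety cap so the buggy variant terminates here.
    """
    active = list(sockets)
    log = []
    passes = 0
    while passes < max_passes:
        passes += 1
        bad = [s for s in active if not s.valid]
        if not bad:
            break  # a real server would now block waiting for activity
        for sock in bad:
            log.append(f'found "not a socket" socket : {sock.handle}')
            if sock.is_listener and not remove_bad_listener:
                continue  # modelled bug: the listener stays in the list
            active.remove(sock)
    return passes, log


sockets = [FakeSocket(504, is_listener=True, valid=False), FakeSocket(600)]

passes, log = run_select_loop(sockets, remove_bad_listener=False, max_passes=50)
print("without removal:", passes, "passes,", len(log), "log lines")

passes, log = run_select_loop(sockets, remove_bad_listener=True, max_passes=50)
print("with removal:", passes, "passes,", len(log), "log lines")
```

In the model, the variant without removal only stops because of the artificial `max_passes` cap; a server without such a cap would spin at full CPU and keep appending to the log.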

As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after "stop hardly the firebird process". Anyway, I would like to look at the part of firebird.log with the corruption errors.

firebird-automations commented 13 years ago
Modified by: @hvlad assignee: Vlad Khorsun [ hvlad ]
firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

thanks Vlad,

>> But this ability seems not to be ready to deal with a listener socket (all cases known to me were about worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix.

great !

>> As for the corrupted indices: we know you have a lot of indices, so it is no wonder some of them were corrupted after "stop hardly the firebird process".

Yes, it's possible, but we get corrupted indexes often on our database (every 2 to 3 weeks). The problem is that to check the database we must fully stop the server to run gstat, and gstat always takes a few hours to run. So most of the time we detect the corrupted index when the server is "over" and we have no choice but to fully stop our services, and we use this time to run gstat...

>> Anyway, I would like to look at the part of firebird.log with the corruption errors.

Ouch, firebird.log was so big after this bug that we were forced to delete it. But all the rows in it were the same, because the Firebird server kept adding these rows:

DATABASESERVER Sun Aug 29 05:02:59 2010 INET/inet_error: accept errno = 10038
DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504

But after killing the Firebird process (by stopping the service) and restarting it, Firebird was working OK!

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

>> But this ability seems not to be ready to deal with a listener socket (all cases known to me were about worker sockets): the bad socket is not removed from the list, and the network server enters an endless loop. This is the error I am going to fix.

Is this fix in the latest release of Firebird 2.5?

firebird-automations commented 13 years ago

Commented by: @hvlad

The fix is still not implemented, sorry. Does it bother you regularly?

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

Thanks Vlad,

Yes, it crashed again just this morning. I think we can say it happens once a month on average, but when it happens everything is down :(

This morning I have

DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 508

and last time it was

DATABASESERVER Sun Aug 29 05:02:59 2010 INET/select_wait: found "not a socket" socket : 504

But apart from this (508 instead of 504) it is the same scenario: a very big firebird.log file, growing and growing.

firebird-automations commented 13 years ago

Commented by: Artem Kuzmenko (artyom-ace)

I had a crash today with this bug. The log size and content surprised me! I am attaching the log to this message. The DB has no errors after the crash. I stopped the server with Firebird Server Control, but it started again only after a reboot.

OS: Win 2003 R2 Enterprise SP2 32-bit
Firebird 2.5.0.26054, default install

P.S. All DBs use "execute statement on external" inside this server...

firebird-automations commented 13 years ago
Modified by: Artem Kuzmenko (artyom-ace) Attachment: firebird.rar [ 11791 ]
firebird-automations commented 13 years ago

Commented by: Artem Kuzmenko (artyom-ace)

I tried to find a pattern myself, and I found one. On my Win2003R2 machine (I installed the latest version), the Firebird 2.5.0.26074 server holds 3 DBs with ODS 11.2. The server has many external connections. If one external PC with Firebird 2.1 (2.1.3.18185) is connected to the Firebird 2.5 server, all is OK; if there are 2 or more external PCs with Firebird 2.1, I get this crash.

In Russian (translated): I described this in English as well as I could; here it is again as I wrote it in Russian.

The server with Firebird 2.5.0.26074 installed holds databases with ODS 11.2, and they use "execute statement on external" within this server; I mention this just in case it matters. If an external TCP connection comes from a machine with Firebird 2.1 installed, that connection usually goes through and everything is OK (even with several programs on that computer; in my case 3 worked normally). As soon as I have 2 or more connections to the 2.5 server from different client machines running 2.1, the glitches begin: either the client application hangs completely (in the best case) or the 2.5 server goes down with this error.

These are my observations; I hope they help to fix this annoying bug quickly, because the database on 2.5 is in production and rolling back to 2.1 is no longer possible :(

firebird-automations commented 13 years ago

Commented by: @hvlad

Artem,

feel free to contact me privately to figure out all details

firebird-automations commented 13 years ago

Commented by: @hvlad

Stephane, Artem,

please answer a few questions:

a) do you have any antivirus or firewall software installed at the host where Firebird server is running ?
b) how many connections are established at the time when the error happens ?
c) could you run netstat -p tcp -n at the time when the error happens and post the results here ?
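For question c), the netstat output can be reduced to a count of connection states on the Firebird port. The sketch below is Python and only parses text: `count_states`, the addresses and the sample output are hypothetical, and it assumes the four-column layout that Windows prints for `netstat -p tcp -n` (Proto, Local Address, Foreign Address, State) and the port 3050 mentioned elsewhere in this thread.

```python
from collections import Counter


def count_states(netstat_text, local_port=3050):
    """Count TCP states of connections whose local port is local_port."""
    states = Counter()
    for line in netstat_text.splitlines():
        fields = line.split()
        # Expected row: TCP <local addr:port> <foreign addr:port> <state>
        if len(fields) != 4 or fields[0].upper() != "TCP":
            continue
        port = fields[1].rsplit(":", 1)[-1]
        if port == str(local_port):
            states[fields[3]] += 1
    return states


# Hypothetical sample in the layout assumed above.
NETSTAT_SAMPLE = """
Active Connections

  Proto  Local Address          Foreign Address        State
  TCP    192.168.0.10:3050      192.168.0.21:51234     ESTABLISHED
  TCP    192.168.0.10:3050      192.168.0.22:51300     ESTABLISHED
  TCP    192.168.0.10:3050      192.168.0.23:49811     CLOSE_WAIT
  TCP    192.168.0.10:445       192.168.0.21:51240     ESTABLISHED
"""

for state, count in count_states(NETSTAT_SAMPLE).most_common():
    print(state, count)
```

On the server itself, the text could come from `subprocess.run(["netstat", "-p", "tcp", "-n"], capture_output=True, text=True).stdout` instead of the sample string.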

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

a) do you have any antivirus or firewall software installed at the host where Firebird server is running ? => NO, absolutely nothing, windows 2008 R2 64 bit

b) how many connections established at time when error happens ? => i don't really know, but around 100 ?

c) could you run netstat -p tcp -n at time when error happens and post results here ? => i will wait the next time the error happen and do it

firebird-automations commented 13 years ago

Commented by: @hvlad

and one more question: d) do you have connections using "localhost ", i.e. local TCP connections ?

firebird-automations commented 13 years ago

Commented by: Artem Kuzmenko (artyom-ace)

For the last few days I have been trying to provoke the bug. On the production system (where it happened regularly) every Firebird install was upgraded to the latest version. I had no choice.

I created a Bug Generator :) : 4 virtual machines with the same OS, programs and settings as the old production system. But without effect so far :(

a) do you have any antivirus or firewall software installed at the host where Firebird server is running ? => Kaspersky 6 for Server is installed. The bug happens with Kaspersky both on and off. But it has not been uninstalled yet.

b) how many connections established at time when error happens ? d) do you have connections using "localhost ", i.e. local TCP connections ? => Around 10 on the production system. But on my notebook, where I develop my software, Firebird went down with this bug locally yesterday! (First time, log saved.) Firebird had a few dropped connections and maybe one normal one. The server went down only at the moment I tried to connect to the DB. Interestingly, the log grows at a rate proportional to CPU speed.

KIS9 is installed on my notebook, but it runs only when I start it manually. Usually it is off.

As soon as I can reproduce the bug reliably, or if I find a new fact, I will inform you immediately.

firebird-automations commented 13 years ago

Commented by: @hvlad

Artem, are you still trying to reproduce it ?

firebird-automations commented 13 years ago

Commented by: Artem Kuzmenko (artyom-ace)

Sorry, too much work :( A few times I tried to reproduce the bug on 5 VMware virtual machines, but without effect :( In my company, after reinstalling all clients and the server to the latest FB 2.5, I no longer see this error.

I still believe the cause of the bug is a connection from a client with FB 2.0 or 2.1 installed...

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

dear vlad,

Hmm, this bug has not appeared for a long time... These kinds of bugs are very, very hard to track. Currently I am fighting with Windows to be able to get a dump when the Firebird process crashes. I found a way, so I will probably write it up somewhere in case someone else needs to do it?

firebird-automations commented 13 years ago

Commented by: @hvlad

Stephane,

of course, it could be helpful for others if you found a way to produce crash dumps :) BTW, if you have such dump - send it to me, please (or make available for download)

firebird-automations commented 13 years ago

Commented by: Artem Kuzmenko (artyom-ace)

Yes! I did it!!! I can crash the system with this bug at any time. Please tell me, step by step, what I have to do so that you get the maximum information about the bug.

It happens when my program connects to 3 DBs on the server with FB 2.5 from client machines, in these steps:
1. Run the program and connect from a 2.5 client - OK
2. Run the program and connect from a 2.5 client - OK
3. Run the program and try to connect from a 2.1 client - the program hangs (maybe a few times)
4. ... after a few more attempts to run the program and connect from a 2.1 client, the server crashes.

firebird-automations commented 13 years ago

Commented by: @hvlad

Artem,

could you send me by e-mail all the necessary files (program and DB) with instructions on how to reproduce the bug, please ? Or make them available for download and send me the URL

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

I DID IT TOO!!! But in a different, easier way, I think :)

I installed the latest version of FB 2.5 on the server, and on the client the latest version of the FB 2.5 fbclient DLL too (so it's not related to the version of the DLL).

Important: on the server I turned the firewall ON, except for Firebird's port 3050 (this blocks the port used by events).

After that it's easy: on the client side I simply launch an "Event" listener process :) Wait for 1 or 2 connection errors, and fbserver starts to take 100% of the CPU and to write in a loop to firebird.log!

This was not the situation on our production server (because there the firewall is open for events), but it's a 100% reliable way to simulate the bug!

Attached is my compiled demo (in Delphi) of an event listener application. Very easy to set up :)

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

the demo application to create an event listener thread

the code source :

/////////////////////////////
///// TALFBXEventThread /////
/////////////////////////////

{********************************************************}
{!!we guess that this procedure will be not multithread!!
 but we have a strange bug when Fsignal is TEvent, when
 we disconnect the FBserver, them an EaccessViolation in
 ntdll is raise in the waitfor in the execute function}
procedure ALFBXEventCallback(UserData: Pointer; Length: Smallint; Updated: PAnsiChar); cdecl;
begin
  if (Assigned(UserData) and Assigned(Updated)) then begin
    with TALFBXEventThread(UserData) do begin
      if FEventCanceled then begin
        SetEvent(FSignal);
        Exit;
      end;
      Move(Updated^, fResultBuffer^, Length);
      FQueueEvent := True;
      SetEvent(FSignal);
    end;
  end
  else begin
    //if Updated = nil then it's look like it's an error
    //like connection lost for exemple or a call to EventCancel
    with TALFBXEventThread(UserData) do begin
      if FEventCanceled then begin
        SetEvent(FSignal);
        Exit;
      end;
      FQueueEvent := False;
      SetEvent(FSignal);
    end;
  end;
end;

{***************************************************}
procedure TALFBXEventThread.initObject(aDataBaseName,
                                       aLogin,
                                       aPassword,
                                       aCharSet: String;
                                       aEventNames: String;
                                       aConnectionMaxIdleTime: integer;
                                       aNumbuffers: integer;
                                       aOpenConnectionExtraParams: String);
Var aLst: TStrings;
    i: integer;
begin
  //if we put lower than tpNormal it seam than the
  //EventThread.Free will never return !
  //Priority := tpNormal;
  FreeOnTerminate := False;
  FConnectionMaxIdleTime := aConnectionMaxIdleTime;
  if FConnectionMaxIdleTime <= 0 then FConnectionMaxIdleTime := INFINITE;
  FDBHandle := nil;
  FQueueEvent := False;
  fResultBuffer := Nil;
  FSignal := CreateEvent(nil, true, false, '');
  fcompleted := False;
  fStarted := False;
  FEventCanceled := False;
  FWaitingSignal := False;
  FDataBaseName:= aDataBaseName;
  FCharset:= ALFBXStrToCharacterSet(aCharSet);
  fOpenConnectionParams := 'user_name = '+aLogin+'; '+
                           'password = '+aPassword+'; '+
                           'lc_ctype = '+aCharSet;
  if aNumbuffers > -1 then fOpenConnectionParams := fOpenConnectionParams + '; num_buffers = ' + inttostr(aNumbuffers);
  if aOpenConnectionExtraParams <> '' then fOpenConnectionParams := fOpenConnectionParams + '; ' + aOpenConnectionExtraParams;
  aLst := TstringList.Create;
  Try
    Alst.Text := Trim(alStringReplace(aEventNames,';',#13#10,[rfReplaceALL]));
    i := 0;
    while (i <= 14) and (i <= Alst.Count - 1) do begin
      fEventNamesArr[i] := Trim(Alst[i]);
      inc(i);
    end;
    fEventNamesCount := i;
    while i <= 14 do begin
      fEventNamesArr[i] := '';
      inc(i);
    end;
  Finally
    Alst.Free;
  End;
end;

{*************************************************}
constructor TALFBXEventThread.Create(aDataBaseName,
                                     aLogin,
                                     aPassword,
                                     aCharSet: String;
                                     aEventNames: String; // ; separated value like EVENT1;EVENT2; etc...
                                     aApiVer: TALFBXVersion_API;
                                     const alib: String = GDS32DLL;
                                     const aConnectionMaxIdleTime: integer = -1;
                                     const aNumbuffers: integer = -1;
                                     const aOpenConnectionExtraParams: String = '');
begin
  fLibrary := TALFBXLibrary.Create(aApiVer);
  fLibrary.Load(alib);
  FownLibrary := True;
  initObject(aDataBaseName,
             aLogin,
             aPassword,
             aCharSet,
             aEventNames,
             aConnectionMaxIdleTime,
             aNumbuffers,
             aOpenConnectionExtraParams);
  inherited Create(False); // see http://www.gerixsoft.com/blog/delphi/fixing-symbol-resume-deprecated-warning-delphi-2010
end;

{*************************************************}
constructor TALFBXEventThread.Create(aDataBaseName,
                                     aLogin,
                                     aPassword,
                                     aCharSet: String;
                                     aEventNames: String; // ; separated value like EVENT1;EVENT2; etc...
                                     alib: TALFBXLibrary;
                                     const aConnectionMaxIdleTime: integer = -1;
                                     const aNumbuffers: integer = -1;
                                     const aOpenConnectionExtraParams: String = '');
begin
  fLibrary := alib;
  FownLibrary := False;
  initObject(aDataBaseName,
             aLogin,
             aPassword,
             aCharSet,
             aEventNames,
             aConnectionMaxIdleTime,
             aNumbuffers,
             aOpenConnectionExtraParams);
  inherited Create(False); // see http://www.gerixsoft.com/blog/delphi/fixing-symbol-resume-deprecated-warning-delphi-2010
end;

{********************************************}
procedure TALFBXEventThread.AfterConstruction;
begin
  inherited;
  while (not fStarted) do sleep(10);
end;

{***********************************}
destructor TALFBXEventThread.Destroy;
begin

  //first set terminated to true
  If not Terminated then Terminate;

  //in case the execute in waiting fire the Fsignal
  while (not fWaitingSignal) and (not fCompleted) do sleep(10);
  if (not fCompleted) then setEvent(FSignal);
  while (not fCompleted) do sleep(10);
  //sleep(100); => i don't know the purpose of this so i comment it !

  //close the fSignal handle
  CloseHandle(FSignal);

  //free the library
  if FownLibrary then fLibrary.Free;

  //destroy the object
  inherited;

end;

{**********************************}
procedure TALFBXEventThread.Execute;
var aEventBuffer: PAnsiChar;
    aEventBufferLen: Smallint;
    aEventID: Integer;
    aStatusVector: TALFBXStatusVector;

  {-----------------------------}
  Procedure InternalFreeLocalVar;
  Begin
    //free the aEventID
    if aEventID <> 0 then begin
      FEventCanceled := True;
      Try
        ResetEvent(Fsignal);
        FLibrary.EventCancel(FDbHandle, aEventID);
        //in case the connection or fbserver crash the Fsignal will
        //be never signaled
        WaitForSingleObject(FSignal, 60000);
      Except
        //in case of error what we can do except suppose than the event was canceled ?
        //in anyway we will reset the FDbHandle after
      End;
      FEventCanceled := False;
    end;
    aEventID := 0;

    //free the aEventBuffer
    if assigned(aEventBuffer) then begin
      Try
        FLibrary.IscFree(aEventBuffer);
      Except
        //paranoia mode ... i never see it's can raise any error here
      End;
    end;
    aEventBuffer := nil;

    //free the FResultBuffer
    if assigned(FResultBuffer) then begin
      Try
        FLibrary.IscFree(FResultBuffer);
      Except
        //paranoia mode ... i never see it's can raise any error here
      End;
    end;
    FResultBuffer := nil;

    //free the FDBHandle
    if assigned(FDBHandle) then begin
      Try
        FLibrary.DetachDatabase(FDBHandle);
      Except
        //yes the function before can do an exception if the network connection
        //was dropped... but not our bussiness what we can do ?
      End;
    end;
    FDBHandle := Nil;

    //ok, if we remove the instruction below then sometime, when we close
    //the program we can have an eAcessViolation. to see it simply run
    //a program to run and imediatly close and have some delay/sleep
    //in other unit (3seconds it's enalfe). Run Winreguardian -nothingtolaunch
    //for exemple
    //sleep(100);
  End;

var aCurrentEventIdx: integer;
    aMustResetDBHandle: Boolean;
begin
  //to be sure that the thread was stated
  fStarted := True;

  aEventBuffer := nil;
  aEventID := 0;
  aEventBufferLen := 0;
  aMustResetDBHandle := True;

  while not Terminated do begin
    Try

      //if the DBHandle is not assigned the create it
      //FDBHandle can not be assigned if for exemple
      //an error (disconnection happen)
      if aMustResetDBHandle then begin

        //set the FMustResetDBHandle to false
        aMustResetDBHandle := False;

        //free the local var
        InternalFreeLocalVar;

        //First init FDBHandle
        FLibrary.AttachDatabase(FDataBaseName,
                                FDBHandle,
                                fOpenConnectionParams);

        //register the EventBlock
        aEventBufferLen := FLibrary.EventBlock(aEventBuffer,
                                               fResultBuffer,
                                               fEventNamesCount,
                                               PAnsiChar(fEventNamesArr[0]),
                                               PAnsiChar(fEventNamesArr[1]),
                                               PAnsiChar(fEventNamesArr[2]),
                                               PAnsiChar(fEventNamesArr[3]),
                                               PAnsiChar(fEventNamesArr[4]),
                                               PAnsiChar(fEventNamesArr[5]),
                                               PAnsiChar(fEventNamesArr[6]),
                                               PAnsiChar(fEventNamesArr[7]),
                                               PAnsiChar(fEventNamesArr[8]),
                                               PAnsiChar(fEventNamesArr[9]),
                                               PAnsiChar(fEventNamesArr[10]),
                                               PAnsiChar(fEventNamesArr[11]),
                                               PAnsiChar(fEventNamesArr[12]),
                                               PAnsiChar(fEventNamesArr[13]),
                                               PAnsiChar(fEventNamesArr[14]));

        //the First EventQueue
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);
        if WaitForSingleObject(FSignal, 60000) <> WAIT_OBJECT_0 then raise Exception.Create('Timeout in the first call to isc_que_events');
        FLibrary.EventCounts(aStatusVector,
                             aEventBufferLen,
                             aEventBuffer,
                             fResultBuffer);

        //set the FQueueEvent to false in case the next
        //WaitForSingleObject fired because of a timeout
        FQueueEvent := False;

        //the 2nd EventQueue
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);

      end;

      //if terminated then exit;
      if Terminated then Break;

      //set fWaitingsignal
      fWaitingsignal := True;

      //stop the thread stile a event appear
      WaitForSingleObject(FSignal, FConnectionMaxIdleTime); //every 20 minutes reset the connection

      //set fWaitingsignal
      fWaitingsignal := False;

      //if terminated then exit;
      if Terminated then Break;

      //if an event was set
      if (FQueueEvent) then begin

        //retrieve the list of event
        FLibrary.EventCounts(aStatusVector,
                             aEventBufferLen,
                             aEventBuffer,
                             fResultBuffer);

        //if it was the event
        for aCurrentEventIdx := 0 to 14 do
          if aStatusVector[aCurrentEventIdx] <> 0 then onEvent(fEventNamesArr[aCurrentEventIdx],aStatusVector[aCurrentEventIdx]);

        //reset the FQueueEvent
        FQueueEvent := False;

        //start to listen again
        ResetEvent(Fsignal);
        FLibrary.EventQueue(FdbHandle,
                            aEventID,
                            aEventBufferLen,
                            aEventBuffer,
                            @ALFBXEventCallback,
                            self);

      end

      //it must be an error somewhere
      else aMustResetDBHandle := True;

    Except
      on E: Exception do begin
        //Reset the DBHandle
        aMustResetDBHandle := True;
        OnException(E);
      end;
    End;

  end;

  Try
    //free the local var
    InternalFreeLocalVar;
  Except
    on E: Exception do begin
      OnException(E);
    end;
  End;

  //set completed to true
  //we need to to this because i don't know why
  //but on isapi the waitfor (call in thread.free)
  //never return.
  //but i don't remenbered if the free was call in the initialization
  //section of the ISAPI DLL (and that bad to do something like this
  //in initialization or finalization).
  fcompleted := True;
end;

firebird-automations commented 13 years ago
Modified by: vander clock stephane (arkadia) Attachment: ALFBXEvent.zip [ 11840 ]
firebird-automations commented 13 years ago

Commented by: @dyemanov

Sounds similar to CORE3170.

firebird-automations commented 13 years ago

Commented by: @hvlad

No, it is different bug. I'm already testing patch and hope to commit it soon.

firebird-automations commented 13 years ago

Commented by: vander clock stephane (arkadia)

Vlad, I lost the email you sent me about the result of the test on the new version you made. Currently it's OK, it does not raise the exception, BUT I ran the test only on our beta server, without real activity on it. But as this bug was simple to reproduce (once we knew the reason), I think it is OK now!

firebird-automations commented 13 years ago
Modified by: @hvlad status: Open [ 1 ] => Resolved [ 5 ] resolution: Fixed [ 1 ] Fix Version: 2.1.4 [ 10361 ] Fix Version: 2.5.1 [ 10333 ] Fix Version: 3.0 Alpha 1 [ 10331 ]
firebird-automations commented 13 years ago
Modified by: @dyemanov summary: 100% CPU USAGE with Unilimited Loop & Index corrupted => 100% CPU USAGE (endless loop) in the remote protocol code related to events processing
firebird-automations commented 13 years ago
Modified by: @pcisar status: Resolved [ 5 ] => Closed [ 6 ]
firebird-automations commented 13 years ago

Commented by: Ann Lynnworth (annfire)

I also had this problem, but I could recreate it within a few seconds. The symptom was that the client would hang with an ISC disconnect error message.

ISC ERROR CODE:335544721

ISC ERROR MESSAGE: Unable to complete network request to host "(snip)". Failed to establish a connection.

Meanwhile the server side would accumulate a giant log file (larger than 33 GB) with endless repetition of these two:

FB101 (Server) Mon May 30 01:25:05 2011 INET/select_wait: found "not a socket" socket : 536

FB101 (Server) Mon May 30 01:25:05 2011 INET/inet_error: accept errno = 10038

To give some context and extra keywords: I was testing IBObjects replication, which uses events. Activating the replication triggered the myriad problems (often including Firebird crashing).

As Firebird server v2.5.1 (which supposedly fixes this issue) is not available, a workaround may be of interest to other firebird admins. It is obvious in retrospect. (a) Edit firebird.conf and set a fixed port for events, e.g. 3051. Restart Firebird service. (b) Change the firewall rules to allow traffic on that port, limited by ip number etc as relevant. Once the firewall allows traffic on the fixed event port, replication works (yes, the app no longer hangs).
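Before opening the firewall, it can be worth confirming what firebird.conf actually sets. The following Python sketch is not part of Firebird: `read_setting` and the sample text are hypothetical, and it assumes that the "fixed port for events" in step (a) is the `RemoteAuxPort` parameter, that settings are written as `Name = Value`, that lines starting with `#` are comments, and that a value of 0 (or no value) leaves the port choice to the server.

```python
def read_setting(conf_text, name):
    """Return the last uncommented value of `name`, or None if not set."""
    value = None
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blanks, comment lines and anything that is not Name = Value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            value = val.strip()
    return value


# Hypothetical excerpt; 3051 is the example value from the workaround.
CONF_SAMPLE = """
#RemoteServicePort = 3050
# Fixed port for event notifications
RemoteAuxPort = 3051
"""

aux_port = read_setting(CONF_SAMPLE, "RemoteAuxPort")
if aux_port in (None, "0"):
    print("event port is not fixed; a firewall may block it")
else:
    print(f"event port fixed at {aux_port}; allow it in the firewall")
```

On a server, the text would come from reading the real firebird.conf file rather than the sample string; remember that the service must be restarted for a changed value to take effect, as noted in step (a).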

firebird-automations commented 13 years ago

Commented by: @hvlad

Ann,

> As Firebird server v2.5.1 (which supposedly fixes this issue) is not available

are you aware of daily snapshot builds ?

firebird-automations commented 8 years ago
Modified by: @pavel-zotov QA Status: No test