FirebirdSQL / firebird

Firebird server, client and tools
https://www.firebirdsql.org/
1.19k stars 205 forks source link

Possible deadlock in firebird connect. [CORE4680] #4989

Closed firebird-automations closed 6 years ago

firebird-automations commented 9 years ago

Submitted by: Sascha Michel (datiscum1)

Is related to CORE5015 Is related to CORE5014 Is related to CORE5017

Attachments: screenshot-1.jpg c4680-test-batch.zip

Approximately every month i see a "firebird superserver" that stalled. I must restart the server with the init script , every connect to the server is impossible. Firebird offers the following message in the log file:

Koenig_DB_server (Server) Tue Oct 21 10:47:50 2014 Shutting down the server with 26 active connection(s) to 1 database(s), 0 active service(s)

Koenig_DB_server (Server) Thu Nov 13 07:50:54 2014 Shutting down the server with 16 active connection(s) to 1 database(s), 0 active service(s)

Koenig_DB_server (Server) Thu Jan 15 09:42:31 2015 Shutting down the server with 30 active connection(s) to 1 database(s), 0 active service(s)

Koenig_DB_server (Server) Fri Jan 30 10:24:50 2015 Shutting down the server with 27 active connection(s) to 1 database(s), 0 active service(s)

Since it was years long so goes, I have written a little program, with which I was able to replicate the problem.

1 .A connection is established to the database server 2. 9 events registered. 3. Waited a short time ( millisecond ) 4. Unregister the events. 5. Connection closed. 6. Start all over again.

If it is possible to kill the process that caused the deadlock, the server continues to run normal. If no events were registered it does not happen.

I was able to reproduce the problem under "super server firebird 2.5 / 3beta2 linux / windows".

Here's a video that shows that problem. https://datiscum.com/FB_Test_x264.mp4

The program can be downloaded here. https://datiscum.com/FirebirdTest.7z

The downloads are available only for a few days.

I hope this was helpful and the problem can be eliminated.

Regards, Sascha Michel

Commits: FirebirdSQL/firebird@a6c6d15daa4555cbfb53632e5b36c16653e6033f FirebirdSQL/firebird@f7e7c513a1b79cc0d59015676c8d8bf1191015cf FirebirdSQL/firebird@6a85b46ea701d112583aa7989bf92f8c722999df FirebirdSQL/firebird@a6d615c0cd6905a03318bb9b9d8a7625c08eb68a FirebirdSQL/firebird@ed1e5e65a5b4fbcb05489c0f5cb9744fd3d54051 FirebirdSQL/firebird@fc4063fa160e53a61c222fe3bad0aa34808ba592 FirebirdSQL/firebird@4068a225b45dbace1f28c2092642dd333ee6d116 FirebirdSQL/firebird@2ab15d1e5f34240f2b3d42517d49ae040ac246c0 FirebirdSQL/fbt-repository@f8758da9cba37741e8ae3e7915346f70fbd9d76d FirebirdSQL/fbt-repository@311169807bcca7f88d9545283b1e707e5073e3bb FirebirdSQL/fbt-repository@a70c1f384428d9defbc9bdfc5b773c2c7d8691a7 FirebirdSQL/fbt-repository@784b47d1308c264933bfc984c723619ba54437a5 FirebirdSQL/fbt-repository@7ee5da0530c82150b0a833d793e161b3bac69428 FirebirdSQL/fbt-repository@bfbf5a82384293e92871861ce5aa140e5a5c8383

====== Test Details ======

Decided skip implementation after letter from hvlad, 28.12.2017 12:33.

firebird-automations commented 9 years ago

Commented by: Sascha Michel (datiscum1)

In Firebird-3.0.0.31632-Beta2 SuperServer the firebird server crash with:

debian7 Wed Feb 11 14:33:15 2015 /opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

and then be automatically restarted from the guardian process.

debian7 Wed Feb 11 14:33:15 2015 /opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

Now you need not manually restart the server.

But can you not completely avoid the crash?

Thanks, Sascha Michel

firebird-automations commented 9 years ago

Commented by: Sascha Michel (datiscum1)

LI-T6.3.0.31719 Firebird 3.0 Beta 2

debian7 Fri Mar 20 11:57:33 2015 /opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 11:57:33 2015 /opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

debian7 Fri Mar 20 11:58:10 2015 /opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 11:58:10 2015 /opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

debian7 Fri Mar 20 11:58:15 2015 /opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 11:58:15 2015 /opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

debian7 Fri Mar 20 12:01:07 2015 /opt/fb30/bin/fbguard: /opt/fb30/bin/firebird terminated abnormally (-1)

debian7 Fri Mar 20 12:01:07 2015 /opt/fb30/bin/fbguard: guardian starting /opt/fb30/bin/firebird

After one of these crashes, the database file was corrupt.

firebird-automations commented 9 years ago
Modified by: Sascha Michel (datiscum1) priority: Major \[ 3 \] =\> Critical \[ 2 \]
firebird-automations commented 9 years ago

Commented by: Sascha Michel (datiscum1)

More precisely the error bounded !!!!

For testing only this small database is required.

SET SQL DIALECT 3; SET NAMES UTF8; CREATE DATABASE '192.168.1.2:/shares/DB/EventTest.fdb' USER 'SYSDBA' PASSWORD 'XXXXXXX' PAGE_SIZE 16384 DEFAULT CHARACTER SET UTF8 COLLATION UTF8;

CREATE TABLE "EventSync" ( "Action" INTEGER NOT NULL, PID INTEGER NOT NULL ); ALTER TABLE "EventSync" ADD CONSTRAINT "PK_EventSync" PRIMARY KEY ("Action");

INSERT INTO EVENTSYNC ("Action", PID) VALUES (1, -1); INSERT INTO EVENTSYNC ("Action", PID) VALUES (2, -1);

COMMIT WORK;

The processes are now synchronized with the table "Event Sync" This has some very interesting results. Whenever Firebird hangs the programm is unregistering the events "SIBfibEventAlerter1-> Registered = false;" Connect / Disconnect always works when no events are used.

So finally i will say that the unregistering events ist the problem !!!! Problem exists in FB2.5 / FB3.

Viode demonstration: https://datiscum.com/FirebirdEventProblem.wmv Test programm: https://datiscum.com/FB_EventTest.exe

I hope this is helpful, more I cannot do.

Detail of the test program with FIBPlus: Main Loop: try { do {

  if \( \!IBCConnection\-\>Connected \)
  \{
    L\_ConnectStatus\-\>Caption = "Before connect";
    LLockStatus\-\>Caption = "Lock \-\-\-";
    Update\(\);
    IBCConnection\-\>Connected = true;
    L\_ConnectStatus\-\>Caption = "After connect";
    Update\(\);
    if \( ChBox\_RegEvents\-\>Checked \)
    \{
      if \( \!SIBfibEventAlerter1\-\>Registered \)
      \{
        LLockStatus\-\>Caption = "Lock wait";
        Update\(\);

LockR: try { if ( !pFIBTransaction1->Active ) pFIBTransaction1->Active = true;

          if \( \!pFIBDataSet\_EventSync\-\>Active \)
            pFIBDataSet\_EventSync\-\>Open\(\);
          pFIBDataSet\_EventSync\-\>Edit\(\);
          pFIBDataSet\_EventSync\-\>FieldByName\("PID"\)\-\>AsInteger = Application\-\>Tag ;
          pFIBDataSet\_EventSync\-\>FieldByName\("Action"\)\-\>AsInteger = 1;
          pFIBDataSet\_EventSync\-\>Post\(\);
          LLockStatus\-\>Caption = "Lock set";
          Update\(\);
        \}
        catch\(\.\.\.\)
        \{
          goto LockR;
        \}
        L\_EventStatus\-\>Caption = "Before Register events";
        Update\(\);
        SIBfibEventAlerter1\-\>Registered = true;
        L\_EventStatus\-\>Caption = "After Register events";
        LLockStatus\-\>Caption = "Before lock release";
        Update\(\);
        if \( pFIBTransaction1\-\>Active \)
          pFIBTransaction1\-\>Rollback\(\);
        LLockStatus\-\>Caption = "Lock released";

      \}
    \}
  \}

  if \(IBCConnection\-\>Connected \)
  \{
    Sleep\(2\);
    L\_ConnectStatus\-\>Caption = "Before disconnect";
    LLockStatus\-\>Caption = "Lock \-\-\-";
    Update\(\);
    if \( SIBfibEventAlerter1\-\>Registered \)
    \{
      LLockStatus\-\>Caption = "Lock wait";

LockU: try { if ( !pFIBTransaction1->Active ) pFIBTransaction1->Active = true; if ( !pFIBDataSet_EventSync->Active ) pFIBDataSet_EventSync->Open();

        pFIBDataSet\_EventSync\-\>Edit\(\);
        pFIBDataSet\_EventSync\-\>FieldByName\("PID"\)\-\>AsInteger = Application\-\>Tag ;
        pFIBDataSet\_EventSync\-\>FieldByName\("Action"\)\-\>AsInteger = 1;
        pFIBDataSet\_EventSync\-\>Post\(\);
        LLockStatus\-\>Caption = "Lock set";
      \}
      catch\(\.\.\.\)
      \{
        if \( pFIBTransaction1\-\>Active \)
          pFIBTransaction1\-\>Rollback\(\);
        goto LockU;
      \}
      L\_EventStatus\-\>Caption = "Before unregister events";
      Update\(\);
      SIBfibEventAlerter1\-\>Registered = false;
      L\_EventStatus\-\>Caption = "After unregister events";
      Update\(\);

/* LLockStatus->Caption = "Before lock release"; Update(); if ( pFIBTransaction1->Active ) pFIBTransaction1->Rollback(); LLockStatus->Caption = "After lock release"; Update(); */ } IBCConnection->Connected = false; L_ConnectStatus->Caption = "After disconnect"; Update(); } Application->ProcessMessages(); }while( AdvGlBu_Start->Down ); } catch(...) { AdvGlBu_Start->Down = false; }

firebird-automations commented 9 years ago

Commented by: @AlexPeshkoff

Trying to run your test I get:

err:module:import_dll Library BTHPROPS.DLL (which is needed by L"Z:\\usr\\home\\firebird\\tests\\4680\\FB_EventTest.exe") not found err:module:import_dll Library BLUETOOTHAPIS.DLL (which is needed by L"Z:\\usr\\home\\firebird\\tests\\4680\\FB_EventTest.exe") not found

I see no reason for bluetooth libraries to be needed to check for FB problems.

firebird-automations commented 9 years ago

Commented by: Sascha Michel (datiscum1)

Please download it again and it will work under linux wine. Please don't forget to copy a fbclient.dll in path or in wine system.

The other version will work under XP without the dll but wine has problems.

I have tested with wine 1.4.1 under Debian 7.

firebird-automations commented 9 years ago

Commented by: @AlexPeshkoff

Ok, now it works for me. How many test iterations attempts should pass to have it reproduced?

firebird-automations commented 9 years ago

Commented by: Sascha Michel (datiscum1)

I open the program 10 times and after 4.000 iterations the problem should occur.

firebird-automations commented 9 years ago

Commented by: @AlexPeshkoff

Ten clients ran 4000 iterations each - and server did not hang. I suppose something is wrong with your environment.

firebird-automations commented 9 years ago
Modified by: @AlexPeshkoff status: Open \[ 1 \] =\> Resolved \[ 5 \] resolution: Cannot Reproduce \[ 5 \]
firebird-automations commented 9 years ago

Commented by: Sascha Michel (datiscum1)

I gave you the test version with a workaround for the problem. Sorry please test it with the current version.

As a workaround, I'd start a new thread before unregistering the events and this closes the application and the error does not occur.

Please download it again and test it under an native windows version and not wine.

firebird-automations commented 9 years ago
Modified by: @pcisar status: Resolved \[ 5 \] =\> Closed \[ 6 \]
firebird-automations commented 8 years ago

Commented by: Sean Leyne (seanleyne)

Alex,

Was this case re-tested with Sascha's correct script version?

firebird-automations commented 8 years ago

Commented by: @AlexPeshkoff

Sascha, trying to download the test I get:

********** Secure Connection Failed

An error occurred during a connection to http://datiscum.com. Peer's Certificate has been revoked. (Error code: sec_error_revoked_certificate)

 The page you are trying to view cannot be shown because the 

authenticity of the received data could not be verified. Please contact the website owners to inform them of this problem. Alternatively, use the command found in the help menu to report this broken site. **********

Is 'not wine' requirement hard? I have no ability to run it under windows. And I can't catch how can client OS affect server behaviour so much.

firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

I have set up the new certificate.

I use Firebird in many small companies ( 5 User ) for more than 15 years. And also have not had a problem.

The more users ( 40 users ) are logged on at the same time, the higher the probability is that it occurs.

At my customer, I have discovered the problem by accident. I have killed a client and the Firebird server is again continued to run normally.

So I came up with the idea to write a small test tool. And it worked - the same problem in my office.

But I'm not saying that it is easy to provoke the http://state.My patch that kills the program automatically if it hangs during closing - works.

I hope that the issue can be resolved.

firebird-automations commented 8 years ago

Commented by: @hvlad

I can confirm rare AV's in fbclient but i see no hung of server

firebird-automations commented 8 years ago
Modified by: @hvlad status: Closed \[ 6 \] =\> Reopened \[ 4 \] resolution: Cannot Reproduce \[ 5 \] =\>
firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

How many clients you have started simultaneously?

What Firebird version you tested?

firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

The Ticket CORE4952 SuperServer could hung on database shutdown !? Resolved !

This is the same problem ! And it was marked as fixed.

Thanks ;-)

firebird-automations commented 8 years ago

Commented by: @hvlad

Sascha,

i reproduced two issues in fbclient using your test: - AV - deadlock

both issues related to the race condition in events handling code. We with Alex work on it and hope fix it soon.

firebird-automations commented 8 years ago

Commented by: @hvlad

> How many clients you have started simultaneously? Your FB_EventTest.exe

> What Firebird version you tested? Current FB3 on Windows

firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

I hope the problem can be fixed even in 2.5.5, before it is released.

firebird-automations commented 8 years ago

Commented by: @hvlad

Sascha,

CORE4952 is FB3 specific and not related with this ticket.

Finally, i reproduced server hung up (and few other issues), thanks to your test application (4 instances in my case). Set of fixes is committed into trunk (v3), please test it. Some of them will be backported into v2.5.5 a bit later.

firebird-automations commented 8 years ago

Commented by: Sean Leyne (seanleyne)

Vlad,

Please post some notes describing when the deadlocks could occur, to provide context for readers of this case.

firebird-automations commented 8 years ago

Commented by: @hvlad

Sean,

the issue affects SuperServer and SuperClassic (i.e. not ClassicServer).

It could happen when application used events and connects\disconnects so often that OS re-assigns value of just recently closed socket to the newly accepted one between two calls of select() in Firebird listener thread.

firebird-automations commented 8 years ago

Commented by: @hvlad

Patch is backported into v2.5.5

firebird-automations commented 8 years ago
Modified by: @hvlad status: Reopened \[ 4 \] =\> Resolved \[ 5 \] resolution: Fixed \[ 1 \] Fix Version: 3\.0 RC 1 \[ 10584 \] Fix Version: 2\.5\.5 \[ 10670 \]
firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

While it has been changed a bit, but the problem still exists.

I have now been tested with the version "LI V6.3.0.32136 Firebird 3.0 Release Candidate 1".

I've tested on Windows and Linux (Wine) The problem occurs even faster than before. A newer version of the testprogramm is here: https://datiscum.com/EventTest.zip

To understand the problem, the program must be started several times. Otherwise, there is no problem if it is only started once. 6 simultaneous program runs should be enough.

The server now no longer hangs, but shuts down itself.

Here is the log: FB30 Wed Nov 11 11:18:50 2015 INET/inet_error: invalid socket in packet_receive errno = 22

FB30 Wed Nov 11 11:18:50 2015 SRVR_multi_thread: shutting down due to unhandled exception

FB30 Wed Nov 11 11:18:50 2015 Unable to complete network request to host "FB30". Error reading data from the connection. Invalid argument

FB30 Wed Nov 11 11:18:50 2015 SRVR_multi_thread: forcefully disconnecting a port

FB30 Wed Nov 11 11:18:50 2015 Shutting down the server with 4 active connection(s) to 1 database(s), 0 active service(s)

FB30 Wed Nov 11 11:18:50 2015 /opt/firebird/bin/fbguard: /opt/firebird/bin/firebird normal shutdown.

firebird-automations commented 8 years ago
Modified by: Sascha Michel (datiscum1) Attachment: screenshot\-1\.jpg \[ 12850 \]
firebird-automations commented 8 years ago

Commented by: @hvlad

I can not reproduce it on Windows with 6 instances - it run few times for more than 5000 circles with no problem. I tried SS, 32 and 64 bit, RC1 and current snapshot.

PS Could you make your app to remember server\dbname entered by user ?

firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

I uploaded a new version of FB_EventText.zip. It accepts commandline switched ( -u , -p , -d ). I hope that's OK.

I test again with a fresh local install of Firebird under windows. With an empty database there were no errors.

Then i create two database triggers: SET TERM ^^ ;

CREATE TRIGGER "Con1" ACTIVE ON CONNECT POSITION 0 AS begin post_event 'Test'; end ^^

CREATE TRIGGER "Discon1" ACTIVE ON DISCONNECT POSITION 0 AS begin post_event 'Test'; end ^^

SET TERM ; ^^

After that , the error i am described is back. When i disable the triggers the error is gone.

I hope that it is reproducible.

Thanks

e.g. It looks as if the Disconnect trigger causing the problem alone. But in the real DB i use a "commit trigger".

firebird-automations commented 8 years ago
Modified by: @hvlad Link: This issue is related to [CORE5015](https://github.com/FirebirdSQL/firebird/issues?q=CORE5015+in%3Atitle) \[ [CORE5015](https://github.com/FirebirdSQL/firebird/issues?q=CORE5015+in%3Atitle) \]
firebird-automations commented 8 years ago
Modified by: @hvlad Link: This issue is related to [CORE5014](https://github.com/FirebirdSQL/firebird/issues?q=CORE5014+in%3Atitle) \[ [CORE5014](https://github.com/FirebirdSQL/firebird/issues?q=CORE5014+in%3Atitle) \]
firebird-automations commented 8 years ago

Commented by: @hvlad

> I uploaded a new version of FB_EventText.zip. > It accepts commandline switched ( -u , -p , -d ). I hope that's OK.

Thanks, it really helps.

The issue with > INET/inet_error: invalid socket in packet_receive errno = 22 is fixed now, see CORE5014

The issue with ON DISCONNECT trigger and POST_EVENT is fixed too, see CORE5015

firebird-automations commented 8 years ago
Modified by: @AlexPeshkoff Link: This issue is related to [CORE5017](https://github.com/FirebirdSQL/firebird/issues?q=CORE5017+in%3Atitle) \[ [CORE5017](https://github.com/FirebirdSQL/firebird/issues?q=CORE5017+in%3Atitle) \]
firebird-automations commented 8 years ago

Commented by: Sascha Michel (datiscum1)

Hello,

i tested again with 3.0.0.32179 and after 160.000 Connect/disconnets i got no server hang.

I got one server crash: FB30 Tue Nov 17 12:18:35 2015 /opt/firebird/bin/fbguard: /opt/firebird/bin/firebird terminated abnormally (-1) dmesg: firebird[31436]: segfault at 7f2100000000 ip 00007f218536e3ee sp 00007f218192d6e0 error 6 in http://libEngine12.so[7f2184f35000+62f000]

Then I tested it with Firebird 2.5.5.26952-0.amd64, but the server hangs after \~20.000 connects / disconnects.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

Some problem with trigger ON DISCONNECT event still does exist. Consider batch that creates database with trigger that fires on disconnect event and 'dummy' SQL script like this:

set bail on; set heading off; connect 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb' user 'SYSDBA' password 'masterke'; select 'iter #⁠1 of 10000', current_timestamp, current_connection from rdb$database; commit; connect 'localhost/3333:C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb' user 'SYSDBA' password 'masterke'; select 'iter #⁠2 of 10000', current_timestamp, current_connection from rdb$database; commit; . . . . . . (total number of iterations is defined by input argument of batch) . . .

This batch can be following:

@echo off setlocal enabledelayedexpansion enableextensions

set fbc=C:\MIX\firebird\fb30 set usr=SYSDBA set pwd=masterke set host=localhost set port=3333 set dbnm=C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb

del C:\MIX\firebird\QA\fbt-repo\tmp\e30.fdb 2>nul

set pbak=%path% set path=%fbc%;%pbak% set logp=logs md %logp% 2>nul

del %logp%\%\~n0.*.log 2>nul del %logp%\%\~n0.*.err 2>nul

set sql=%\~n0-run.tmp del %sql% 2>nul

set winq=20 if not .%1.==.. set winq=%1

set itercnt=10000 if not .%2.==.. set itercnt=%2

@echo on echo create database '%host%/%port%:%dbnm%' user '%usr%' password '%pwd%'; show database; | %fbc%\isql -q %fbc%\isql %host%/%port%:%dbnm% -user %usr% -password %pwd% -q -nod -i c4680-ddl.sql @echo off

@rem %fbc%\isql %host%/%port%:%dbnm% -user %usr% -password %pwd% -nod -q -x

( echo -- Generated auto, do NOT edit. echo set bail on; echo set heading off; for /l %%i in (1,1,%itercnt%) do ( echo connect '%host%/%port%:%dbnm%' user '%usr%' password '%pwd%'; select 'iter #⁠%%i of %itercnt%', current_timestamp, current_connection from rdb$database; commit; ) ) >> %sql%

for /l %%i in (1,1,%winq%) do ( set runcmd=%fbc%\isql -q -i %sql% 1^>%logp%\%\~n0.%%i.log 2^>%logp%\%\~n0.%%i.err @echo Starting ISQL #⁠%%i: !runcmd! start /min cmd /c !runcmd! )

Correct this batch appropriately to your environment (change values of fbc/host/port/usr/pwd and dbnm variables) and run it with TWO arguments: arg #⁠1 = number of ISQL sessions that shoudl be launched at the same time and make concurrent connect/disconnect statements; arg#⁠2 = number of iterations that should contain 'dummy' SQL script that will be executed by this batch.

I use arg#⁠1 = 30 and did not specified arg#⁠2 (i.e. let it be default value = 10'000).

Note: every ISQL session will stop immediately its execution on any error due to 'set bail on' statement that is .added at the beginning of 'dummy' script.

Now what I've get.

=== I. Legacy ===

When LEGACY plugin is used for authentification, ISQL sessions will "self-stop" very quickly in case of using SS: after about 130 iterations. And this number (\~130) will be almost the same (+/- 2..3) for ALL these sessions.

Error will be the same:

Statement failed, SQLSTATE = 08006 Unable to complete network request to host "localhost". -Failed to establish a connection. After line 131 in file c4680-run-run.tmp

As of Classic, error will raise also quick but can differ:

Statement failed, SQLSTATE = 08004 sh_mem_length_mapped is 0 -IProvider::attachDatabase failed when loading mapping cache After line 38 in file c4680-run-run.tmp

or this:

Statement failed, SQLSTATE = HY000 sh_mem_length_mapped is 0 After line 35 in file c4680-run-run.tmp

(i.e. without line "-IProvider::attachDatabase failed when loading mapping cache").

It's interestingly (in Classic) that ONE isql session (of 30 launched) did not get exception and continued its work, but no longer any concurrent attachments.

As of SuperClassic, I could not get exception.

=== II. Srp. ===

When Srp plugin is used for authentification, ISQL sessions can work MUCH LONGER. On SS I got this only after \~1970 iteration (of 10'000 total). For CS and SC this test not yet launched.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

PS. WI-V3.0.0.32179, OS = Win XP.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

Sorry, forgot to show script 'c4680-ddl.sql' that is executed after database creation. It's trivial:

set term ^;
create or alter trigger trg\_disc active on disconnect position 0 as
begin
    post\_event 'bar';
end 
^
set term ;^
commit;

show trigger trg\_disc;

===

firebird-automations commented 8 years ago

Commented by: @AlexPeshkoff

Sascha!

When you post this info:

I got one server crash: FB30 Tue Nov 17 12:18:35 2015 /opt/firebird/bin/fbguard: /opt/firebird/bin/firebird terminated abnormally (-1) dmesg: firebird[31436]: segfault at 7f2100000000 ip 00007f218536e3ee sp 00007f218192d6e0 error 6 in http://libEngine12.so[7f2184f35000+62f000]

it tells almost nothing to us about reasons of failure. Taking into an account that: - you can use win client + linux server - test run was about 160000 iterations the result you had is kind of unique, and it's very sad that we have no core dump of server for it. Can you take a look at http://www.ibphoenix.com/resources/documents/search/doc_36 and provide stack traces? In many cases they do have enough info for solving a problem.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

> For CS and SC this test not yet launched.

SuperClassic has *no* such problem. All attachments works as long as I could wait - more than 3 hours and did not encounter exception.

Classic *has* trouble, but it seems more stable than Super: exception can raise in some of attachments rather than all at once (as in SS). Log can be filled with following messages in Classic:

CSPROG Wed Nov 18 17:57:01 2015 MonitoringData: Cannot initialize the shared memory region sh_mem_length_mapped is 0

And also these messages can appear (but with very low probability):

CSPROG Wed Nov 18 17:23:08 2015 LockManager::LockManager() Database is probably already opened by another engine instance in another Windows session

CSPROG Wed Nov 18 17:26:40 2015 SERVER/process_packet: broken port, server exiting

CSPROG Wed Nov 18 17:29:36 2015 SERVER/process_packet: broken port, server exiting

Updated batch with some addition logging commands is in attached file.

PS. It's look strange but when database has fw = OFF (yes, mon$forced_writes = ZERO) then LESS exceptions occured comparing with fw = ON (mon$forced_writes = ONE).

firebird-automations commented 8 years ago
Modified by: @pavel-zotov Attachment: c4680\-test\-batch\.zip \[ 12851 \]
firebird-automations commented 8 years ago
Modified by: @pavel-zotov status: Resolved \[ 5 \] =\> Resolved \[ 5 \] QA Status: Deferred Test Details: Wait for reply on issues of 18\-nov\-2015 about faults during concurrent ISQL launches\.
firebird-automations commented 8 years ago

Commented by: @hvlad

Pavel,

while your test allows to find and fix some real issues, it have *nothing common* with ticket subject. I.e. it is absolutely not related with events.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

I've doubted about whether to create new ticket or no. But presence of db-level trigger (and only on DISconnect event) is mandatory for reproducing mentioned effect, so I've decided to put results here. If this was wrong, I can create new ticket and copy last messages there - tell me if this is needed.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

Vlad, I just read your comment in http://sourceforge.net/p/firebird/code/62593:

Could happens with shared cache (former SS) only.

What about these messages that was in Classic (see my posts above):

Statement failed, SQLSTATE = 08004 sh_mem_length_mapped is 0 -IProvider::attachDatabase failed when loading mapping cache . . . MonitoringData: Cannot initialize the shared memory region sh_mem_length_mapped is 0

- ?

firebird-automations commented 8 years ago

Commented by: @hvlad

Pavel,

> What about these messages that was in Classic (see my posts above):

So far i could speak about 3 different issues showed by your test:

- race condition when last attachment is releasing database object and new attachment arrives at unhappy moment this one i fixed recently

- another AV related with run-time stats handling in SS mode it will be fixed soon

- issue with shared memory initialization, CS mode is not reproduced so far

None of them have relation with events, believe me or not.

firebird-automations commented 8 years ago

Commented by: @pavel-zotov

> None of them have relation with events, believe me or not.

I'm not speaking about EVENTS as reason of AV (and did not think about them :-)). These issues relate to DISCONNECT trigger only, am I correct ?

firebird-automations commented 8 years ago

Commented by: @hvlad

That AV's have no relation with ON DISCONNECT TRIGGER.

firebird-automations commented 8 years ago

Commented by: @hvlad

Pavel,

issues with shared memory initialization in CS mode should be fixed now. Please, test tomorrows snapshot build.