FirebirdSQL / firebird-qa

Python tools used for Firebird QA
MIT License
3 stars 6 forks source link

weird hang when execute TWO tests one-by-one (both use trace API) #4

Closed AlexPeshkoff closed 1 year ago

AlexPeshkoff commented 1 year ago

Pavel, I've used to fix two bugs in firebird (https://github.com/FirebirdSQL/firebird/issues/7359 & https://github.com/FirebirdSQL/firebird/issues/7361) related with subj. Currently I can say for sure - used in trace-related tests method to stop service via fixed session ID is not reliable. Nobody can guarantee that after session with ID == N will take place N+1, it may reset to 1, and this is normal behavior, specially for CS.

From what Pavel Zotov says mention method is built into test suite. Can you confirm this? If yes, a way to make behavior stable should be found - for example check for correct session name in sessions list.

pavel-zotov commented 1 year ago

check for correct session name in sessions list.

We have to use following syntax to STOP trace session:

fbtracemgr -SE service_mgr -STOP -ID <NNN>

(i.e. only using its ID rather than name). One need to use cumbersome (and fragile) manipulations to obtain session ID via known trace session name:

$ /opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas ********* -start -name trace_5441 -config /var/tmp/trc_5441.cfg
$ /opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas ********* -start -name trace_5470 -config /var/tmp/trc_5470.cfg
$ /opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas masterkey -list

Session ID: 2
  name:  trace_5441
  user:  SYSDBA
  date:  2022-10-26 17:13:58
  flags: active, trace

Session ID: 3
  name:  trace_5470
  user:  SYSDBA
  date:  2022-10-26 17:14:14
  flags: active, trace

# get trace ID for name = 'trace_5470':
[ibase@glb-srv-ik-test firebird-qa]$ (/opt/fb40tmp/bin/fbtracemgr -se localhost:service_mgr -user sysdba -pas masterkey -list | grep -B1 "  name:  trace_5470" ) | grep "Session ID:  | awk '{print $3}'"
3

This way seems to me poor because it strongly depends on FORMAT of 'fbtracemgr -list' output. Why we can not stop trace just using its name, without preliminary search of its ID ?

PS. Yes, i remember that we have discussed this privately a few years ago; but may be it is good to repeat for all involved persons.

AlexPeshkoff commented 1 year ago

In script - yes, need to parse (though in 1 line using shell and Unix utilities). But in python program one gets session name in service reply in separate clumplet, i.e. finding it is trivial.

pcisar commented 1 year ago

The qa plugin does not use arbitrary or fixed session ID, it uses the session ID reported back by service when session is started. Well, the firebird-driver have to parse the text output to get it, but anyway. The qa plugin also checks that trace was properly stopped, as the service response to stop request is normally "Trace session ID N stopped". This check is actually the reason why this Firebird issue was discovered. So, your statement that "method to stop service via fixed session ID is not reliable" for qa suite AFAIK does not apply here.

AlexPeshkoff commented 1 year ago

Sorry - you are absolutely right, problem is badly written test.

pavel-zotov commented 1 year ago

I still have a question about why pytest HANGED after show final message about failed test (i.e. when bug was not yet fixed). Consider following test:

[ibase@glb-srv-ik-test bugs]$ cat -n /home/ibase/firebird-qa/tests/bugs/core_hang_test.py
     1  #coding:utf-8
     2
     3  import time
     4  import pytest
     5  from firebird.qa import *
     6
     7  db = db_factory()
     8  act = python_act('db')
     9
    10  trace_5441 = [
    11      'log_statement_finish = true',
    12  ]
    13  #-------------------------------------------
    14  trace_5470 = [
    15      'log_statement_finish = true',
    16      'include_filter = "%foobar%"',
    17  ]
    18      #'exclude_filter = "%table%"',
    19  #-------------------------------------------
    20  @pytest.mark.version('>=3.0')
    21  def test_1(act: Action):
    22      with act.trace(db_events = trace_5441) as ts:
    23          # def trace() -> TraceSession
    24          if ts.output:
    25              # see core.py (firebird-driver),
    26              # class ServerTraceServices(ServerServiceProvider),
    27              #   def start(self, *, config: str, name: str=None) -> int
    28              #       ...
    29              #       response = self._srv()._fetch_line()
    30              #       return int(response.split()[3])
    31              session = str(ts.output[0])
    32          else:
    33              session = '[no output]'
    34          with open('/var/tmp/trace_5441.session.tmp', 'w') as f:
    35              f.write(f'trace_5441: session = {session}\n')
    36          #pass
    37      assert True
    38
    39  #----------------------------------------------
    40  @pytest.mark.version('>=3.0')
    41  def test_2(act: Action):
    42      #time.sleep(0.5) # ----------------------------------- [ NB ]
    43      with act.trace(db_events = trace_5470) as ts:
    44          # def trace() -> TraceSession
    45          if ts.output:
    46              session = str(ts.output[0])
    47          else:
    48              session = '[no output]'
    49          with open('/var/tmp/trace_5470.session.tmp', 'w') as f:
    50              # if we take delay for at least 0.5 second then session will be 1
    51              # otherwise it will be 2 and pytest hangs infinitely at the final point.
    52              f.write(f'trace_5470: session = {session}\n')
    53          #pass
    54      assert True

Pay note at line #42: #time.sleep(0.5) # ----------------------------------- [ NB ]

If we comment it out then running this test on v4.0.3.2861, Classic, lead to following console output:

[ibase@glb-srv-ik-test firebird-qa]$ cd ~/firebird-qa ; /usr/local/bin/pytest --cache-clear --disable-warnings --basetemp=/var/tmp/fbqa -vv --tb=long --server fb401 ./tests
=========================================================================================== test session starts ===========================================================================================
platform linux -- Python 3.9.9, pytest-7.0.1, pluggy-1.0.0 -- /usr/local/bin/python3.9
cachedir: .pytest_cache
System:
  encodings: sys:utf-8 locale:UTF-8 filesystem:utf-8
Firebird:
  server: fb401 [v4.0.3.2861, Classic, Firebird/Linux/AMD/Intel/x64]
  home: /opt/fb40
  bin: /opt/fb40/bin
  client library: /opt/fb40/lib/libfbclient.so
rootdir: /home/ibase/firebird-qa, configfile: pytest.ini
plugins: firebird-qa-0.17.0, timestamper-0.0.9
collected 2 items

[2022-10-28 13:01:03] tests/bugs/core_hang_test.py::test_1 PASSED                                                                                                                                    [1/2]
[2022-10-28 13:01:06] tests/bugs/core_hang_test.py::test_2 FAILED                                                                                                                                    [2/2]

================================================================================================ FAILURES =================================================================================================
_________________________________________________________________________________________________ test_2 __________________________________________________________________________________________________

act = <firebird.qa.plugin.Action object at 0x7efe0f047cd0>

    @pytest.mark.version('>=3.0')
    def test_2(act: Action):
        #time.sleep(0.5) # ----------------------------------- [ NB ]
        with act.trace(db_events = trace_5470) as ts:
            # def trace() -> TraceSession
            if ts.output:
                session = str(ts.output[0])
            else:
                session = '[no output]'
            with open('/var/tmp/trace_5470.session.tmp', 'w') as f:
                # if we take delay for at least 0.5 second then session will be 1
                # otherwise it will be 2 and pytest hangs infinitely at the final point.
>               f.write(f'trace_5470: session = {session}\n')

tests/bugs/core_hang_test.py:52:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <firebird.qa.plugin.TraceSession object at 0x7efe0f047940>, exc_type = None, exc_value = None, traceback = None

    def __exit__(self, exc_type, exc_value, traceback) -> None:
        time.sleep(2)
        session = self.output.pop(0)
        with self.act.connect_server() as srv:
>           srv.trace.stop(session_id=session)

firebird/qa/plugin.py:1155:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <firebird.driver.core.ServerTraceServices object at 0x7efe1405fa00>

    def stop(self, *, session_id: int) -> str:
        """Stop trace session.

        Arguments:
            session_id: Trace session ID.

        Returns:
           Text message 'Trace session ID <x> stopped'.
        """
        #print('Intro trace stop(), session_id=',session_id)
>       return self.__action(ServerAction.TRACE_STOP, 'stopped', session_id)

../.local/lib/python3.9/site-packages/firebird/driver/core.py:5205:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <firebird.driver.core.ServerTraceServices object at 0x7efe1405fa00>, action = <ServerAction.TRACE_STOP: 23>, label = 'stopped', session_id = 2

    def __action(self, action: ServerAction, label: str, session_id: int) -> str:
        self._srv()._reset_output()
        with a.get_api().util.get_xpb_builder(XpbKind.SPB_START) as spb:
            spb.insert_tag(action)
            spb.insert_int(SrvTraceOption.ID, session_id)
            self._srv()._svc.start(spb.get_buffer())
        response = self._srv()._fetch_line()

        # 14.10.2022, temporary(?)
        #if action == ServerAction.TRACE_STOP:
        #    print('@@@ trace stopping: ', action.name, response)
        #else:
        #    print('@@@ other trace action: ', action.name)

        if not response.startswith(f"Trace session ID {session_id} {label}"):  # pragma: no cover
            # response should contain the error message
            #print('@@@ bad response for action ', action.name, ' : ', response)
>           raise DatabaseError(response)
E           firebird.driver.types.DatabaseError: Trace session ID 2 not found

../.local/lib/python3.9/site-packages/firebird/driver/core.py:5168: DatabaseError
------------------------------------------------------------------------------------------ Captured stdout setup ------------------------------------------------------------------------------------------
Creating db: localhost:/var/tmp/fbqa/test_20/test.fdb [page_size=None, sql_dialect=None, charset='NONE', user=SYSDBA, password=masterkey]
========================================================================================= short test summary info =========================================================================================
FAILED tests/bugs/core_hang_test.py::test_2 - firebird.driver.types.DatabaseError: Trace session ID 2 not found
================================================================================= 1 failed, 1 passed, 1 warning in 5.86s ==================================================================================

And pytest will infinitely HANG, w/o returning control to OS until we press Ctrl-C. But why it happens ?

Pressing Ctrl-C leads to following additional output:

Traceback (most recent call last):
  File "/home/ibase/firebird-qa/firebird/qa/plugin.py", line 1088, in trace_thread
    for line in srv:
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5301, in __next__
    if (line := self.readline()) is not None:
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5404, in readline
    self._read_output()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5350, in _read_output
    data = self._query_output(timeout)
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5344, in _query_output
    self._svc.query(self._make_request(timeout), bytes([self.mode]), self.response.raw)
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 1039, in query
    self._check()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 105, in _check
    raise self.__report(DatabaseError, self.status.get_errors())
firebird.driver.types.DatabaseError: Error reading data from the connection.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 973, in _bootstrap_inner
    self.run()
  File "/usr/local/lib/python3.9/threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "/home/ibase/firebird-qa/firebird/qa/plugin.py", line 1090, in trace_thread
    output.append(line)
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5295, in __exit__
    self.close()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/core.py", line 5447, in close
    self._svc.detach()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 1031, in detach
    self._check()
  File "/home/ibase/.local/lib/python3.9/site-packages/firebird/driver/interfaces.py", line 105, in _check
    raise self.__report(DatabaseError, self.status.get_errors())
firebird.driver.types.DatabaseError: connection shutdown
Exception ignored in: <module 'threading' from '/usr/local/lib/python3.9/threading.py'>
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/threading.py", line 1470, in _shutdown
    lock.acquire()
KeyboardInterrupt:

And only after that control will returned to OS. Why Ctrl-C was needed to "finalize" execution ?

PS. firebird.conf:

RemoteServicePort = 34099
ServerMode = Classic
pcisar commented 1 year ago

Hi Pavel,

the explanation why pytest hangs is actually quite simple. The trace session runs in separate thread. If no error in service query happens, it ends when trace session is stopped. Because due to Firebird bug the session did not stop properly, the thread is not ended and Python "hangs" on exit (waiting for thread to finish).

I did not (deliberately) added a measure to qa plugin to stop trace thread explicitly (a signal/event to break the service query loop) as it's it complicates things significantly and it's pointless unless there is a BUG in Firebird.

best regards Pavel

AlexPeshkoff commented 1 year ago

On 10/29/22 11:54, Pavel Císař wrote:

Hi Pavel,

the explanation why pytest hangs is actually quite simple. The trace session runs in separate thread. If no error in service query happens, it ends when trace session is stopped. Because due to Firebird bug the session did not stop properly, the thread is not ended and Python "hangs" on exit (waiting for thread to finish).

I did not (deliberately) added a measure to qa plugin to stop trace thread explicitly (a signal/event to break the service query loop) as it's it complicates things significantly and it's pointless unless there is a BUG in Firebird.

On my mind test suite should be ready to do it's best when there is a bug - at least it's designed to detect that bugs ;) I do not insist on changes but do not think that adding:

pcisar commented 1 year ago

Pavel,

Dne 31. 10. 22 v 14:12 Alexander Peshkov napsal(a):

On my mind test suite should be ready to do it's best when there is a bug - at least it's designed to detect that bugs ;)

Well, it does, most times ;-)

I do not insist on changes but do not think that adding:

  • break on error return from service stop call,
  • timeouts when working with services and break when exceeded, significantly complicates the code.

Well, the main problem is that query to get next chunck (to EOF or single line) of trace data BLOCKs (i.e. the trace thread is actually waiting for service query call to finish, so it couldn't do anything else). The only way to get around it is using timeout for service query call, but current implementation of firebird-driver does not support service query with timeout, as I was not able to find a reliable way to support this that would fit the driver API (I'd like avoid backward incompatible changes), and that will also fit the architecture (support for Python iterator protocol etc.).

I would definitely like to add timeouts to firebird-driver sometime in future (as nice to have, at least for trace), but it's definitely not something that I could hack in single day. Also, as nobody so far (over all those years that I develop Python Firebird drivers) asked for service query timeout support (except lately for trace), it was never on top of my priority list.

regards Pavel