Rahix / tbot

Automation/Testing tool for Embedded Linux Development
https://tbot.tools
GNU General Public License v3.0

Order of requested roles should not matter #58

Closed · NoUmlautsAllowed closed this issue 3 years ago

NoUmlautsAllowed commented 3 years ago

Hi,

I am using tbot together with pytest. I recently experienced some strange behavior in tbot when it comes to requesting roles from the tbot.Context.

The tbot.Context is configured in conftest.py with the following settings:

# conftest.py

import os
from typing import Iterator

import pytest

import tbot
import my_board        # project-local modules that register machines
import my_remote_lab

@pytest.fixture(scope="session", autouse=True)
def tbot_setup():
    tbot.log.LOGFILE = open("log/tbot_test.json", "w")
    tbot.log.VERBOSITY = 3
    tbot.log.NESTING += 1
    yield
    tbot.log.NESTING -= 1

@pytest.fixture(scope="session")
def tbot_context() -> Iterator[tbot.Context]:
    with tbot.Context(keep_alive=True, reset_on_error_by_default=True) as ctx:
        if os.environ.get('CI') == 'true':
            ctx.register(tbot.selectable.LocalLabHost, [tbot.role.LabHost])
        else:
            my_remote_lab.register_machines(ctx)

        my_board.register_machines(ctx)

        yield ctx

@pytest.fixture(scope="module", autouse=True)
def tbot_nesting():
    print("")
    tbot.log.NESTING += 1
    yield
    print("")
    tbot.log.NESTING -= 1

Since I am running the tests in a GitLab pipeline, I use the environment variable CI to determine whether the local host should be used as the LabHost or whether I need to connect to a remote lab. This is done in tbot_context().

When writing tests, I noticed different behavior between running the tests in the CI pipeline (i.e. using the local host as the LabHost) and running them on my development machine, where the SSHConnector connects to the same machine that is used for the tests in the CI pipeline.

So I tracked down the problem to the following two tests, which request either the LabHost or the BoardLinux first:

# test_lh_first.py

import tbot

def test_lh_first(tbot_context: tbot.Context):
    with tbot_context.request(tbot.role.LabHost) as lh:
        lh.exec0("echo", "Hello World")

    with tbot_context.request(tbot.role.BoardLinux) as lnx:
        lnx.exec0("echo", "Hello World")

# test_lnx_first.py

import tbot

def test_lnx_first(tbot_context: tbot.Context):
    with tbot_context.request(tbot.role.BoardLinux) as lnx:
        lnx.exec0("echo", "Hello World")

    with tbot_context.request(tbot.role.LabHost) as lh:
        lh.exec0("echo", "Hello World")

I can run the tests with these commands directly on the lab host:

# pretend that we are in the CI environment to use the local host as LabHost
export CI=true
# run either test_lnx_first.py or test_lh_first.py
python3 -m pytest -v --capture=tee-sys --log-cli-level=INFO --color=yes test_lnx_first.py
python3 -m pytest -v --capture=tee-sys --log-cli-level=INFO --color=yes test_lh_first.py

test_lnx_first.py works fine, but with test_lh_first.py I get the following error:

================================================================= short test summary info ==================================================================
ERROR test_lh_first.py::test_lh_first - Exception: trying to de-init a closed instance
=============================================================== 1 passed, 1 error in 39.52s ================================================================

When I run the tests from my development machine and use the SSHConnector to connect to the LabHost, test_lh_first.py runs without any errors.

So I wonder whether the order in which roles are requested from the context matters when the teardown of the tests is executed and the roles are de-initialized.

My current workaround is to request the BoardLinux role first and the LabHost afterwards. However, this means I have to wait until the board is powered up before I can use the LabHost, which is impractical in cases where configuration needs to be done on the LabHost before the board is requested and powered up.

Or am I missing something here and tbot should be used in a different way than I do?

Best regards NoU

Rahix commented 3 years ago

I can (somewhat) reproduce this issue; it seems to be a problem with keep_alive mode. The order of machine instance teardown does not reflect the dependencies between instances. Essentially, we're just tearing down in the order of the original requests right now.
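To illustrate why request-order teardown breaks here, consider this toy simulation (not tbot's internals; `Machine`, `parent`, and `close` are made-up names): a machine's teardown may still need its parent connection, so closing the parent first leaves the child trying to de-init through an already-closed instance.

```python
# Toy model of dependent machine instances (purely illustrative,
# not tbot's actual classes).
class Machine:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.open = True

    def close(self):
        # Teardown needs the parent connection to still be alive.
        if self.parent is not None and not self.parent.open:
            raise Exception("trying to de-init a closed instance")
        self.open = False


lab = Machine("LabHost")
board = Machine("BoardLinux", parent=lab)

# test_lh_first.py requests LabHost first, so naive teardown
# closes it first as well:
errors = []
for m in [lab, board]:  # teardown in request order
    try:
        m.close()
    except Exception as e:
        errors.append(str(e))

print(errors)  # the board's teardown fails
```

With the requests reversed (as in test_lnx_first.py), the board is closed while the lab host is still open, which is why that ordering happens to work.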

The proper solution is to perform teardown in an order where any dependencies are dealt with only after their dependents. I'll have to experiment a bit to see which method is the most robust for doing this. I will post an update here once a solution is available.
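One way to get such an ordering is a reverse topological sort: close every dependent before the instance it depends on. A minimal sketch, assuming a simple name-to-dependencies mapping (function and variable names here are hypothetical, not tbot API):

```python
def teardown_order(instances, deps):
    """Order `instances` so every dependent comes before its dependencies.

    `deps` maps an instance name to the set of names it depends on.
    """
    order = []
    visited = set()

    def visit(name):
        if name in visited:
            return
        visited.add(name)
        # Close anything that depends on `name` first.
        for other in instances:
            if name in deps.get(other, ()):
                visit(other)
        order.append(name)

    for name in instances:
        visit(name)
    return order


# Example: BoardLinux depends on Board, which depends on LabHost.
deps = {"BoardLinux": {"Board"}, "Board": {"LabHost"}}
order = teardown_order(["LabHost", "Board", "BoardLinux"], deps)
# BoardLinux is torn down first, LabHost last.
```

For Python 3.9+, the standard library's graphlib.TopologicalSorter provides a ready-made topological sort that could serve the same purpose.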


BTW, instead of doing

with tbot.Context(keep_alive=True, reset_on_error_by_default=True) as ctx:
    if os.environ.get('CI') == 'true':
        ctx.register(tbot.selectable.LocalLabHost, [tbot.role.LabHost])
    else:
        my_remote_lab.register_machines(ctx)

you can use add_defaults, which is probably more elegant and future-proof:

with tbot.Context(keep_alive=True, reset_on_error_by_default=True, add_defaults=True) as ctx:
    if os.environ.get('CI') != 'true':
        my_remote_lab.register_machines(ctx)
NoUmlautsAllowed commented 3 years ago

I commented on #59. The changes introduced with c488841c37d089df6a0455fe559b032a1e1e6daa fix the problem for me.


Thanks for the suggestion with the defaults :)