GMLC-TDC / pyhelics

CFFI Python interface for HELICS
https://python.helics.org
BSD 3-Clause "New" or "Revised" License
3 stars 8 forks source link

Race conditions in broker connections #27

Closed blthayer closed 1 year ago

blthayer commented 3 years ago

I don't 100% understand the cause of this issue, but I do have a solution. So, rather than trying to verbosely describe what's going on, I'm going to let the code do the talking.

Platform/Versions

OS

OS: Windows 10 Pro Version: 20H2 Build: 19042.1081 Experience: Windows Feature Experience Pack 120.2212.3530.0

Software

Reproducing

After setting up a virtual environment (outside the scope of this ticket), put the following code in test_bug.py:

import helics as h
import pytest

class ComboFed:
    """Simple wrapper around a HelicsCombinationFederate."""

    def __init__(self, timeout, core_type, broker_name):
        fed_info = h.helicsCreateFederateInfo()

        # Configure and create the HELICS federate.
        # Basic properties.
        fed_info.core_name = 'test_fed'
        fed_info.core_init = (
            f'--federates=1 '
            f'--network_timeout={timeout}s'
        )
        fed_info.core_type = core_type
        fed_info.broker = broker_name

        # Hard-code the period (no "Pythonic" property available yet)
        h.helicsFederateInfoSetTimeProperty(
            fi=fed_info,
            time_property=h.helics_property_time_period,
            value=60.0
        )

        # Create the federate.
        self.federate = h.helicsCreateCombinationFederate(
            fed_name='test_fed', fi=fed_info
        )

def test_cannot_create_combo_fed_without_broker():
    # Note that this test is both listed first and comes alphabetically
    # before "test_create_combo_fed_with_broker" so should be executed
    # first.
    with pytest.raises(h.HelicsException, match='Unable to connect to broker'):
        ComboFed(timeout=0.5, core_type='inproc', broker_name='test_broker')

def test_create_combo_fed_with_broker():
    broker = h.helicsCreateBroker(
        type='inproc',
        name="test_broker",
        init_string=f"--federates 1"
    )
    fed = ComboFed(timeout=0.5, core_type='inproc', broker_name='test_broker')
    assert isinstance(fed.federate, h.HelicsCombinationFederate)

Now, run the tests (in your activated virtual environment): pytest test_bug.py. Expected output:

collected 2 items

test_cleanup_bug.py .F                                                                                                                                                                                                                [100%]

================================================================================================================= FAILURES =================================================================================================================
____________________________________________________________________________________________________ test_create_combo_fed_with_broker _____________________________________________________________________________________________________

    def test_create_combo_fed_with_broker():
        broker = h.helicsCreateBroker(
            type='inproc',
            name="test_broker",
            init_string=f"--federates 1"
        )
>       fed = ComboFed(timeout=0.5, core_type='inproc', broker_name='test_broker')

test_cleanup_bug.py:48:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
test_cleanup_bug.py:29: in __init__
    self.federate = h.helicsCreateCombinationFederate(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

fed_name = 'test_fed', fi = <helics.HelicsFederateInfo() at 0x21290a48730>

    def helicsCreateCombinationFederate(fed_name: str, fi: HelicsFederateInfo = None) -> HelicsCombinationFederate:
        """
        Create a combination federate from `helics.HelicsFederateInfo`.
        Combination federates are both value federates and message federates, objects can be used in all functions
        that take a `helics.HelicsFederate` object as an argument.

        **Parameters**

        - **`fed_name`** - A string with the name of the federate, can be NULL or an empty string to pull the default name from fi.
        - **`fi`** - The federate info object that contains details on the federate.

        **Returns**: `helics.HelicsCombinationFederate`.
        """
        f = loadSym("helicsCreateCombinationFederate")
        err = helicsErrorInitialize()
        if fi is None:
            fi = helicsCreateFederateInfo()
        result = f(cstring(fed_name), fi.handle, err)
        if err.error_code != 0:
>           raise HelicsException("[" + str(err.error_code) + "] " + ffi.string(err.message).decode())
E           helics.capi.HelicsException: [-1] Unable to connect to broker->unable to register federate

..\venv\lib\site-packages\helics\capi.py:3267: HelicsException
========================================================================================================= short test summary info ==========================================================================================================
FAILED test_cleanup_bug.py::test_create_combo_fed_with_broker - helics.capi.HelicsException: [-1] Unable to connect to broker->unable to register federate
======================================================================================================= 1 failed, 1 passed in 1.00s ========================================================================================================

Notice that test_create_combo_fed_with_broker failed unexpectedly.

Next, comment out the entirety of test_cannot_create_combo_fed_without_broker and execute the tests (well, just one test this time) again. Expected output:

collected 1 item

test_cleanup_bug.py .                                                                                                                                                                                                                 [100%]

============================================================================================================ 1 passed in 0.34s =============================================================================================================

These two tests are completely isolated, yet test_cannot_create_combo_fed_without_broker is interfering with test_create_combo_fed_with_broker.

Next, uncomment test_cannot_create_combo_fed_without_broker to restore the test module to its original state. Now, add the following fixture before test_cannot_create_combo_fed_without_broker:

@pytest.fixture(scope='class', autouse=True)
def clean_helics():
    h.helicsCleanupLibrary()

The autouse=True flag means this fixture will be run before every test.

Expected output after re-running the tests:

collected 2 items

test_cleanup_bug.py ..                                                                                                                                                                                                                [100%]

============================================================================================================ 2 passed in 1.01s =============================================================================================================

Note that I cannot provide any guarantee that helicsCleanupLibrary is actually solving the problem by whatever cleanup actions it is taking. If you add an import time line to the top of the module and replace h.helicsCleanupLibrary() with time.sleep(1) the tests also pass.

kdheepak commented 3 years ago

cc @phlptp

phlptp commented 3 years ago

One of the things cleanup library is doing is essentially garbage collecting the threads that were started up. This can take some time and depends on the OS. Since the threads need to handle the shutdown of the sockets/files and make sure the connections are disconnected safely. In many cases sleeping in effect does the same thing. Many of the tests in C++ call the cleanup after each test to make sure there is no stray threads running around to impact the tests.

blthayer commented 3 years ago

@phlptp: Thanks for the reply!

In https://github.com/GMLC-TDC/pyhelics/pull/26, the Python classes were updated so that upon garbage collection of an instance, helicsCleanupLibrary is called. Does this seem appropriate to you?

@kdheepak: I'll be interested to know if this is a "non-issue" after the update in #26.

trevorhardy commented 1 year ago

Closing out as HELICS v2 is deprecated (and this issue may have been specifically addressed in #26).