dls-controls / aioca

Asynchronous Channel Access client for asyncio and Python using libca via ctypes
Apache License 2.0

Aioca disconnects when repeatedly run under pytest #31

Open AlexanderWells-diamond opened 1 year ago

AlexanderWells-diamond commented 1 year ago

When repeatedly running tests against a running IOC, aioca will occasionally report a timeout exception when trying to establish the connection, and subsequently fail the test.

The code below is a minimal reproduction of the issue: it starts a softioc IOC before the first test, makes 1000 attempts to get a PV, and then stops the IOC. On my machine this usually reports between one and three test failures, although occasionally it reports none.

import multiprocessing
import time
import pytest

def simple_ioc():
    from softioc import softioc, builder, asyncio_dispatcher
    dispatcher = asyncio_dispatcher.AsyncioDispatcher()
    builder.SetDeviceName("ABC")
    builder.longIn("PV", initial_value=0)
    builder.LoadDatabase()
    softioc.iocInit(dispatcher)
    while True:
        time.sleep(0.1)

@pytest.fixture(scope="module")
def ioc_inline():
    p = multiprocessing.get_context("forkserver").Process(target=simple_ioc)
    p.start()
    yield
    p.kill()
    p.join()

@pytest.mark.parametrize("abc", range(1000))
@pytest.mark.asyncio
async def test_get_pv(abc, ioc_inline):
    from aioca import caget
    val = await caget("ABC:PV")
    assert val == 0

This was tested in a new pipenv environment with just aioca==1.5, softioc==4.2.0, pytest, and pytest-asyncio. Tests were executed with pipenv run pytest --tb=native -vv test.py.

The issue is exposed by the way pytest-asyncio handles event loops: by default a new loop is created for every test. The fixture below overrides this behaviour so that all tests share a single loop. With it in place there are zero errors, and the tests also run significantly faster:

@pytest.fixture(scope="session")
def event_loop():
    import asyncio
    try:
        loop = asyncio.get_running_loop()
    except RuntimeError:
        loop = asyncio.new_event_loop()
    yield loop
    loop.close()
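The difference between the two loop lifetimes can be sketched with plain asyncio, without pytest or aioca. This is only an illustration under that assumption; the helper names `run_in_fresh_loops` and `run_in_shared_loop` are hypothetical and are not part of pytest-asyncio.

```python
import asyncio


async def running_loop():
    # Report which event loop this coroutine is actually running on
    return asyncio.get_running_loop()


def run_in_fresh_loops(n):
    """Mimic the default behaviour: every 'test' gets its own event loop."""
    seen = []
    for _ in range(n):
        loop = asyncio.new_event_loop()
        try:
            seen.append(loop.run_until_complete(running_loop()))
        finally:
            loop.close()
    return seen


def run_in_shared_loop(n):
    """Mimic the session-scoped fixture: every 'test' reuses one event loop."""
    loop = asyncio.new_event_loop()
    try:
        return [loop.run_until_complete(running_loop()) for _ in range(n)]
    finally:
        loop.close()


fresh = run_in_fresh_loops(3)
shared = run_in_shared_loop(3)

# The loop objects are still referenced, so their ids are safe to compare
distinct_fresh = len({id(loop) for loop in fresh})
distinct_shared = len({id(loop) for loop in shared})
print(distinct_fresh, distinct_shared)
```

Anything that caches state per event loop sees a loop change on every test in the first case and never in the second.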

Originally discovered by @rjwills28 while working on Coniql.

coretl commented 1 year ago

I can see the same thing locally. The first caget in a new event loop clears out the old channel connections and recreates them in the new event loop. We can trigger the same thing by calling purge_channel_caches before each caget. If we then insert a short sleep before the caget, the problem goes away:

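import asyncio
import pytest

@pytest.mark.parametrize("abc", range(1000))
@pytest.mark.asyncio
async def test_get_pv(abc, ioc_inline):
    from aioca import caget, purge_channel_caches

    # Drop the cached channels, then pause briefly before reconnecting
    purge_channel_caches()
    await asyncio.sleep(0.05)
    # LONGOUT and 42 come from the commenter's local test setup;
    # with the reproducer above these would be "ABC:PV" and 0
    val = await caget(LONGOUT)
    assert val == 42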
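The purge-on-new-loop behaviour described above can be pictured with a small stand-alone sketch. It is not aioca's implementation: `LoopBoundCache` and its `purges` counter are hypothetical, and a plain `object()` stands in for a real channel connection.

```python
import asyncio


class LoopBoundCache:
    """Hypothetical cache whose entries are only valid on one event loop."""

    def __init__(self):
        self._loop = None
        self._channels = {}
        self.purges = 0  # how many times a non-empty cache was thrown away

    def get(self, name):
        loop = asyncio.get_running_loop()
        if loop is not self._loop:
            # Running on a different loop: discard everything made on the old one
            if self._channels:
                self.purges += 1
            self._channels.clear()
            self._loop = loop
        if name not in self._channels:
            self._channels[name] = object()  # placeholder for a new connection
        return self._channels[name]


cache = LoopBoundCache()


async def fetch(name):
    return cache.get(name)


# Three "tests", each on its own loop, as asyncio.run creates a new loop per call
for _ in range(3):
    asyncio.run(fetch("ABC:PV"))
purges_after_three_tests = cache.purges


async def fetch_twice(name):
    # Within a single loop the cached entry is reused
    return cache.get(name) is cache.get(name)


reused_within_one_loop = asyncio.run(fetch_twice("ABC:PV"))
print(purges_after_three_tests, reused_within_one_loop)
```

In this picture every test after the first forces a reconnect, which is why a race during reconnection would show up far more often with per-test loops.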

My guess is that CA is delivering some stale updates to the old event loop rather than the new one, but I'm not sure how. I will try to create a minimal reproducer using just ca.
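To make the guess concrete, here is a sketch in plain asyncio of what happens when a result is handed to a loop that has already been closed while the caller waits on a new one. It does not use CA or aioca, and it says nothing about where the real fault lies; `deliver` and `wait_briefly` are hypothetical helpers.

```python
import asyncio


def deliver(loop, future, value):
    """Simulate a library callback handing a result to `loop`."""
    try:
        loop.call_soon_threadsafe(future.set_result, value)
        return True
    except RuntimeError:
        # Raised when the target loop has already been closed
        return False


async def wait_briefly(future, timeout):
    """Wait for `future`, returning None if nothing arrives in time."""
    try:
        return await asyncio.wait_for(future, timeout)
    except asyncio.TimeoutError:
        return None


# The "old" loop belongs to a test that has already finished
old_loop = asyncio.new_event_loop()
old_future = old_loop.create_future()
old_loop.close()

# The "new" loop belongs to the test that is currently running
new_loop = asyncio.new_event_loop()
new_future = new_loop.create_future()

# The update goes to the old loop, so the new loop's waiter never sees it
delivered_to_old = deliver(old_loop, old_future, 0)
stale_result = new_loop.run_until_complete(wait_briefly(new_future, 0.05))

# For contrast, an update sent to the live loop arrives as expected
live_future = new_loop.create_future()
delivered_to_new = deliver(new_loop, live_future, 0)
live_result = new_loop.run_until_complete(wait_briefly(live_future, 0.05))

new_loop.close()
print(delivered_to_old, stale_result, delivered_to_new, live_result)
```

From the caller's point of view the misdirected update looks exactly like a connection timeout, which would match the symptom reported in the tests.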

coretl commented 1 year ago

This appears to be an issue in CA. I've reported it upstream: https://github.com/mdavidsaver/epicscorelibs/issues/16