chime-experiment / comet

A Config and Metadata Tracker
0 stars 3 forks source link

Failure acquiring lock #34

Open jrs65 opened 4 years ago

jrs65 commented 4 years ago

Occasionally comet will give this error:

Oct 08 13:53:23 recv1 comet[18771]: Traceback (most recent call last):
Oct 08 13:53:23 recv1 comet[18771]: [2019-10-08 13:53:23,570] sanic.root: Exception occurred while handling uri:
Oct 08 13:53:23 recv1 comet[18771]: RuntimeError: Lock is not acquired.
Oct 08 13:53:23 recv1 comet[18771]:     raise RuntimeError('Lock is not acquired.')
Oct 08 13:53:23 recv1 comet[18771]:   File "/usr/local/lib/python3.7/asyncio/locks.py", line 218, in release
Oct 08 13:53:23 recv1 comet[18771]:     self.release()
Oct 08 13:53:23 recv1 comet[18771]:   File "/usr/local/lib/python3.7/asyncio/locks.py", line 98, in __aexit__
Oct 08 13:53:23 recv1 comet[18771]:     break
Oct 08 13:53:23 recv1 comet[18771]:   File "/usr/local/lib/python3.7/site-packages/comet/broker.py", line 439, in
Oct 08 13:53:23 recv1 comet[18771]:     found = await wait_for_dset(ds["base_dset"])
Oct 08 13:53:23 recv1 comet[18771]:   File "/usr/local/lib/python3.7/site-packages/comet/broker.py", line 374, in
Oct 08 13:53:23 recv1 comet[18771]:     dataset_valid = await checkDataset(ds)
Oct 08 13:53:23 recv1 comet[18771]:   File "/usr/local/lib/python3.7/site-packages/comet/broker.py", line 248, in
Oct 08 13:53:23 recv1 comet[18771]:     response = await response
Oct 08 13:53:23 recv1 comet[18771]:   File "/usr/local/lib/python3.7/site-packages/sanic/app.py", line 917, in ha
Oct 08 13:53:23 recv1 comet[18771]: Traceback (most recent call last):

This looks suspiciously similar to python/cpython#7216, however that should have been fixed in Python 3.7 which we are running. I can't easily reproduce this locally. I've attached an example script which is similar to the relevant code in coco which doesn't fail.

Also, things are a little weird with branches. The version running above in this failure seems to be d030e1c1634e

jrs65 commented 4 years ago
import logging
import asyncio

from asyncio import CancelledError, TimeoutError

loop = asyncio.new_event_loop()
asyncio.set_event_loop(loop)

logging.basicConfig(level=logging.DEBUG)

logger = logging.getLogger(__name__)

WAIT_TIME = 2

lock_datasets = asyncio.Lock()
signal_datasets_updated = asyncio.Condition(lock_datasets)

async def notify(delay):

    for i in range(4):
        await asyncio.sleep(delay)

        async with signal_datasets_updated:
            signal_datasets_updated.notify_all()

async def wait_for_dset(id):
    """Wait until the given dataset is present."""

    logger.debug("wait_for_ds: Waiting for dataset {}".format(id))

    async with lock_datasets:
        while True:
            try:
                await asyncio.wait_for(signal_datasets_updated.wait(), WAIT_TIME)
            except (TimeoutError, CancelledError):
                logger.warning(
                    "wait_for_ds: Timeout ({}s) when waiting for dataset {}".format(
                        WAIT_TIME, id
                    )
                )
                return False
            if id == 0:
                logger.debug("wait_for_ds: Found dataset {}".format(id))
                break
            logger.info("decreasing id")
            id -= 1

if __name__ == "__main__":
    import time
    s = time.perf_counter()
    loop.run_until_complete(asyncio.gather(wait_for_dset(3), notify(1)))
    elapsed = time.perf_counter() - s
    print(f"{__file__} executed in {elapsed:0.2f} seconds.")