ansible / ansible-runner

A tool and Python library that helps when interfacing with Ansible directly or as part of another system, whether through a container image interface, as a standalone tool, or as a Python module that can be imported. The goal is to provide a stable and consistent interface abstraction to Ansible.

Possible race condition with passing in inventory as object #1358

Closed HiDoYa closed 6 months ago

HiDoYa commented 6 months ago

When running many ansible-runner instances in parallel across many subprocesses, once in a while (about 2 out of the 10 times I've tested) I get this exception from several of them:

No such file or directory: '/tmp/project/inventory/.artifact_write_lock'")
Traceback (most recent call last):
... Omitted ...
  File "/src/workflow_manager.py", line 295, in _create_runner
    runner_obj = ansible_runner.interface.run(
  File "/usr/local/lib/python3.9/site-packages/ansible_runner/interface.py", line 212, in run
    r = init_runner(**kwargs)
  File "/usr/local/lib/python3.9/site-packages/ansible_runner/interface.py", line 69, in init_runner
    dump_artifacts(kwargs)
  File "/usr/local/lib/python3.9/site-packages/ansible_runner/utils/__init__.py", line 234, in dump_artifacts
    kwargs['inventory'] = dump_artifact(json.dumps(obj), path, 'hosts.json')
  File "/usr/local/lib/python3.9/site-packages/ansible_runner/utils/__init__.py", line 167, in dump_artifact
    os.remove(lock_fp)
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/project/inventory/.artifact_write_lock'

I think there may be a race in this code in src/ansible_runner/utils/__init__.py, which ansible-runner uses when dumping the inventory object to a file: the lock file is deleted after it is unlocked.

    if not os.path.exists(fn) or p_sha1.hexdigest() != c_sha1.hexdigest():
        lock_fp = os.path.join(path, '.artifact_write_lock')
        lock_fd = os.open(lock_fp, os.O_RDWR | os.O_CREAT, stat.S_IRUSR | stat.S_IWUSR)
        fcntl.lockf(lock_fd, fcntl.LOCK_EX)

        try:
            with open(fn, 'w') as f:
                os.chmod(fn, stat.S_IRUSR | stat.S_IWUSR)
                f.write(str(obj))
        finally:
            fcntl.lockf(lock_fd, fcntl.LOCK_UN)
            os.close(lock_fd)
            os.remove(lock_fp)
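To illustrate the suspected ordering, here is a minimal sketch that replays the same open / lock / unlock / close / remove steps for two writers inside a single process. The function name `simulate_double_remove` is hypothetical and not part of ansible-runner. Because `fcntl` record locks are held per process, this does not show two processes blocking each other; it only shows that when both writers have opened the lock file before either removes it, the second `os.remove` fails with the same `FileNotFoundError` as in the traceback.

```python
import fcntl
import os
import stat


def simulate_double_remove(path):
    """Replay the lock/unlock/remove sequence for two writers, A and B.

    Returns the FileNotFoundError raised by the second removal,
    or None if no error occurred.
    """
    lock_fp = os.path.join(path, '.artifact_write_lock')
    flags = os.O_RDWR | os.O_CREAT
    mode = stat.S_IRUSR | stat.S_IWUSR

    # Both writers open (and possibly create) the lock file up front.
    fd_a = os.open(lock_fp, flags, mode)
    fd_b = os.open(lock_fp, flags, mode)

    # Writer A: lock, (write artifact), unlock, close, remove.
    fcntl.lockf(fd_a, fcntl.LOCK_EX)
    fcntl.lockf(fd_a, fcntl.LOCK_UN)
    os.close(fd_a)
    os.remove(lock_fp)

    # Writer B: its descriptor is still valid, but the path is gone.
    fcntl.lockf(fd_b, fcntl.LOCK_EX)
    fcntl.lockf(fd_b, fcntl.LOCK_UN)
    os.close(fd_b)
    try:
        os.remove(lock_fp)
    except FileNotFoundError as exc:
        return exc
    return None
```

Calling `simulate_double_remove(tempfile.mkdtemp())` returns the exception object rather than raising it, so the failure can be inspected.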

Shrews commented 6 months ago

Despite the fact that a lock file is in use here (likely an artifact from much older code), ansible-runner is not designed to be run in parallel against the same private_data_dir. I suspect that is what you are doing, but you have not given enough information to say for certain.

HiDoYa commented 6 months ago

I see, thanks. We do have parallel ansible-runner instances running against the same private_data_dir.
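One way to avoid sharing a private_data_dir is to give each parallel run its own directory. The sketch below is an assumption about how a caller might do that, not something prescribed in this thread: `run_isolated` and its `runner` parameter are hypothetical names, and the only ansible-runner call used is `ansible_runner.run` with the `private_data_dir`, `inventory`, and `playbook` keyword arguments. Since each run gets a fresh directory, the playbook would need to be reachable from it, for example via an absolute path.

```python
import tempfile


def run_isolated(base_dir, inventory, playbook, runner=None):
    """Run one playbook in its own freshly created private_data_dir.

    `runner` is a hypothetical injection point for testing; by default
    the real ansible_runner.run is imported and used.
    """
    if runner is None:
        import ansible_runner  # imported lazily so the helper is testable
        runner = ansible_runner.run

    # mkdtemp creates a uniquely named directory, so two parallel
    # callers never receive the same path.
    private_data_dir = tempfile.mkdtemp(prefix='run-', dir=base_dir)

    return runner(
        private_data_dir=private_data_dir,
        inventory=inventory,
        playbook=playbook,
    )
```

With this layout, each run dumps its inventory under its own `inventory/` directory, so no two runs touch the same `.artifact_write_lock` path. The per-run directories are not cleaned up here; removing them after the run finishes is left to the caller.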