dagwieers commented 6 years ago

Proposal: flock

Author: Dag Wieers <@dagwieers> IRC: dag

Date: 2018-05-09

Status: New
Proposal type: core design
Targeted Release: <future release #, ex. 2.2>
Associated PR: <link to GH PR in ansible/proposals if PR was submitted>
Estimated time to implement: days

Motivation

Currently, when modules run concurrently on the same system (e.g. delegated to localhost), modifying the same file can lead to corruption or data loss. Adding flock() support to file access would avoid such problems (or report issues back to the user).

Problems

Avoid concurrent write-access to files that Ansible is managing using an exclusive (process-based) lock
This prevents file-corruption or data-loss

Simple example: when reprovisioning systems concurrently, you may want to remove existing keys from known_hosts on the Ansible host. Doing that currently with the known_hosts module will fail, and there's no easy way to do this from a role as you can't use serial/forks on a task.

Second example: concurrently ensuring that a specific file (e.g. known_hosts) is removed would fail, because one process would remove the file in between another process testing for its existence and actually removing it, so that other removal would fail. (This was fixed in the file module in PR ansible/ansible#39466, but would not have happened if flock was implemented.)

Third example: in some cases functionality must not run concurrently, but this is not prevented by the API (e.g. Cobbler syncs). Currently there is no means to avoid concurrent calls, but a flock implementation could prevent a module from being run concurrently.

Solution proposal

A module needs to take a lock on the original file before accessing that file for reading/copying and editing.
Possibly using a timeout value that relates to the number of forks in some way?
The lock could be released once the file has been atomically moved into place (or it will be gone when the process ends).
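
The steps above could look roughly like the following. This is only a minimal sketch, not Ansible code: `locked_update`, `transform`, `timeout` and `poll` are hypothetical names, and it assumes a POSIX system where `fcntl.flock()` is available. Because the atomic move replaces the inode that was locked, the sketch re-checks the inode after acquiring the lock and starts over if another process has already moved a new file into place.

```python
import errno
import fcntl
import os
import tempfile
import time


def locked_update(path, transform, timeout=10.0, poll=0.05):
    """Read path under an exclusive flock, write transform(old) to a
    temporary file and atomically rename it over path (hypothetical helper)."""
    deadline = time.monotonic() + timeout
    while True:
        fd = os.open(path, os.O_RDONLY)
        try:
            # Non-blocking attempts in a loop so the timeout can be enforced.
            while True:
                try:
                    fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                    break
                except OSError as exc:
                    if exc.errno not in (errno.EAGAIN, errno.EACCES):
                        raise
                    if time.monotonic() >= deadline:
                        raise TimeoutError("could not lock %s" % path)
                    time.sleep(poll)
            # A previous lock holder may have renamed a new file into place
            # while we waited. In that case we locked a stale inode: retry.
            if os.fstat(fd).st_ino != os.stat(path).st_ino:
                continue
            # Read through a duplicate descriptor so the lock stays held.
            with os.fdopen(os.dup(fd), "r") as src:
                old = src.read()
            directory = os.path.dirname(os.path.abspath(path))
            tmp_fd, tmp_path = tempfile.mkstemp(dir=directory)
            with os.fdopen(tmp_fd, "w") as dst:
                dst.write(transform(old))
            os.replace(tmp_path, path)  # atomic rename within one filesystem
            return
        finally:
            os.close(fd)  # closing the last descriptor releases the lock
```

A caller would pass a function that maps the old content to the new content, e.g. `locked_update(path, lambda text: text + "new line\n")`. How the timeout should relate to the number of forks is left open here, as it is in the proposal.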

Dependencies (optional)

We may need to look at other file-modifying functionality offered to or used by modules that needs flock support added, but this is less of an issue if the module has proper locking before accessing files.
Our only concern is concurrent access by the same module. If all modules were using flock correctly, we could also prevent concurrent access by multiple playbook runs, but our first focus would be on concurrently running the same module.

Testing (optional)

Testing is required and may be a bit more tricky, but not impossible.

Anything else?

We may want to add this behind an option that is disabled at first, and enable it in a subsequent release once we are confident. Any effect on normal performance also needs to be looked at.

sivel commented 6 years ago

IIRC, our use of atomic moves, and never editing the dest file directly (not using open), meant that a solution like this was not straightforward. I briefly experimented with a lock file instead, but significant changes are needed.

In most cases, we read the file, then do processing and finally move it into place. This solution would also have to prevent that initial read while another task is in progress.

dagwieers commented 6 years ago

@sivel The implementation would need to use flock on the original file before we edit the copy and do the atomic move. A concurrent module would have to wait for the lock to be removed. So yes, using our own open() would not be helpful, I'll update the proposal. Thanks !

sivel commented 6 years ago

As mentioned, we also need to lock in some manner before read too. Otherwise if multiple instances read close together, they would ignore recent changes just made by the process with the lock.

I'm not looking at the docs (fcntl) but I remember some caveat about exclusive locks only applying to opening files for write.

dagwieers commented 6 years ago

@sivel Correct, see rephrased proposal. We would be using flock(), not fcntl() for locks.
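
The caveat mentioned above matches POSIX record locks: an exclusive `fcntl()`-style lock requires a descriptor that is open for writing, whereas on Linux `flock()` can take an exclusive lock regardless of the mode the file was opened in. A small sketch to check this on a given system (`try_exclusive_locks` is a hypothetical helper, and the result may differ on other platforms or on network filesystems):

```python
import fcntl
import os


def try_exclusive_locks(path):
    """Report which exclusive lock calls succeed on a read-only descriptor."""
    results = {}
    fd = os.open(path, os.O_RDONLY)
    try:
        for name, lock in (("flock", fcntl.flock), ("lockf", fcntl.lockf)):
            try:
                lock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                lock(fd, fcntl.LOCK_UN)
                results[name] = True
            except OSError:
                # e.g. EBADF when the lock type needs a writable descriptor
                results[name] = False
    finally:
        os.close(fd)
    return results
```

This matters for the proposal because the original file only needs to be read under the lock; the write goes to a copy that is moved into place.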

bcoca commented 6 years ago

FYI, if modules use the atomic_move (as they should) there is no chance of corruption unless you specifically use the unsafe_writes option and then only if it is not possible to rename the file into place.

A lock would prevent the loss of data though, as you could have two or more different processes updating the same file with different information, and the changes from the first process to write can be lost when the second overwrites them, if it had read the original file instead of the updated one.

@sivel a single exclusive lock before the read should be enough, no need to create lockfiles as they create many issues with cleanup, especially if the process doesn't exit cleanly.

Also, I would make the lock optional. We only really need it when multiple processes target the same file on the same machine; the normal case is a delegate_to: localhost task that updates the same file with info for all the hosts in the play.

dagwieers commented 5 years ago

I detailed my proposed implementation in ansible/ansible#52567 and we have our own context-managed open_locked() call which returns the file descriptor to be used inside the context (to avoid opening the same file and breaking the existing lock on close).
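
To illustrate the idea of a context-managed locked open, here is a minimal sketch. It is not the code from ansible/ansible#52567: the name `open_locked` is reused only for illustration, the signature is an assumption, and this version yields a Python file object rather than a raw file descriptor.

```python
import contextlib
import fcntl


@contextlib.contextmanager
def open_locked(path, mode="r"):
    """Open path, take an exclusive flock on it and yield the file object.

    Illustrative sketch only; the signature is an assumption.
    """
    fobj = open(path, mode)
    try:
        fcntl.flock(fobj.fileno(), fcntl.LOCK_EX)  # blocks until available
        yield fobj
    finally:
        fobj.close()  # closing the file also releases the lock
```

Inside `with open_locked(path) as fobj:` the caller reads through `fobj` instead of opening the file a second time, so there is a single descriptor whose close ends the lock.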

kvaps commented 5 years ago

This might be useful not only for simultaneous writes, but for other tasks too. E.g. I want to have no more than one outgoing ssh connection to the host; I would be glad to use something like this in my task:

- raw: 'sleep 10; echo {{ item }}'
  args:
    lock_file: /tmp/asdasd
    lock_type: exclusive 
  delegate_to: 'somehost'
  loop: '{{ my_list }}'

bcoca commented 3 years ago

closing as per #52567

ansible / proposals

Use flock to avoid concurrent write-access by Ansible #113
