Closed dagwieers closed 3 years ago
iirc, due to our use of atomic moves, and never editing the dest file directly (not using open) meant that a solution like this was not straight forward. I had briefly experimented with a lock file instead, but significant changes are needed.
In most cases, we read the file, then do processing and finally move it into place. This solution would also have to prevent that initial read while another task is in progress.
@sivel The implementation would need to use flock on the original file before we edit the copy and do the atomic move. A concurrent module would have to wait for the lock to be removed. So yes, using our own open() would not be helpful, I'll update the proposal. Thanks !
As mentioned, we also need to lock in some manner before read too. Otherwise if multiple instances read close together, they would ignore recent changes just made by the process with the lock.
I'm not looking at the docs (fcntl) but I remember some caveat about exclusive locks only applying to opening files for write.
@sivel Correct, see rephrased proposal. We would be using flock(), not fcntl() for locks.
FYI, if modules use the atomic_move
(as they should) there is no chance of corruption unless you specifically use the unsafe_writes
option and then only if it is not possible to rename the file into place.
A lock would prevent the loss of data though as you could have 2+ diff processes updating the same file with different information and it is possible that the changes from the first process to write are lost by the 2nd overwriting if it had read the original file instead of the updated one.
@sivel a single exclusive lock before the read should be enough, no need to create lockfiles as they create many issues with cleanup, specially if the process doesn't exit cleanly.
Also, I would make the lock optional, we only really need it when multiple processes target the same file on the same machine, the normal case is a delegate_to: localhost
task that updates the same file with info for all the hosts in the play.
I detailed my proposed implementation in ansible/ansible#52567 and we have our own context-managed open_locked() call which returns the file descriptor to be used inside the context (to avoid opening the same file and breaking the existing lock on close).
This is might be useful not only for simultaneous writes, but for other tasks too. Eg I want to have no more one outgoing ssh connection to the host, I would be glad to use something like this in my task:
- raw: 'sleep 10; echo {{ item }}'
args:
lock_file: /tmp/asdasd
lock_type: exclusive
delegate_to: 'somehost'
loop: '{{ my_list }}'
closing as per #52567
Proposal: flock
Author: Dag Wieers <@dagwieers> IRC: dag
Date: 2018-05-09
Motivation
Currently when modules are concurrently running on the same system (eg. delegated to localhost) modifying the same file can lead to corruption or data loss. Adding flock() support to file-access would avoid such problems (or report issues back to the user).
Problems
Simple example, when reprovisioning systems concurrently, you may want to remove existing keys from known_hosts on the Ansible host. Doing that currently with the known_hosts module will fail and there's no easy way to do this from a role as you can't use serial/forks on a task.
Second example, concurrently ensuring a specific file (eg. known_hosts) was removed would fail, because the file would be removed by one of the processes (in between testing existence and the actual removal) and the removal would fail. (This was fixed in the file module in PR ansible/ansible#39466, but would not have happened if flock was implemented).
Third example, in some cases functionality is not allowed to run concurrently, but is not prevented by the API (eg. Cobbler syncs). Currently there is no means to avoid concurrent calls, but a flock implementation could prevent a module from being run concurrently.
Solution proposal
Dependencies (optional)
Testing (optional)
Testing is required and may be a bit more tricky, but not impossible.
Anything else?
We may want to add this using an option that at first is disabled, and when we are confident enable it in a subsequent release. Also any effect on normal performance needs to be looked at.