ansible-collections / community.general

Ansible Community General Collection
https://galaxy.ansible.com/ui/repo/published/community/general/
GNU General Public License v3.0
827 stars 1.53k forks source link

community.general.ufw rule deletion fails silently when used in a concurrent/parallel way #4769

Open delahondes opened 2 years ago

delahondes commented 2 years ago

Summary

When I try to use community.general.ufw to delete rules, it fails sometimes silently. This happens only when playbooks are launched in parallel typically doing something like:

for i in $(seq 1 20); do ansible-playbook destroy_vm.yaml --extra-vars "nodename=node$i" & done

(in which destroy_vm.yaml will call community.general.ufw at one step)

Issue Type

Bug Report

Component Name

ufw

Ansible Version

$ ansible --version
ansible [core 2.12.2]
  config file = /etc/ansible/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3/dist-packages/ansible
  ansible collection location = /root/.ansible/collections:/usr/share/ansible/collections
  executable location = /usr/bin/ansible
  python version = 3.8.10 (default, Mar 15 2022, 12:22:08) [GCC 9.4.0]
  jinja version = 3.1.1
  libyaml = True

Community.general Version

$ ansible-galaxy collection list community.general

# /usr/lib/python3/dist-packages/ansible_collections
Collection        Version
----------------- -------
community.general 4.5.0  

Configuration

$ ansible-config dump --only-changed
DEFAULT_HOST_LIST(/etc/ansible/ansible.cfg) = ['/etc/ansible/inventory']
INVENTORY_CACHE_ENABLED(/etc/ansible/ansible.cfg) = False
INVENTORY_CACHE_TIMEOUT(/etc/ansible/ansible.cfg) = 3600000

OS / Environment

Ubuntu 20.04

Steps to Reproduce

name: Update the firewall with node IPv4
  community.general.ufw:
    from_ip: "{{ hostvars[nodename].ipv4 }}"
    rule: allow
    delete: yes

Launch in parallel at least 3 times such a task in a playbook and you should see that ufw fails silently to remove the rules

Expected Results

The firewall rule should be deleted.

Actual Results

The firewall rule is not deleted.

I am aware that this is a UFW bug, not an ansible bug or this collection fault. However, it affects the behaviour of Ansible and or the collection. I have commented UFW already opened (and closed) bug on that topic: https://bugs.launchpad.net/ufw/+bug/1204579

However, this issue can be bypassed using task-spooler, a very basic Debian/Ubuntu task spooler which can be installed this way

apt install task-spooler

And the task has to be changed that way:

ansible.builtin.script:
   cmd: /usr/bin/tsp /usr/sbin/ufw delete allow from "{{ hostvars[nodename].ipv4 }}"
   become: true

This is very efficient with one defect: the rule is not instantly deleted, there can be a little delay. As it is intended in a house cleaning spirit, this did not bother me too much.

Code of Conduct

ansibullbot commented 2 years ago

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

ansibullbot commented 2 years ago

cc @ahtik @ovcharenko @pyykkis click here for bot help

delahondes commented 2 years ago

BTW this does not affect adding rules, in my experience this is always working. Only deleting rule is affected

russoz commented 2 years ago

hi @delahondes, thanks for raising the issue.

Ansible does have a mechanism to control concurrency, you can find out more about it in: https://docs.ansible.com/ansible/latest/user_guide/playbooks_strategies.html. If you loop through the rules being deleted with Ansible itself, as opposed to resorting to bash, you should be able to use it.

I am not sure what you would be expecting to be done - given that you already stated this is not a bug in Ansible or in this collection. I don't think it makes much sense to implement concurrency controls in individual modules when you have ways to tweak that for all of them.

delahondes commented 2 years ago

Thanks Alexei @russoz. I have looked into that and I am unsure it fits my use case. I use Ansible to deploy a small cluster (between 5 to 50) of virtual machine for HPC purpose. These virtual machines are then destroyed when they return to an idle state (the system is designed to operate on a batch). I could have a queue and delete them in a linear way but it is a lot easier for me to destroy them as soon as the triggering signal is received (async mechanism, no queue, very simple).

delahondes commented 2 years ago

And I fully agree with you, it would make little sense to create an adhoc concurrency control here. I just wanted to explain the nature of the defect, and give a little trick to conveniently bypass it. I was also hoping you would have some contacts with UFW team and maybe come back to them - as I find this is a rather serious defect in UFW software. I happen to have an ideal use case to study this issue (the issue happens in one third of the executions even at high concurrency rate, so it is not easy to reproduce), so if needed I will be happy to collaborate. Sorry if a bug declaration was not the ideal place for that.

ansibullbot commented 2 years ago

Files identified in the description:

If these files are incorrect, please update the component name section of the description or use the !component bot command.

click here for bot help

russoz commented 2 years ago

@delahondes as you wrote, and I agree, this is not an issue with this module. This collection interacts with a broad range of tools, libraries and knowledge domains, so keep close ties to all these underlying projects is not practical.

One other way to deal with it is to let Ansible itself loop over the nodes. But the tasks will be performed sequentially.

That being said, I cannot see any path forward for this ticket. What would you like to do with it?

delahondes commented 2 years ago

I understand. I just think that a warning should be made:

That should be fine. In more details using a rule above iptable itself and not ufw was the solution we finally adopted.