EnterpriseDB / edb-ansible

Ansible code for deploying EDB Postgres database clusters and related products.

all_nodes.py is killing my ansible worker #635

Open jbe33 opened 4 months ago

jbe33 commented 4 months ago

Hello, we are encountering an issue with the Ansible tasks in the linux_update_etc_hosts.yml file. Our inventories include several hundred machines, and every time these plays run, a lot of time is spent parsing the entire inventory to build the /etc/hosts list, which adds little value for us. Our Ansible machines have limited RAM, and we frequently hit the error "A worker was found in a dead state."

We've noticed that we get the same result much faster and more efficiently by replacing the call to the all_nodes lookup plugin (all_nodes.py) with a call to the pg_sr_cluster_nodes lookup plugin.

Old:

- name: Build hosts_lines, based on the inventory
  ansible.builtin.set_fact:
    hosts_lines: >
      {{ hosts_lines | default([]) + [
        {
          'line': item.private_ip + ' ' + item.inventory_hostname,
          'regexp': '.*\s' + item.inventory_hostname | regex_escape() + '$'
        }
      ] }}
  loop: "{{ lookup('edb_devops.edb_postgres.all_nodes', wantlist=True) }}"

New:

- name: Build hosts_lines, based on the inventory
  ansible.builtin.set_fact:
    hosts_lines: >
      {{ hosts_lines | default([]) + [
        {
          'line': item.private_ip + ' ' + item.inventory_hostname,
          'regexp': '.*\s' + item.inventory_hostname | regex_escape() + '$'
        }
      ] }}
  loop: "{{ lookup('edb_devops.edb_postgres.pg_sr_cluster_nodes', wantlist=True) }}"

We would like to know your thoughts on this modification. Another approach could be to bypass the "Build hosts_lines" step entirely using a when condition.
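A minimal sketch of the `when` approach, under stated assumptions: `manage_etc_hosts` is a hypothetical variable that does not exist in the role, and the way the role actually pulls in linux_update_etc_hosts.yml may differ. The condition is placed on the include rather than on the looped task because Ansible evaluates the `loop` expression before it checks `when` for each item, so a `when` placed directly on the "Build hosts_lines" task would still run the expensive lookup.

```yaml
# Hypothetical wrapper task: skip the whole /etc/hosts update when requested.
- name: Update /etc/hosts from the inventory
  ansible.builtin.include_tasks: linux_update_etc_hosts.yml
  # manage_etc_hosts is an illustrative name, not an existing role variable.
  # When it is false, the file is never included, so the inventory lookup
  # inside it is never evaluated.
  when: manage_etc_hosts | default(true) | bool
```

With this in place, a run could opt out with something like `-e manage_etc_hosts=false`, at the cost of managing /etc/hosts (or DNS) by other means.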

I can submit a pull request (PR) if the solution works for you.

Thank you.

vibhorkumar123 commented 4 months ago

We used all_nodes because it adds every node's information to /etc/hosts, which lets all the nodes in a deployment communicate with one another. If you switch to pg_sr_cluster_nodes and then deploy backup nodes and monitoring nodes as part of the deployment, the primary and standbys won't be able to connect to them.
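To illustrate the concern, here is a hypothetical inventory fragment. The host names and addresses are invented, and the group names (`primary`, `standby`, `barmanserver`, `pemserver`) are assumed to follow the project's usual inventory layout. The comments reflect the behaviour described in the reply above, not a reading of the plugin source.

```yaml
all:
  children:
    primary:
      hosts:
        pg1:                      # in the streaming replication cluster
          ansible_host: 192.0.2.11
          private_ip: 192.0.2.11
    standby:
      hosts:
        pg2:                      # in the streaming replication cluster
          ansible_host: 192.0.2.12
          private_ip: 192.0.2.12
          upstream_node_private_ip: 192.0.2.11
    barmanserver:
      hosts:
        backup1:                  # backup node, outside the replication cluster
          ansible_host: 192.0.2.21
          private_ip: 192.0.2.21
    pemserver:
      hosts:
        monitor1:                 # monitoring node, outside the replication cluster
          ansible_host: 192.0.2.31
          private_ip: 192.0.2.31
```

Under these assumptions, building hosts_lines from all_nodes would produce /etc/hosts entries for pg1, pg2, backup1, and monitor1, while building it from pg_sr_cluster_nodes would be expected to cover only pg1 and pg2, leaving the database nodes unable to resolve backup1 and monitor1 by name.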