GoogleCloudPlatform / google-cloud-ops-agents-ansible

Ansible Role for Google Cloud Ops
https://cloud.google.com/products/operations
Apache License 2.0

Task Add repo and install agent or remove repo and uninstall agent failed #104

Closed jeewan-gautam closed 1 year ago

jeewan-gautam commented 1 year ago

Sample playbook running on RHEL 8.5 roles:

Error:
FAILED - RETRYING: [ansible-test-vm1]: Add repo and install agent or remove repo and uninstall agent (5 retries left).
FAILED - RETRYING: [ansible-test-vm1]: Add repo and install agent or remove repo and uninstall agent (4 retries left).
FAILED - RETRYING: [ansible-test-vm1]: Add repo and install agent or remove repo and uninstall agent (3 retries left).
FAILED - RETRYING: [ansible-test-vm1]: Add repo and install agent or remove repo and uninstall agent (2 retries left).
FAILED - RETRYING: [ansible-test-vm1]: Add repo and install agent or remove repo and uninstall agent (1 retries left).
fatal: [ansible-test-vm1]: FAILED! => {
  "attempts": 5,
  "changed": true,
  "cmd": ["bash", "add-google-cloud-ops-agent-repo.sh", "--also-install", "--version=latest"],
  "delta": "0:00:03.308186",
  "end": "2023-08-15 01:19:02.902590",
  "msg": "non-zero return code",
  "rc": 1,
  "start": "2023-08-15 01:18:59.594404",
  "stderr": "Repository google-cloud-ops-agent is listed more than once in the configuration\nErrors during downloading metadata for repository 'google-cloud-ops-agent':\n - Status code: 404 for https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-Ootpa-x86_64-all/repodata/repomd.xml (IP: 172.217.13.110)\nError: Failed to download metadata for repo 'google-cloud-ops-agent': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried\nAttempt 1 of 3 failed: yum -y list updates\nRepository google-cloud-ops-agent is listed more than once in the configuration\nErrors during downloading metadata for repository 'google-cloud-ops-agent':\n - Status code: 404 for https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-Ootpa-x86_64-all/repodata/repomd.xml (IP: 172.217.13.110)\nError: Failed to download metadata for repo 'google-cloud-ops-agent': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried\nAttempt 2 of 3 failed: yum -y list updates\nRepository google-cloud-ops-agent is listed more than once in the configuration\nErrors during downloading metadata for repository 'google-cloud-ops-agent':\n - Status code: 404 for https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-Ootpa-x86_64-all/repodata/repomd.xml (IP: 172.217.13.110)\nError: Failed to download metadata for repo 'google-cloud-ops-agent': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried\nAttempt 3 of 3 failed: yum -y list updates\nCommand: yum -y list updates failed\n[2023-08-15T01:19:02+0000] Could not refresh the google-cloud-ops-agent yum repositories.\nPlease check your network connectivity and make sure you are running a supported\nrhel distribution. See https://cloud.google.com/stackdriver/docs/solutions/ops-agent/#supported_operating_systems\nfor a list of supported platforms.",
  "stderr_lines": [
    "Repository google-cloud-ops-agent is listed more than once in the configuration",
    "Errors during downloading metadata for repository 'google-cloud-ops-agent':",
    " - Status code: 404 for https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-Ootpa-x86_64-all/repodata/repomd.xml (IP: 172.217.13.110)",
    "Error: Failed to download metadata for repo 'google-cloud-ops-agent': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried",
    "Attempt 1 of 3 failed: yum -y list updates",
    "Repository google-cloud-ops-agent is listed more than once in the configuration",
    "Errors during downloading metadata for repository 'google-cloud-ops-agent':",
    " - Status code: 404 for https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-Ootpa-x86_64-all/repodata/repomd.xml (IP: 172.217.13.110)",
    "Error: Failed to download metadata for repo 'google-cloud-ops-agent': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried",
    "Attempt 2 of 3 failed: yum -y list updates",
    "Repository google-cloud-ops-agent is listed more than once in the configuration",
    "Errors during downloading metadata for repository 'google-cloud-ops-agent':",
    " - Status code: 404 for https://packages.cloud.google.com/yum/repos/google-cloud-ops-agent-Ootpa-x86_64-all/repodata/repomd.xml (IP: 172.217.13.110)",
    "Error: Failed to download metadata for repo 'google-cloud-ops-agent': Cannot download repomd.xml: Cannot download repodata/repomd.xml: All mirrors were tried",
    "Attempt 3 of 3 failed: yum -y list updates",
    "Command: yum -y list updates failed",
    "[2023-08-15T01:19:02+0000] Could not refresh the google-cloud-ops-agent yum repositories.",
    "Please check your network connectivity and make sure you are running a supported",
    "rhel distribution. See https://cloud.google.com/stackdriver/docs/solutions/ops-agent/#supported_operating_systems",
    "for a list of supported platforms."
  ],
  "stdout": "Google Cloud Ops Agent Repository 6.3 kB/s | 1.4 kB 00:00 \nGoogle Cloud Ops Agent Repository 6.3 kB/s | 1.4 kB 00:00 \nGoogle Cloud Ops Agent Repository 6.3 kB/s | 1.4 kB 00:00 ",
  "stdout_lines": ["Google Cloud Ops Agent Repository 6.3 kB/s | 1.4 kB 00:00 ", "Google Cloud Ops Agent Repository 6.3 kB/s | 1.4 kB 00:00 ", "Google Cloud Ops Agent Repository 6.3 kB/s | 1.4 kB 00:00 "]
}
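
The first stderr line in the output above reports that the repository ID is listed more than once in the configuration. A minimal sketch for checking that on the target host follows. It assumes the repo definitions live in /etc/yum.repos.d (the usual yum/dnf location on RHEL), and the function name count_repo_defs is hypothetical, not part of the role or the install script.

```shell
#!/usr/bin/env bash
# count_repo_defs REPO_ID [DIR]
# Prints how many "[REPO_ID]" section headers appear across DIR/*.repo.
# DIR defaults to /etc/yum.repos.d (an assumption about the host layout).
count_repo_defs() {
  local repo_id="$1"
  local dir="${2:-/etc/yum.repos.d}"
  # -F: treat the brackets literally, -x: match the whole line, -c: count.
  # grep exits non-zero when the count is 0, so "|| true" keeps callers
  # that use "set -e" from aborting; the printed count is unaffected.
  cat "$dir"/*.repo 2>/dev/null | grep -c -x -F -- "[${repo_id}]" || true
}
```

Running `count_repo_defs google-cloud-ops-agent` on the failing VM and getting a number greater than 1 would be consistent with the "listed more than once" message. It does not explain the 404 on repomd.xml.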

tdgeery commented 1 year ago

I am having this problem as well. This is very strange, though: Ansible just runs the same commands as the doc page,

curl -sSO https://dl.google.com/cloudagents/add-google-cloud-ops-agent-repo.sh
sudo bash add-google-cloud-ops-agent-repo.sh --also-install

and that installs successfully 🤷

tdgeery commented 1 year ago

Looks like it works if you add sudo to linux.yml#L23

- name: Add repo and install agent or remove repo and uninstall agent
  command:
    chdir: "{{ tempfolder.path }}"
    cmd: >
      sudo bash add-{{ 'google-cloud-ops' if agent_type == 'ops-agent' else agent_type }}-agent-repo.sh {{ '--also-install' if package_state == 'present' else
      '--uninstall --remove-repo' }} --version={{ version }} {{ '--dry-run' if ansible_check_mode else '' }}
...
manjumj1553 commented 1 year ago

We were trying this on SLES and I faced a similar issue. I have opened bug #103. Apparently, if you comment out the lines below, it works properly: the control flow does not break and the installation goes through fine.

Filename: linux.yml, lines 28 to 31. Code changes performed:

#retries: 5
#delay: 10
#until: result.rc == 0
#check_mode: false
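
For readers unfamiliar with the `retries`, `delay`, and `until` keywords mentioned above: together they tell Ansible to repeat the task until the condition holds, waiting between attempts and giving up after a bounded number of tries. The shell function below is only a rough analogue of that loop, not the role's implementation. The name retry_until_ok and its arguments are hypothetical, and the exact attempt count Ansible uses may differ from this sketch.

```shell
#!/usr/bin/env bash
# retry_until_ok MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, sleeping DELAY_SECONDS between attempts.
# Returns 0 on the first success, or 1 once MAX_ATTEMPTS have all failed.
retry_until_ok() {
  local max_attempts="$1"
  local delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if [ "$attempt" -ge "$max_attempts" ]; then
      return 1
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  return 0
}
```

For example, `retry_until_ok 5 10 sudo bash add-google-cloud-ops-agent-repo.sh --also-install` would retry the documented install command with the same numbers as the commented-out settings.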

tdgeery commented 1 year ago

Does the package actually install though?

igorpeshansky commented 1 year ago

This was caused by a minor change to the installation scripts that broke an assumption in our ansible playbooks. As such, it has the same cause as #103.

Side note: Ootpa seems to be a Core OS codename — none of our agents are supported on Core OS.

Duplicate of #103.