cloudalchemy / ansible-node-exporter

Provision basic metrics exporter for prometheus monitoring tool
MIT License

'latest' version doesn't work #240

Closed: sfuerte closed this issue 1 year ago

sfuerte commented 3 years ago

What happened? According to the role's documentation:

node_exporter_version : Node exporter package version. Also accepts latest as parameter.

However, the role fails when it tries to fetch the checksum list.

Did you expect to see something different?

The role runs successfully.

How to reproduce it (as minimally and precisely as possible):

    - name: run Prometheus 'node_exporter' role
      include_role:
        name: cloudalchemy.node_exporter
      vars:
        node_exporter_version: "latest"

Environment

$ lsb_release -d
Description:    Debian GNU/Linux 10 (buster)

/etc/ansible/roles/cloudalchemy.node_exporter/meta/.galaxy_install_info:

{install_date: 'Thu Aug 26 19:30:37 2021', version: 2.0.0}

see above

TASK [cloudalchemy.node_exporter : Get latest release] *************************
task path: /etc/ansible/roles/cloudalchemy.node_exporter/tasks/preflight.yml:75
Thursday 26 August 2021  20:38:47 +0000 (0:00:00.034)       0:05:32.931 *******
ok: [/tmp/build/project/mnt -> localhost] => {
    "attempts": 1,
    "censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result",
    "changed": false
}

TASK [cloudalchemy.node_exporter : Set node_exporter version to 1.2.2] *********
task path: /etc/ansible/roles/cloudalchemy.node_exporter/tasks/preflight.yml:89
Thursday 26 August 2021  20:38:48 +0000 (0:00:00.771)       0:05:33.702 *******
ok: [/tmp/build/project/mnt -> localhost] => {
    "ansible_facts": {
        "node_exporter_version": "1.2.2"
    },
    "changed": false
}

TASK [cloudalchemy.node_exporter : Get checksum list from github] **************
task path: /etc/ansible/roles/cloudalchemy.node_exporter/tasks/preflight.yml:99
Thursday 26 August 2021  20:38:48 +0000 (0:00:00.057)       0:05:33.759 *******
fatal: [/tmp/build/project/mnt -> localhost]: FAILED! => {}

MSG:

An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found

NO MORE HOSTS LEFT *************************************************************

Anything else we need to know?

Overriding a variable with set_fact doesn't work the way it is done in https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L90. See this discussion for more details: https://stackoverflow.com/questions/27038553/ansible-set-fact-doesnt-change-the-variable-value
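The precedence problem can be reproduced without the role. The playbook below is a minimal sketch that assumes the variable is passed as an extra var (`-e`), which always takes precedence over set_fact; the `include_role` `vars:` case from the report may behave similarly, but this sketch does not demonstrate that case directly. The file name and the variable `demo_version` are hypothetical.

```yaml
# precedence-demo.yml (hypothetical file name)
# Run with: ansible-playbook precedence-demo.yml -e demo_version=latest
- hosts: localhost
  gather_facts: false
  tasks:
    # Mimics the role's "Set node_exporter version to ..." step.
    - name: Try to override demo_version with set_fact
      ansible.builtin.set_fact:
        demo_version: "1.2.2"

    # Extra vars have the highest precedence, so this still prints "latest".
    - name: Show the value later tasks will see
      ansible.builtin.debug:
        var: demo_version
```

This mirrors what the log shows: the "Set node_exporter version to 1.2.2" task succeeds, yet the checksum URL is still built from `vlatest`.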

nemcikjan commented 2 years ago

Any updates on this?

wowpetr commented 2 years ago

I did some debugging of the ansible.builtin.uri module and found a way to solve this. Run your playbook with -K (--ask-become-pass, which prompts for the privilege escalation password) and enter the sudo password of your local machine (not the target). It seems that the ansible.builtin.uri module somehow requires sudo on the local machine.

sixtus commented 2 years ago

It crashes the whole Python process for me on my Mac. Passing -K does not fix it.

wowpetr commented 2 years ago

@sixtus: If it crashes while running multithreaded tasks on your Mac, that is unrelated to this problem. Add the line "export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES" to your .zshrc/.bashrc and restart your terminal session.

sixtus commented 2 years ago

@wowpetr Thanks, that fixed the crash. However, I still need to pass an explicit version: "Get latest release" fails with FAILED! => {"censored": "the output has been hidden due to the fact that 'no_log: true' was specified for this result"}

wowpetr commented 2 years ago

@sixtus With the -K argument and your local sudo password provided?

sixtus commented 2 years ago

@wowpetr I'm running into a different problem now, so -K seems to help. Why I need local sudo rights to run Ansible beats me, though (and it doesn't build confidence).

wowpetr commented 2 years ago

@sixtus When I was debugging it, I could see that the uri module downloads the GitHub release page locally to parse the latest version, and that it needs access to stderr, which requires sudo (at least, that's what it says).
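One possible explanation for why -K helps, offered as an assumption rather than a confirmed diagnosis: if the play runs with `become: true`, tasks delegated to localhost (such as "Get latest release") also try to escalate privileges on the control machine, which would trigger a local sudo prompt. Setting `become: false` on the delegated task would avoid that. The sketch below is a trimmed version of the "Get latest release" task as quoted later in this thread, with only that change added; it is not the role's actual code.

```yaml
- name: Get latest release
  ansible.builtin.uri:
    url: "https://api.github.com/repos/prometheus/node_exporter/releases/latest"
    method: GET
    return_content: true
    status_code: 200
    body_format: json
  register: _latest_release
  until: _latest_release.status == 200
  retries: 5
  delegate_to: localhost
  run_once: true
  # Assumption: the play sets become: true; don't escalate on the control node.
  become: false
```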

humberto-garza commented 2 years ago

Running into this issue as well:

node_exporter_version = latest

TASK [cloudalchemy.node_exporter : Get checksum list from github] **********************************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/cloudalchemy.node_exporter/tasks/preflight.yml:99
fatal: [<REDACTED> -> localhost]: FAILED! =>
  msg: 'An unhandled exception occurred while running the lookup plugin ''url''. Error was a <class ''ansible.errors.AnsibleError''>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt : HTTP Error 404: Not Found'

Does not exist: https://github.com/prometheus/node_exporter/releases/download/vlatest/sha256sums.txt
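Until `latest` is resolved correctly, passing a concrete version avoids the lookup path entirely, which is what sixtus ended up doing above. A minimal sketch, using 1.2.2 (the version the role resolved in the first report) purely as an example: the checksum URL should then point at .../download/v1.2.2/sha256sums.txt instead of .../download/vlatest/sha256sums.txt.

```yaml
- name: run Prometheus 'node_exporter' role
  include_role:
    name: cloudalchemy.node_exporter
  vars:
    # Pin an explicit release instead of "latest"; 1.2.2 is only an example.
    node_exporter_version: "1.2.2"
```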

Dmitry099 commented 1 year ago

I've run into the same issue.

As I understand it, in this line (https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L91) we're trying to override the node_exporter_version variable using set_fact:

- name: "Set node_exporter version to {{ _latest_release.json.tag_name[1:] }}"
  set_fact:
    node_exporter_version: "{{ _latest_release.json.tag_name[1:] }}"

However, set_fact has lower precedence than some other ways of defining a variable (for example, role parameters or extra vars), and precedence matters more than the order in which values are assigned. For more information, see https://stackoverflow.com/questions/27038553/ansible-set-fact-doesnt-change-the-variable-value

A workaround is to introduce an additional internal variable (e.g. _node_exporter_version), use it instead of node_exporter_version in the steps after "Set node_exporter version to", and add an extra step that sets it when a specific version is predefined. Example:

- block:
    - name: Get latest release
      uri:
        url: "https://api.github.com/repos/prometheus/node_exporter/releases/latest"
        method: GET
        return_content: true
        status_code: 200
        body_format: json
        user: "{{ lookup('env', 'GH_USER') | default(omit) }}"
        password: "{{ lookup('env', 'GH_TOKEN') | default(omit) }}"
      no_log: "{{ not lookup('env', 'MOLECULE_DEBUG') | bool }}"
      register: _latest_release
      until: _latest_release.status == 200
      retries: 5

    - name: "Set internal variable for node_exporter version to {{ _latest_release.json.tag_name[1:] }}"
      set_fact:
        _node_exporter_version: "{{ _latest_release.json.tag_name[1:] }}"
  when:
    - node_exporter_version == "latest"
    - node_exporter_binary_local_dir | length == 0
  delegate_to: localhost
  run_once: true

- name: "Set internal variable for node_exporter version to {{ node_exporter_version }}"
  set_fact:
    _node_exporter_version: "{{ node_exporter_version }}"
  when:
    - node_exporter_version != "latest"
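With this workaround, later steps would need to read `_node_exporter_version` instead of `node_exporter_version`. For example, the checksum step could look roughly like this. The lookup expression is a sketch built from the URL in the error messages above, not a copy of the role's preflight.yml, and the fact name `_checksums` is hypothetical.

```yaml
- name: Get checksum list from github
  set_fact:
    # Builds e.g. .../download/v1.2.2/sha256sums.txt from the internal variable;
    # wantlist=True returns the file as a list of lines.
    _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + _node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
  run_once: true
```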

SuperQ commented 1 year ago

This role has been deprecated in favor of the prometheus-community/ansible collection.
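For anyone migrating, a minimal sketch assuming the collection is installed from Ansible Galaxy as `prometheus.prometheus` (for example with `ansible-galaxy collection install prometheus.prometheus`) and that its node_exporter role still reads `node_exporter_version`. Check the collection's documentation for whether it accepts `latest`; the pinned version here is only an example.

```yaml
- hosts: all
  roles:
    - role: prometheus.prometheus.node_exporter
      vars:
        # Example pin; verify supported values in the collection's docs.
        node_exporter_version: "1.2.2"
```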