cloudalchemy / ansible-node-exporter

Provision basic metrics exporter for prometheus monitoring tool
MIT License
501 stars 270 forks source link

Add support for MacOS #268

Closed zhan9san closed 2 years ago

zhan9san commented 2 years ago

Hi @paulfantom and @SuperQ

Could you help review this PR?

Inspired by #263.

@sdarwin I am looking forward to your review as well.

sdarwin commented 2 years ago

Hi @zhan9san, functionally this is removing Windows support from pr #263. In our actual real-world environment we have Linux, Mac, Windows. It's fine if server roles such as Prometheus and Grafana must be installed on recent Linux versions. The monitored nodes should be anything in the datacenter and installed as widely as possible.

zhan9san commented 2 years ago

@sdarwin

Thanks for your attention.

Yes. I need three platforms as well.

Windows exporter will be added in another PR.

As the windows exporter is not related to MacOS, if they're mixed into one PR, it will make review more difficult.

I tried my best to make it easier for your review.

Besides, I make a slight change in plist file.

Does it make sense?

sdarwin commented 2 years ago

I make a slight change in plist file.

For the plist, you have include all the various command-line options directly in the template. In the earlier pull request, what I had done is expand those various options inside a wrapper script instead. That simplified the appearance of the plist file. All it needed to do is call the external wrapper script. Or, you could run that script directly from the command line. This eliminated any issues regarding how the xml might deal with formatting, newlines, special characters, anything else, since now the xml won't include those things. so, it seemed to make the layout clearer. You have included "UserName" in the plist. Did you test? Does it run as the User? List processes with "ps" or similar command.

zhan9san commented 2 years ago

That simplified the appearance of the plist file. All it needed to do is call the external wrapper script.

If we have only one plist file, the only thing we need to manage the exporter is to run

  1. Start node_exporter: sudo launchctl load /Library/LaunchDaemons/homebrew.mxcl.node_exporter.plist
  2. Stop node_exporter: sudo launchctl unload /Library/LaunchDaemons/homebrew.mxcl.node_exporter.plist

But if we have another script to manage, besides running launchctl command, we have to run another command pkill to stop the process.

So I think it is more convenient.

Regarding the special characters, that's a good point, but now, it's fun to including all things in one plist file, which is referring to Nginx plist, Apache plist and Postgresql plist.

For UserName, yes, I tested it in my local Mac laptop. It works well.

I tried to add some more test but there are some limits of pytest-testinfra on MacOS.

Here is a test PR check on my own forked repo for your reference, https://github.com/zhan9san/ansible-node-exporter/pull/1/checks

You can verify it locally by running the following commands.

git clone https](https://github.com/zhan9san/ansible-node-exporter.git
cd ansible-node-exporter
molecule test -s default-macos
zhan9san commented 2 years ago

You have included "UserName" in the plist. Did you test? Does it run as the User? List processes with "ps" or similar command.

I added one test to process in new commit.

https://github.com/cloudalchemy/ansible-node-exporter/pull/268/commits/e7cbc198bc9cc7b8763362c45b0bcc1485c4b8b2

sdarwin commented 2 years ago

Hi @zhan9san here is a partial review. I launched an Mac mini M1, Apple silicon, at scaleway.com. Only €0.10/Hour. The problem is that node-exporter seems to be broken on the latest mac mini using arm64 architecture.

  1. Could you added ansible-lint to CI? For example, in ci.yml:
      - name: Set up Python 3.
        uses: actions/setup-python@v2
        with:
          python-version: '3.x'

      - name: Install test dependencies.
        run: pip3 install yamllint ansible-lint ansible

      - name: Lint code.
        run: |
          set -e
          yamllint .
          ansible-lint .

Because there are linting errors. "yaml: duplication of key "group" in mapping (yaml[key-duplicates])"

  1. in main/defaults.yml update node_exporter_version

    node_exporter_version: 1.3.1

    A newer version (1.3.1) is necessary to attempt an installation on mac arm64.

  2. You named the launch file homebrew.mxcl.node_exporter.plist.j2 . I had named it cloudalchemy.mxcl.node_exporter.plist.j2. Why choose the name "homebrew"? It's not being installed by homebrew, or provided by homebrew.

  3. During template installation, I got an error "FAILED! => {"changed": false, "msg": "AnsibleError: template error while templating string: tag name expected. String: <?xml version=\"1.0\ ........" That is unexpected, and I have not debugged it, yet. May be a quirk in the installation environment, and will disappear later.

  4. On Apple, nologin seems to be /sbin/nologin instead of /usr/sbin/nologin ? This could be dynamically set, similar to the _node_exporter_root_group variable.

I will test on an older mac such as Catalina 10.15, hopefully next month.

zhan9san commented 2 years ago

@sdarwin

Thanks for your attention.

  1. Could you added ansible-lint to CI? For example, in ci.yml:

Yes, I'd like to add lint as well. yamllint . is added in Lint job in ci.yml.

But for ansible-lint, can we add it in another PR? It is because there are lots of lint issue currently which is not related to this PR.

  1. in main/defaults.yml update node_exporter_version

Could we add it in another PR? The node_exporter_version is a global configuration. And it works well in MacOS intel. Besides, I'll investigate whether can we add both intel-based and arm-based runner in Github Action.

  1. You named the launch file homebrew.mxcl.node_exporter.plist.j2

I am new to launch file and it is referred to nginx, httpd and postgres, in which homebrew is in use, it is changed back to homebrew.mxcl.node_exporter.plist.j2 in new commit.

  1. May be a quirk in the installation environment, and will disappear later.

If it can be reproduced, I'd like to debug it as well

  1. On Apple, nologin seems to be /sbin/nologin instead of /usr/sbin/nologin ? This could be dynamically set, similar to the _node_exporter_root_group variable.

It is fixed in new commit.

sdarwin commented 2 years ago

Yes, I'd like to add lint as well. yamllint . is added in Lint job in ci.yml. But for ansible-lint, can we add it in another PR? It is because there are lots of lint issue currently which is not related to this PR.

You added yamllint to ci.yml, and now it shows an error. :-)

For ansible-lint, yes a different pull request that fixes linting issues would make sense. Debatably, fqcn-builtins is not really a problem and is only needed to match their newest strict linting rules.

zhan9san commented 2 years ago

You added yamllint to ci.yml, and now it shows an error. :-)

Thanks for pointing it out. It is fixed in new commit.

zhan9san commented 2 years ago

Besides, I'll investigate whether can we add both intel-based and arm-based runner in Github Action.

There is no ETA for enabling M1 support in the GitHub Actions hosted runners. https://github.com/actions/runner-images/issues/2187

zhan9san commented 2 years ago

@sdarwin

Thanks for your review.

As you know, Still maintained? #266 issue, even we have this PR ready, it may not be merged.

Do you have concern if we move to another repo, https://github.com/devops37/ansible-node-exporter?

sdarwin commented 2 years ago

if we move to another repo,

Not really sure. Having official roles is preferable. In the case of ansible-node-exporter specifically (not the others), windows and macos are required, and I'm using a fork to handle that...