geerlingguy / airgradient-prometheus

AirGradient Prometheus exporter.
MIT License
174 stars 59 forks source link

Implement Prometheus Scrape Server #4

Closed kashalls closed 2 years ago

kashalls commented 3 years ago

Hello there 👋 I don't usually work with Arduino, but something about your project has encouraged me to contribute.

Here's what is possible now:

Please test this out and let me know what you think.

geerlingguy commented 3 years ago

Well then! It's quite a change, so I'll definitely need to do a bit more testing before merging, but I like this. A lot... no need for the intermediary container at all (containers, in fact, with the silly way I had it set up).

If I were trying to emulate AirGradient's cloud system, the method I was using was fine, but since I'm using Prometheus, this makes everything simpler, and by extension more robust. So thanks.

After I merge, I'll also have to make sure to update my blog post a bit, and also my internet-pi project to follow suit.

kashalls commented 3 years ago

@geerlingguy The guys and I think that a identifier string is cool and all, but we were thinking that the mac address or the ip would a much better identifier. Can I get your thoughts on this?

jcollie commented 3 years ago

We could do that, I was thinking I eventually would want to eventually move sensors and changing the name would work better in Prometheus. We can definitely go this route if we want to not rely on custom names to identify nodes

I wouldn't get rid of a custom label/name for those that want it.

onedr0p commented 3 years ago

I would go with the instance=ip:port as the unique identifier.

@jcollie you can add custom labels with prometheus using relabel_configs e.g.

    relabel_configs:
      # source label must be one that exists, so use instance
      - source_labels: [instance]
        # target label is the one you want to create
        target_label: location
        replacement: "living-room"
geerlingguy commented 3 years ago

I'd default it to maybe IP+port, but make it so people can set it manually if they want (I like keeping it simple, and either tying it to hardware (so MAC or the device ID), or my own custom string ('livingroom', etc.).

I don't like having to relabel stuff in Prometheus, because that makes automated Prometheus config more annoying.

onedr0p commented 3 years ago

Something like this @geerlingguy ?

# pseudo code
custom_label=""
instance="ip:port"

if (custom_label != "") {
  instance=custom_label
}

I don't see mac or deviceid being very useful when you have already have IP:PORT, it would bloat the logic.

geerlingguy commented 3 years ago

@onedr0p - Yeah, basically.

I'm fine with IP, MAC, or whatever, I just think of sensors less in terms of network address and more in terms of a UUID for that sensor. (I think of them less like computers/servers that are on the network.)

jcollie commented 3 years ago

Going with WiFi mac address is going to be stable long term, it won't change if you decide to redo your network config. I also would not make "instance" either a custom label or a hardware identifier. I'd always include a mac_address="xx:xx:xx:xx:xx:xx" label. Optionionaly add (not replace) a custom label if you choose to define one.

jcollie commented 3 years ago

you can add custom labels with prometheus using relabel_configs e.g.

Why bother with that when you control the source of the sensor?

onedr0p commented 3 years ago

Coming from a lot of the exporters I've seen (node_exporter, cadvisor, etc..) instance is always set to the IP:PORT. We can also add a mac_address="xx:xx:xx:xx:xx:xx" label too. They can both be present.

I would suggest the logic add custom_label to the labels above if present.

kashalls commented 3 years ago

Let's just add all three. User can choose which one they want.

onedr0p commented 3 years ago

Let's just add all three. User can choose which one they want.

Yup but I mentioned above, always have instance and mac_address labels and add custom_label if it's not empty. This is the most flexible for all our usecases.

jcollie commented 3 years ago

I believe that instance is automatically added by Prometheus: https://prometheus.io/docs/concepts/jobs_instances/#automatically-generated-labels-and-time-series

kashalls commented 3 years ago

@geerlingguy Please review and advise.

geerlingguy commented 3 years ago

I like the code, but one question: to try to make this maintainable and awesome, could you restore the CI workflow, and either use something like Adafruit's CI workflow with arduino --verify, or an action dedicated to the task like https://github.com/marketplace/actions/test-compile-for-arduino ?

I hate to remove CI when I had some before, because for long-term maintenance things can get messy without thinking about it (or I'll merge a typofix that actually breaks the build...).

kashalls commented 3 years ago

I like the code, but one question: to try to make this maintainable and awesome, could you restore the CI workflow, and either use something like Adafruit's CI workflow with arduino --verify, or an action dedicated to the task like https://github.com/marketplace/actions/test-compile-for-arduino ?

I hate to remove CI when I had some before, because for long-term maintenance things can get messy without thinking about it (or I'll merge a typofix that actually breaks the build...).

I'll look into it tonight.

valumar commented 3 years ago

IMHO it's very bad idea for IoT device to expose http server. IoT should be able to push very fast its metrics to remote server and go to sleep mode for energy saving.

onedr0p commented 3 years ago

IMHO it's very bad idea for IoT device to expose http server. IoT should be able to push very fast its metrics to remote server and go to sleep mode for energy saving.

Do you have any insights to what you mean? I would be very curious to know what power differences this would have. This is already polling the sensors every so often and I don't know sleeping would even happen.

valumar commented 3 years ago

IMHO it's very bad idea for IoT device to expose http server. IoT should be able to push very fast its metrics to remote server and go to sleep mode for energy saving.

Do you have any insights to what you mean? I would be very curious to know what power differences this would have. This is already polling the sensors every so often and I don't know sleeping would even happen.

  1. The most of IoT devices is powered by battery. The consumption of esp in active state is 2 orders of magnitude more than in powersave mode excluding power consumption of sensors (https://www.esp8266.com/wiki/doku.php?id=esp8266_power_usage). It's also possible to manipulate the power consuption of polling sensors.
  2. Exposed http server is an additional attack vector.
onedr0p commented 3 years ago
  1. AFAIK this device with the sensors is not meant to be run on bats.
  2. Doing anything is an additional attack vector, just know what you are doing and how to prevent attacks. We cannot stop people from exposing this over the internet, and we are not responsible for what they do.
kashalls commented 3 years ago

IMHO it's very bad idea for IoT device to expose http server. IoT should be able to push very fast its metrics to remote server and go to sleep mode for energy saving.

Do you have any insights to what you mean? I would be very curious to know what power differences this would have. This is already polling the sensors every so often and I don't know sleeping would even happen.

  1. The most of IoT devices is powered by battery. The consumption of esp in active state is 2 orders of magnitude more than in powersave mode excluding power consumption of sensors (https://www.esp8266.com/wiki/doku.php?id=esp8266_power_usage). It's also possible to manipulate the power consuption of polling sensors.
  2. Exposed http server is an additional attack vector.

Based on the original creators intentions, this device is not to be battery powered afaik. Each sensor was designed to poll every 3 seconds and block IO during this time. By this way, you can have Prometheus grab stats when it wants and using the configured wait time for the screen, you can pull the sensor when it wants + have low over head instead of posting it literally every 9 seconds to a server that doesn't know what it's doing with the data except making it available to Prometheus which would just make more power consumption.

kashalls commented 3 years ago

I like the code, but one question: to try to make this maintainable and awesome, could you restore the CI workflow, and either use something like Adafruit's CI workflow with arduino --verify, or an action dedicated to the task like https://github.com/marketplace/actions/test-compile-for-arduino ?

I hate to remove CI when I had some before, because for long-term maintenance things can get messy without thinking about it (or I'll merge a typofix that actually breaks the build...).

I'm working on this still. Any reason we would have to support anything OTHER than the Wemos D1 Mini?

geerlingguy commented 2 years ago

@Kashalls - Nope, no reason, D1 Mini is the only target I care about in this case (since this is narrowly focused on the AirGradient). Any other device would be 'use at your own risk'.

nar1117 commented 2 years ago

Thanks for coming up with this alternative method! I'm running all of Jeff's containers for the internet-pi project on an Unraid server, and I could not for the life of me get the AirGradient http server to work properly. Something to do with the way PHP was being initiated. Stumbled upon your fork, uploaded the sketch, and boom. Great job, thanks to you and mr. geerling for the work!

kashalls commented 2 years ago

If anyone wants to help with the workflow for validating the ino, that would be great. I'm not good with the workflows or the Arduino verifier stuff.

kashalls commented 2 years ago

@geerlingguy From my standpoint, this is the most basic compile check I can do. Let me know if we want to something else.

kashalls commented 2 years ago

I don't think you should be setting a static IP on the device itself. You can do that through your router. We can implement it if it's necessary but I don't think it is.

Ndgc commented 2 years ago

That's fair. It's only a define switch, so you can leave it disabled if you even decide to include it. I was more worried by the comments I saw on some of the examples about it potentially becoming an AP if you don't explicitly set station mode.

ppeter10 commented 2 years ago

I really appreciate the work you have all done. I just got my sensors and use your script and found some tweaks that you may want to use.

This will prevent the display from spending more than the update frequency if you miss a sensor.

void updateScreen(long now) {
...
    bool hasUpdated = false;
...
      case 0:
        if (hasPM) {
          int stat = ag.getPM1_Raw();
          showTextRectangle("PM1",String(stat),false);
          hasUpdated = true; 
        }
        break;
...
    if(hasUpdated)  lastUpdate = millis();

This will add RSSI value. (not really needed)

String GenerateMetrics() {
...
  if (reportRSSI) {
    long rssi = WiFi.RSSI();

    message += "# HELP rssi Show WiFi quality\n";
    message += "# TYPE rssi gauge\n";
    message += "rssi";
    message += idString;
    message += String(rssi);
    message += "\n";

  }

The last modification that I did was to report PM1 and PM10. This requires modification of the Air Gradient library. The sensor doesn't claim it but it reports these values. There are other sensors that do claim it and are probably validated to read these values. I get readings that seem in line with expected values. not saying anything about their accuracy unless you change the dust sensor for one rated for it. I won't go into details on how to do this it just copies, paste, and change name. But I can provide files if requested.

NeilMartin commented 2 years ago

I'm new to raspberry pi and prometheus, but I've just got this pull request working. I can now see the sensor data collected by prometheus by browsing my raspberry pi. Thank you @Kashalls for making this pull request. I do have one teensy nitpick, it should be "rco2" not "rc02". Far from critical, but I had to point it out. Thanks to all on this pull request!

kashalls commented 2 years ago

I'm new to raspberry pi and prometheus, but I've just got this pull request working. I can now see the sensor data collected by prometheus by browsing my raspberry pi. Thank you @Kashalls for making this pull request. I do have one teensy nitpick, it should be "rco2" not "rc02". Far from critical, but I had to point it out. Thanks to all on this pull request!

Bah, I was holding out on this. I'll fix it tonight. 🤣

geerlingguy commented 2 years ago

And I promise I'll get back to testing and merging this soon. October/November have been a bit crazy for me and one of my sensors has been sitting on my desk waiting patiently for the lobotomy...

kashalls commented 2 years ago

And I promise I'll get back to testing and merging this soon. October/November have been a bit crazy for me and one of my sensors has been sitting on my desk waiting patiently for the lobotomy...

Your good Jeff! Make sure you update your pi setup as well. Any chance we might see an updated video in the future?

geerlingguy commented 2 years ago

@Kashalls - Most likely, yes. I'm hoping to get an Amber (Home Assistant) next year sometime, so at that point I'll likely be comparing this new setup + Prometheus + Grafana to ESPHome and Home Assistant.

Nightreaver commented 2 years ago

I just want mention that the AirGradient lib we use, already has PM10 and PM1 prepared but not fully integrated. I have prepared it in my own Fork ( https://github.com/Nightreaver/arduino/commit/f7f852438977224617018e81b008fe4cbe8def0e ), tho maybe you can get it fully implemeted in the main repo - as I'm not an cpp professional. So in case this gets added, you can add to prometheus as well.

Takalele commented 2 years ago

hi,

@Kashalls unfortunately it does not work when the ssid is hidden (if the ssid is not hidden it works perfectly fine), the old version with the WiFiManager.h works fine with hidden ssids so i think the issue has to do with the WiFiClient.h any ideas?

BR Takalele

kashalls commented 2 years ago

hi,

@Kashalls unfortunately it does not work when the ssid is hidden (if the ssid is not hidden it works perfectly fine), the old version with the WiFiManager.h works fine with hidden ssids so i think the issue has to do with the WiFiClient.h any ideas?

BR Takalele

Let me try it out right now. I'm curious :)

kashalls commented 2 years ago

hi,

@Kashalls unfortunately it does not work when the ssid is hidden (if the ssid is not hidden it works perfectly fine), the old version with the WiFiManager.h works fine with hidden ssids so i think the issue has to do with the WiFiClient.h any ideas?

BR Takalele

Mine is currently connected to my kash-iot network. https://uwu.whats-th.is/9TapJtP.png

Although I do have these settings: https://uwu.whats-th.is/AKhLt2m.png

Takalele commented 2 years ago

@Kashalls thank you for your testing and thanks for the metrics-exporter on the esp itself.

After 4 hours of troubleshooting my wireless network (1xCisco 3702e and 1x Cisco 3602e) the only things I figured out so far is: 1.) It doesn't matter if I use the WiFiManager.h or the WiFiClient.h 2.) If the SSID is hidden, the ESP's wont connect to the Cisco 3702e but the 3602e works fine (but the radios from the 3602e are disabled during night time). There is no configuration differences on the 2,4Ghz radios between the two Cisco AP's. 3.) If the SSID is not hidden, it works perfectly fine.

Do you have any tips/ideas how I can proceed troubleshooting on the esp?

BR Takalele

NeilMartin commented 2 years ago

@Takalele You mentioned 2 days ago that hidden SSID worked with the old version of WifiManager.h, but now after further testing you're saying that's no longer the case? The reason I ask is I'm tempted to get the old connection method working again, so that you don't have to hard code the ssid/password into the code.

kashalls commented 2 years ago

Hmmm that's weird. I guess we can switch back to the older version of wifi client, that's fine. I'll look into it later

Takalele commented 2 years ago

@NeilMartin @Kashalls,

you don't need to switch back, I debugged the issue further and came to the conclusion that the ESP is not the problem. The problem is in the Cisco 3702e AP firmware (153-3.JPK). I tried multiple firmware versions and all older versions are working fine.

Thank you for your help and sorry for the confusion.

BR Takalele

geerlingguy commented 2 years ago

I've cleaned up the README and reorganized the config slightly; I'm testing this out on my sensors tonight since we had a bit of a burned up item in the oven and I noticed the sensors weren't supplying data to my existing Prometheus setup. Perfect time to test out the changes :)

kashalls commented 2 years ago

I've cleaned up the README and reorganized the config slightly; I'm testing this out on my sensors tonight since we had a bit of a burned up item in the oven and I noticed the sensors weren't supplying data to my existing Prometheus setup. Perfect time to test out the changes :)

If something does not end up working or as intended, mention me and I can work on a fix tonight. Its been working wonderfully for me since adding it https://github.com/Kashalls/home-cluster/blob/main/cluster/apps/monitoring/kube-prometheus-stack/helm-release.yaml#L204 .

geerlingguy commented 2 years ago

@Kashalls - Thanks! Almost everything was great out of the box. I've just tweaked a few things now to make sure the display doesn't cut anything off (e.g. TMP instead of ATMP, HUM instead of RHUM, and added a bit to display temp like XX.XC (so it's more apparent—my brain seems to interpret that better as a temperature value than "XX.XX" :)