gcgarner / IOTstack

docker stack for getting started on IOT on the Raspberry PI
GNU General Public License v3.0

How to modify influxdb.conf? #113

Closed mlauhalu closed 4 years ago

mlauhalu commented 4 years ago

This is probably me just being uneducated, but I just can't figure out how to modify the Influxdb configuration file (influxdb.conf). The container is running fine, and I have created databases for my purposes, but I suspect that the _internal database is creating unnecessary overhead, and I rarely use any of the data it creates.

Can anybody guide me on how to make the modification in the conf file?

Paraphraser commented 4 years ago

I strongly doubt "education" comes into it. I'm also on a steep learning curve and keep "guessing poorly", as they say.

I have not had to do this yet (at least not for Influx) but I believe I understand the process, which is that you configure Influx by changing this file:

~/IOTstack/services/influxdb/influxdb.env

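followed by recreating the container (a plain "docker-compose restart" does not re-read the ENV file, whereas "up -d" notices the change and recreates the container):

$ cd ~/IOTstack
$ docker-compose up -d influxdb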

According to the InfluxDB Administration Guide:

ALL of the configuration settings in the configuration file can be specified either in the configuration file or in an environment variable. The environment variable overrides the equivalent option in the configuration file. If a configuration option is not specified in either the configuration file or in an environment variable, InfluxDB uses its internal default configuration.

The "ALL" is my emphasis. If you can do it in the config file, you can do it in the ENV file.

To take an example, if I was planning to set this option in the configuration file inside the container:

reporting-disabled = true

I should also be able to do the same thing in the ENV file:

INFLUXDB_REPORTING_DISABLED=true
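
As a rough sketch of the naming rule (my reading of the pattern in the documentation, not something IOTstack provides): the variable name appears to be INFLUXDB_, then the section name if the option lives in one, then the option name, all upper-cased with hyphens turned into underscores. The helper name `influx_env_name` is made up for illustration.

```shell
# Hypothetical helper: derive the environment variable name for an
# InfluxDB 1.x config option from its section and option name.
# Assumes the INFLUXDB_<SECTION>_<OPTION> pattern described in the docs.
influx_env_name() {
  section="$1"   # e.g. "monitor"; pass "" for top-level options
  option="$2"    # e.g. "store-enabled"
  if [ -n "$section" ]; then
    name="INFLUXDB_${section}_${option}"
  else
    name="INFLUXDB_${option}"
  fi
  # Upper-case everything and turn hyphens into underscores
  printf '%s\n' "$name" | tr 'a-z-' 'A-Z_'
}
```

For example, `influx_env_name "" reporting-disabled` gives the name used above, and `influx_env_name data wal-dir` gives the name for the `wal-dir` option in the `[data]` section. It is still worth checking the result against the documentation before relying on it.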

You can probably try things out by modifying the configuration file directly:

$ docker exec -it influxdb bash
# cd /etc/influxdb/; ls
influxdb.conf

but, because that file is not mapped into the world outside the container, any changes you make will be lost the next time the container is rebuilt.

Because ENV settings are applied after the config file, you can override anything that already happens to be in the config file, which isn't all that much:

[meta]
  dir = "/var/lib/influxdb/meta"

[data]
  dir = "/var/lib/influxdb/data"
  engine = "tsm1"
  wal-dir = "/var/lib/influxdb/wal"

However, I must admit that I am intrigued by what lies behind your question. My Influx databases currently weigh in at 80MB (in compressed portable backup format). I am running Influx "as is" with all its defaults and retention policies set to never discard anything. One database, two measurements. One measurement gains a row every five minutes (from a solar inverter; first row November 2015); the other gains a row every 10 seconds (from a mains power monitor; first row April 2018). I have Grafana staring at this producing running plots, typically with the timebase set to the last 12 hours (grid voltage, DC power generated by the panels, AC power generated by the inverter) plus a histogram being dynamically updated for the same period classifying voltage into 1-volt-wide buckets. It all just works. Data goes in, data comes out without so much as a passing glitch. Changing the Grafana timebase to anything from the last minute through to the last week repaints the graphs before my finger lifts off the mouse. Asking for the last year or the last two years takes a bit of time but I kind-of expect that. And, sure, when I had all this running on a quad-core MacMini it was a fair bit faster at the more complex queries than the RPi4B, but the RPi is still no slouch.

Other than out-of-the-box RPi4B/4GB, out-of-the-box Raspbian and out-of-the-box IOTstack, the only thing I'm doing that might be unusual is running from an external SSD rather than the SD card. The data-path for data coming in is Arduino->NodeRed flow->Influx and I doubt that's unusual.

I've given chapter and verse on my scenario so that you can say things like, "Oh 80MB is nothing. Just wait until you get to 1GB and then you'll see it slow down" or "sure, using an SSD instead of writing to an SD card easily explains why you're not seeing the same overhead."

mlauhalu commented 4 years ago

This is why I like this community so much. Thank you @Paraphraser for your elaborate response.

So in theory, if I would like to disable the _internal monitoring, could that be done by adding the following line in the ENV file:

INFLUXDB_MONITOR_STORE_ENABLED=false

since it is under the [monitor] section in the influxdb.conf file?
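
A minimal sketch of making that kind of change repeatably, assuming the ENV file path mentioned earlier in this thread (~/IOTstack/services/influxdb/influxdb.env). The helper `set_env_var` is hypothetical; it replaces any existing line for the key, so running it twice does not leave duplicates.

```shell
# Hypothetical helper: set KEY=VALUE in an env file, replacing any
# existing assignment of the same key.
set_env_var() {
  file="$1"; key="$2"; value="$3"
  touch "$file"
  # Keep every line that does not already assign this key
  # (grep exits non-zero when nothing is left, hence "|| true")
  grep -v "^${key}=" "$file" > "${file}.tmp" || true
  printf '%s=%s\n' "$key" "$value" >> "${file}.tmp"
  mv "${file}.tmp" "$file"
}
```

Usage would be along the lines of `set_env_var ~/IOTstack/services/influxdb/influxdb.env INFLUXDB_MONITOR_STORE_ENABLED false`, followed by `docker-compose up -d influxdb` from ~/IOTstack, since Compose only applies ENV file changes when the container is recreated.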

The reason I am asking is that yesterday I felt that my RPi3B+ was a bit sluggish, and I read in the InfluxDB docs that it is not recommended to use the _internal database in a production cluster. Now, without even knowing what exactly a "production cluster" refers to, I thought better safe than sorry, since I already corrupted one SD card in the past, and I suspect it was due to reading/writing/deleting too frequently.

I'm running Home Assistant (hass.io), with InfluxDB for sensor data collection and Grafana for visualization. While running one ready made dashboard for _internal database in Grafana, I noticed that I have some write fails:

[image]

Is this normal, or should I be concerned? And when you mention, @Paraphraser, that your database is 80MB, I've been eager to understand that metric myself as well. What is the correct way of querying that? At the end of the day, I want to have the needed monitoring in place, in order to make sure that I'm not pushing the RPi too hard, and that all sensor data is written to the database without hiccups.

Paraphraser commented 4 years ago

At the risk of telling you something you already know (and I sincerely apologise if that's what I'm doing), the impression I got from your sentence "since it is under the [monitor] section" is that you felt you had to infer the "MONITOR" part of the ENV name from the fact that it is in the "[monitor]" section. It's more straightforward than that. The exact name is given to you in the doco:

[Screenshot: Screen Shot 2019-12-10 at 21 52 44]

My understanding of "cluster" is a group of two or more distinct hosts (computers) running a shared instance of a database - high availability, redundancy, replicating stuff, etc. I could be wrong but I don't think that applies here. In fact, I think you have to pay money for the enterprise version to get that.


My understanding of "_internal" has gone as far as "show stats", which produced a ton of gibberish and ... well, I got bored and gave up. I just tried to follow in your footsteps but got nowhere. The first problem was the absence of the Clock module - easily fixed. But, after that, it doesn't seem to matter what I do, I can't get anything to work.

+ > Import. Stick "421" in the field. It looks like it's working but the only way I can get the "Import" button to become active is by changing from "InfluxDB Internal Stats Database" to "InfluxDB". That makes no sense because I reckon I should be able to see the "power" database, as in:

> show databases
name: databases
name
----
_internal
power
test
> 

Regardless of the correct answer, I don't see either "_internal" or "power" anywhere in this dashboard and I have clicked and prodded it every which-way.

Anyway, I chose "InfluxDB" as the only option that would enable the Import button, clicked Import, and then the only thing that works is the clock (which is lying, by the way: the title says "UTC time" but is actually displaying my local time; not that that really matters). Every other panel in the dashboard is "N/A" or "no data to show" or "no data points". I've done it several times (including "421" and downloading/importing the Json), always with the same result. I've given up.

If you can give me some hints on how to make it work, I'll be most appreciative and I'll definitely give it another whirl.

Putting that to one side, I did another "show stats" and went looking for "fail" and "error" but didn't find anything with a non-zero count. Whether that covers the same ground as "Write Fail" in your chart, I don't know but, if it does, I presume it means I'm not getting any errors.

But, again, I suspect that that's because I'm using an external SSD rather than writing to the SD. More precisely, the RPi4 boots from the SD but runs from the SSD (ie the guts of Raspbian, Docker and the containers are all on the SSD). There were three reasons I went down that path:

  1. I had a 400GB USB-3 SSD available and the RPi4 had USB-3 ports doing nothing.
  2. Flash cards always make me nervous in a reliability sense. It might be down to the way I use them but they always seem to be getting corrupted and needing to be re-initialised (or chucked out).
  3. I thought, "I'm starting with a 16GB SD card with no real idea of how big the databases are actually going to get - do I spend $ on a larger capacity card which might go cactus anyway, or do I throw 400GB at the problem and not have to think about it for a while?" The answer was easy.

Even had I been doing all this on my RPi3B+, I would still have gone with external storage and accepted the slower USB-2 interface.


On the "size" issue, I'll give you two answers. The first (which I did not do before) is to ask the container how big the directory holding the database is:

$ docker exec -it influxdb bash
# du -mch /var/lib/influxdb | tail -1
331M    total
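
To see how much of that total belongs to each database (including _internal), a per-directory breakdown may help. This is only a sketch: it assumes the data directory from the config shown earlier (/var/lib/influxdb/data) holds one subdirectory per database, and `db_sizes` is a made-up helper name.

```shell
# Hypothetical helper: list the size in KB of each subdirectory of the
# InfluxDB data directory, largest first. Run it inside the container.
db_sizes() {
  data_dir="${1:-/var/lib/influxdb/data}"
  for d in "$data_dir"/*/; do
    # Skip the unexpanded pattern if the directory is empty
    [ -d "$d" ] || continue
    du -sk "$d"
  done | sort -rn
}
```

A one-off equivalent from the host would be something like `docker exec influxdb sh -c 'du -sk /var/lib/influxdb/data/*/ | sort -rn'`.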

The estimate I gave before (80MB) came from doing this:

  1. If you run Graham's backup script (~/IOTstack/scripts/docker_backup.sh) you wind up with a .tar.gz file stored in ~/IOTstack/backups.
  2. I have a cron job that calls Graham's script and then uses "scp" to copy the .tar.gz to one of my Macs.
  3. I double-clicked the latest .tar.gz which unpacks it.
  4. Within the unpacked structure, at the path "./backups/influxdb/db" is a snapshot of the running database when the backup script was run.
  5. The contents of that folder are mainly more .tar.gz files. I think each .tar.gz is what the Influx doco refers to as a "shard" but the directory as a whole is what the Influx doco calls "portable" backup format.
  6. It was that directory - as a whole - that I took the size of (81.1MB).

I just unpacked the >200 .tar.gz files to see if they got close to the 331MB cited earlier. The answer was 236MB. Close, but not quite the full explanation.

Does that help?

mlauhalu commented 4 years ago

Actually I had just missed those monitoring settings from the documentation, so thank you for pointing those out.

For the importing of the dashboard template, for me it showed data for some of the panels but not all. The one which showed the chart which I attached in my previous post, has the following queries:

SELECT non_negative_derivative(last("pointsWrittenFail"), 1s) FROM "httpd" WHERE $timeFilter GROUP BY time($interval), "host" fill(null)

SELECT non_negative_derivative(last("pointsWrittenOK"), 1s) FROM "httpd" WHERE $timeFilter GROUP BY time($interval), "host" fill(null)
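
Those queries rely on Grafana's $timeFilter and $interval variables, so they won't run as-is in the influx CLI. A simplified sketch for checking the same counter by hand is below. It assumes the "httpd" measurement and "pointsWrittenFail" field named in the queries above, and `write_fail_query` is a made-up helper that only builds the query text.

```shell
# Hypothetical helper: build an InfluxQL query that returns the latest
# value of the write-failure counter within a given time window.
write_fail_query() {
  window="${1:-1h}"   # any InfluxQL duration, e.g. 30m, 24h, 7d
  printf 'SELECT last("pointsWrittenFail") FROM "httpd" WHERE time > now() - %s\n' "$window"
}
```

It could then be run against the _internal database with something like `docker exec influxdb influx -database _internal -execute "$(write_fail_query 24h)"`. The counter is cumulative, which is presumably why the dashboard applies non_negative_derivative to it.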

Thanks for the docker command, it gave me the following:

$ docker exec -it influxdb bash
# du -mch /var/lib/influxdb | tail -1
69M    total

With all this, I'm still not feeling 100% confident about my SD card. I might actually start looking into a solution similar to yours. I'm currently at 25% usage in total on my 32GB flash drive, but I would prefer a more reliable solution where I don't even have to worry about the SD card failing.

Closing this issue since I got the answer to my initial question. Thanks @Paraphraser !

Paraphraser commented 4 years ago

I know you closed this but...

I finally got that pesky dashboard working:

[Screenshot: Screen Shot 2019-12-11 at 16 56 29]

As you'll see, the veritable definition of a "flatline" for errors. My guess is that the occasional dots below the "Write OK" line are explained by RPi4 reboots.