freifunk-berlin / puppet

Deprecated: New infrastructure development is happening in https://github.com/freifunk-berlin/ansible

outdated nodes on monitor.berlin.freifunk.net #22

Open cholin opened 9 years ago

cholin commented 9 years ago

We have plenty of non-reporting entries for our monitor service. 63 of 268 nodes are outdated for http://monitor.berlin.freifunk.net. I wrote a snippet to detect these nodes (see https://gist.github.com/cholin/3378b204ab7513bc6024 - output: http://paste.debian.net/hidden/2dbd50bb/).

I suggest we remove these outdated entries automatically via a cronjob, for example after 14 days of inactivity.

booo commented 9 years ago

We may lose some data but I don't care. I think we should increase the inactivity time to a month.

cholin commented 9 years ago

I'm fine with a month. On the mailing list someone proposed 12 months (see archive) but I think that barely helps. Often people misconfigure their node, and therefore we end up with orphaned entries; my goal is to delete these. With 12 months we won't achieve this goal...

cholin commented 9 years ago

If no one complains in the next few days, I will configure a cronjob to delete orphaned entries after one month of inactivity.
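
A minimal sketch of what such a cleanup could look like, for illustration only. It is not the gist linked above. It assumes one directory per node under a collectd RRD base directory (that layout, and the names `find_inactive_nodes` and `remove_inactive_nodes`, are assumptions), and it treats a node as inactive when its most recently modified file is older than the threshold.

```python
import shutil
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def newest_mtime(node_dir: Path) -> float:
    """Most recent modification time of any file below node_dir.

    Falls back to the directory's own mtime if it contains no files.
    """
    mtimes = [p.stat().st_mtime for p in node_dir.rglob("*") if p.is_file()]
    return max(mtimes, default=node_dir.stat().st_mtime)


def find_inactive_nodes(base: Path, max_age_days: int = 30,
                        now: Optional[float] = None) -> List[Path]:
    """Return node directories (assumed layout: one per node) with no recent writes."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(d for d in base.iterdir()
                  if d.is_dir() and newest_mtime(d) < cutoff)


def remove_inactive_nodes(base: Path, max_age_days: int = 30,
                          dry_run: bool = True,
                          now: Optional[float] = None) -> List[Path]:
    """Delete inactive node directories; with dry_run=True only report them."""
    stale = find_inactive_nodes(base, max_age_days, now)
    if not dry_run:
        for node_dir in stale:
            shutil.rmtree(node_dir)
    return stale
```

Running it with `dry_run=True` first and posting the resulting list somewhere visible would give people a chance to fix their reporting before anything is removed.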

booo commented 9 years ago

Sounds good. Thanks! I can integrate the script in puppet if you like.

cholin commented 9 years ago

Would be cool!

andrenarchy commented 9 years ago

+1 for 1 month.

egmont1227 commented 9 years ago

+1 for 1 month

But before running the removal live, I suggest announcing it on the mailing list and asking people to fix their reporting. I see a lot of known locations that are probably just misconfigured, or whose owners don't care much about the stats, but that are still operational.

cholin commented 9 years ago

Just a short update: it seems I forgot to mention that the cronjob for deleting non-reporting nodes (after 30 days of inactivity) has been configured since 2.02.2015. Integration into puppet still seems to be missing.

pmelange commented 4 years ago

It seems like this isn't really working for nodes which no longer report a specific statistic. For example, the old ffvpn interface is no longer used by many nodes, but since the node itself still reports other values, such as its load average, the old ffvpn rrd files are still on the system.

/var/lib/collectd/rrd$ find . -type f -mtime +365 | wc -l
13042

Currently there are over 13000 rrd files which are over a year old. RRD files only store statistics for up to a year, so I think these files should be deleted too.
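
A per-file cleanup could be sketched roughly like this. It is only an illustration: it approximates the `find` call above with a plain cutoff rather than reproducing `-mtime`'s whole-day rounding, it restricts itself to files ending in `.rrd` (an assumption, since the `find` call matches every regular file), and the function names are made up for this example.

```python
import time
from pathlib import Path
from typing import List, Optional

SECONDS_PER_DAY = 86400


def stale_rrd_files(base: Path, max_age_days: int = 365,
                    now: Optional[float] = None) -> List[Path]:
    """Return *.rrd files below base that were last modified before the cutoff."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return sorted(p for p in base.rglob("*.rrd")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def delete_stale_rrd_files(base: Path, max_age_days: int = 365,
                           dry_run: bool = True,
                           now: Optional[float] = None) -> int:
    """Delete stale RRD files and return how many matched.

    With dry_run=True nothing is deleted; the count is still returned.
    """
    stale = stale_rrd_files(base, max_age_days, now)
    if not dry_run:
        for path in stale:
            path.unlink()
    return len(stale)
```

Because it looks at individual files rather than whole node directories, this would also catch leftovers such as the old ffvpn files on nodes that are otherwise still reporting.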