FreifunkBremen / yanic

Yet another node info collector - for respondd to be used with meshviewer to Grafana (with influxdb or graphite)
https://freifunkbremen.github.io/yanic/
GNU Affero General Public License v3.0

All nodes offline after update #137

Closed mweinelt closed 6 years ago

mweinelt commented 6 years ago

Updated today from 003c44fd6cb1b2218f4328aa512f292592fa54bf to 10f58a72ea289d68f36c9c6ccad1d0f6f424ac38, and now all nodes are offline. The meshviewer.json is updated every minute, but the nodes stay offline. Sniffing the interfaces, I see queries and answers, but the answers are silently discarded.

yanic[32066]: 2018/05/08 15:46:02 nodes.go:217: loaded 1086 nodes
yanic[32066]: 2018/05/08 15:46:02 serve.go:53: delaying 58.0 seconds
yanic[32066]: 2018/05/08 15:47:00 collector.go:165: sending multicasts
yanic[32066]: 2018/05/08 15:47:30 collector.go:200: sending 0 unicast pkg for 0 nodes
yanic[32066]: 2018/05/08 15:48:05 database.go:152: saving 82 points
yanic[32066]: 2018/05/08 15:48:30 collector.go:165: sending multicasts
yanic[32066]: 2018/05/08 15:49:00 collector.go:200: sending 0 unicast pkg for 0 nodes
yanic[32066]: 2018/05/08 15:49:05 database.go:152: saving 82 points
yanic[32066]: 2018/05/08 15:49:30 collector.go:165: sending multicasts
yanic[32066]: 2018/05/08 15:50:00 collector.go:200: sending 0 unicast pkg for 0 nodes
# This is the config file for Yanic written in "Tom's Obvious, Minimal Language."
# syntax: https://github.com/toml-lang/toml
# (if you need something multiple times, check out the [[array of table]] section)

# Send respondd request to update information
[respondd]
enable           = true
# Delay startup until a multiple of the period since zero time
synchronize      = "1m"
# how often to send requests via multicast
collect_interval = "1m"

[respondd.sites.ffda]
domains          = ["ffda-64850", "ffda-64853", "ffda-64823", "ffda-64404", "ffda-64319", "ffda-64342", "ffda-64297", "ffda-64665", "ffda-64673", "ffda-64839", "ffda-64859", "ffda-64807", "ffda-64409", "ffda-64846", "ffda-64380", "ffda-64354", "ffda-64401", "ffda-63128", "ffda-63322", "ffda-63500", "ffda-63533", "ffda-63110", "ffda-64407", "ffda-64732", "ffda-64720", "ffda-64739", "ffda-64711", "ffda-64753", "ffda-64750", "ffda-64385", "ffda-64747", "ffda-64395", "ffda-64756", "ffda-64367", "ffda-64372", "ffda-64405", "ffda-64397", "ffda-63225", "ffda-63329", "ffda-63303", "ffda-64832", "ffda-64347", "ffda-64331", "ffda-legacy", "ffda-default", "ffda-da-120", "ffda-da-320", "ffda-da-130", "ffda-da-210", "ffda-da-270", "ffda-da-310", "ffda-da-220-230", "ffda-da-110", "ffda-da-240", "ffda-da-530", "ffda-da-540", "ffda-da-250", "ffda-da-260", "ffda-da-410", "ffda-da-330", "ffda-da-140", "ffda-da-420", "ffda-da-150", "ffda-da-340", "ffda-da-520", "ffda-da-430", "ffda-da-510", "ffda-da-440", "ffda-da-810-820", "ffda-64390", "ffda-da-610-620-630", "ffda-da-910-920", "ffda-64569", "ffda-64572", "ffda-64521", "ffda-64546", "ffda-64560", "ffda-64579", "ffda-64589", "ffda-64584"]

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom12-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom13-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom10-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom11-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom16-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom17-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom14-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom15-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom8-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom9-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom61-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom0-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom1-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom2-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom3-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom4-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom5-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom6-br"

[[respondd.interfaces]]
# name of interface on which this collector is running
ifname           = "dom7-br"

# A little built-in webserver, which statically serves a directory.
# This is useful for testing purposes or for a little standalone installation.
[webserver]
enable  = false
bind    = "127.0.0.1:8080"
webroot = "/var/www/html/meshviewer"

[nodes]
# Cache file
# a json file to cache all data collected directly from respondd
state_path     = "/var/lib/yanic/state/state.json"
# prune data in RAM, cache-file and output json files (i.e. nodes.json)
# that were inactive for longer than
prune_after = "90d"
# Export nodes and graph periodically
save_interval = "5s"
# Set node to offline if not seen within this period
offline_after = "10m"

## [[nodes.output.-]]
# every output:
#  needs to be enabled just adding:
#    enable = true
#  could filter the nodes by using its own filter entry (see output meshviewer)
#   [nodes.output.-.filter]
#  could be used multiple times (suggested by the "[[...]]" instead of "[...]")
#  it is useful for e.g. filter by different array and use multiple meshviewers

[[nodes.output.meshviewer]]
enable = true
# structure of nodes.json to support
# version 1 is to support legacy meshviewer (which are in master branch)
#    i.e. https://github.com/ffnord/meshviewer/tree/master
# version 2 is to support new version of meshviewer (which are in legacy develop branch or newer)
#    i.e. https://github.com/ffnord/meshviewer/tree/dev
#         https://github.com/ffrgb/meshviewer/tree/develop
version  = 2
# path where to store nodes.json
nodes_path     = "/var/lib/yanic/meshviewer/nodes.json"
# path where to store graph.json
graph_path     = "/var/lib/yanic/meshviewer/graph.json"

[nodes.output.meshviewer.filter]
no_owner = true

[[nodes.output.meshviewer-ffrgb]]
enable   = true
path = "/var/lib/yanic/meshviewer/meshviewer.json"

[nodes.output.meshviewer-ffrgb.filter]
no_owner = true
#has_location = true
#blacklist = ["vpnid"]
#sites = ["ffda"]

#replace the site_code with the domain_code in this output
# e.g. site_code='ffhb',domain_code='city' => site_code='city', domain_code=''
domain_as_site = true
#
# append on the site_code the domain_code with a '.' in this output
# e.g. site_code='ffhb',domain_code='city' => site_code='ffhb.city', domain_code=''
#domain_append_site = false
#

#[nodes.output.meshviewer.filter.in_area]
#latitude_min = 34.30
#latitude_max = 71.85
#longitude_min = -24.96
#longitude_max = 39.72

[[nodes.output.nodelist]]
enable   = true
path = "/var/lib/yanic/meshviewer/nodelist.json"

[database]
# cleaning data of measurement node,
#   which are older than this period
delete_after = "180d"
#   how often run the cleaning
delete_interval = "1h"

## [[database.connection.-]]
# every output:
#  needs to be enabled just adding:
#    enable = true
#  could be used multiple times (suggested by the "[[...]]" instead of "[...]")
#  it is useful for e.g. save into a database before and behind a firewall

# Save collected data to InfluxDB
# there would be the following measurements:
#  node: store node specific data i.e. clients memory, airtime
#  global: store global data, i.e. count of clients and nodes
#  firmware: store count of nodes tagged with firmware
#  model: store count of nodes tagged with hardware model
[[database.connection.influxdb]]
enable   = true
address  = "http://ruby.darmstadt.freifunk.net:8086"
database = "yanic"
username = "yanic"
password = "..."

[[database.connection.logging]]
enable   = false
path     = "/var/log/yanic.log"

[[database.connection.graphite]]
enable   = false
address  = "stats.test.h4ck.space:2003"
prefix   = "freifunk"

# respondd (yanic)
# forward collected respondd packages to an address
# (e.g. to another respondd collector like a central yanic instance or hopglass)
[[database.connection.respondd]]
enable   = false
# type of network to create a connection
type     = "udp6"
# destination address to connect/send respondd package
address = "stats.darmstadt.freifunk.net:11001"

# Logging
[[database.connection.logging]]
enable   = false
path = "/var/log/yanic.log"
genofire commented 6 years ago

Does it work if you manually set ip_address to the link-local address in [[respondd.interfaces]]?

Grotax commented 6 years ago

Had the same problem; fixed it by setting ip_address to the link-local IPv6 address.
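
A minimal sketch of that setting, assuming `ip_address` goes directly next to `ifname` in each `[[respondd.interfaces]]` table. The address shown is the link-local one that the `ss` output in this thread reports for dom1-br; every other interface would need its own address.

```toml
[[respondd.interfaces]]
# name of interface on which this collector is running
ifname     = "dom1-br"
# assumption: ip_address is the key named in the comments above; the value is
# the link-local address bound on dom1-br according to `ss -lpn`
ip_address = "fe80::d8ff:1ff:fe00:1504"
```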

genofire commented 6 years ago

Idea: if multicast_address is not set, it will choose the current link-local address (for batman); otherwise it will choose a routable address (for babel).
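
Until the source address is derived automatically like that, both values can presumably be pinned by hand. A sketch for a batman-adv bridge, assuming `multicast_address` is a per-interface key set alongside `ip_address`; the group `ff02::2:1001` is the link-local destination visible in the tcpdump output in this thread.

```toml
[[respondd.interfaces]]
ifname            = "dom1-br"
# link-local source address, matching the link-local scope of the group below
ip_address        = "fe80::d8ff:1ff:fe00:1504"
# assumption: multicast_address is configured per interface
multicast_address = "ff02::2:1001"
```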

mweinelt commented 6 years ago

Added the link-local addresses for each interface.

Querying happens, answers appear as well -- no change.

17:08:30.012866 IP6 fe80::d8ff:1ff:fe00:1504.35272 > ff02::2:1001.1001: UDP, length 34
17:08:30.225852 IP6 fe80::d8ff:1ff:fe00:604.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 628
17:08:30.299070 IP6 fe80::d8ff:1ff:fe00:204.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 622
17:08:30.299616 IP6 fe80::d8ff:1ff:fe00:504.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 628
17:08:30.352959 IP6 fe80::d8ff:1ff:fe00:704.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 657
17:08:30.373347 IP6 fe80::d8ff:1ff:fe00:304.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 610
17:08:30.381902 IP6 fe80::d8ff:1ff:fe00:804.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 620
17:08:30.441293 IP6 fe80::d8ff:1ff:fe00:104.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 629
17:08:30.649324 IP6 fe80::d8ff:1ff:fe00:404.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 615
17:08:34.475211 IP6 fe80::928d:78ff:fe23:af1c.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 1039
17:08:36.576803 IP6 fe80::fa1a:67ff:fe5a:6f04.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 1086
17:08:39.709300 IP6 fe80::5054:ff:fed0:83a0.1001 > fe80::d8ff:1ff:fe00:1504.35272: UDP, length 1047
# ss -lpn | grep yanic
udp    UNCONN     0      0      [fe80::d8ff:15ff:fe00:1504]%dom15-br:54396              [::]:*                   users:(("yanic",pid=1483,fd=12))
udp    UNCONN     0      0      [fe80::d8ff:5ff:fe00:1504]%dom5-br:48775              [::]:*                   users:(("yanic",pid=1483,fd=21))
udp    UNCONN     0      0      [fe80::d8ff:11ff:fe00:1504]%dom11-br:34953              [::]:*                   users:(("yanic",pid=1483,fd=8))
udp    UNCONN     0      0      [fe80::d8ff:61ff:fe00:1504]%dom61-br:53391              [::]:*                   users:(("yanic",pid=1483,fd=15))
udp    UNCONN     0      0      [fe80::d8ff:6ff:fe00:1504]%dom6-br:58008              [::]:*                   users:(("yanic",pid=1483,fd=22))
udp    UNCONN     0      0      [fe80::d8ff:4ff:fe00:1504]%dom4-br:46340              [::]:*                   users:(("yanic",pid=1483,fd=20))
udp    UNCONN     0      0      [fe80::d8ff:7ff:fe00:1504]%dom7-br:37641              [::]:*                   users:(("yanic",pid=1483,fd=23))
udp    UNCONN     0      0      [fe80::d8ff:10ff:fe00:1504]%dom10-br:58185              [::]:*                   users:(("yanic",pid=1483,fd=7))
udp    UNCONN     0      0      [fe80::d8ff:13ff:fe00:1504]%dom13-br:38221              [::]:*                   users:(("yanic",pid=1483,fd=6))
udp    UNCONN     0      0      [fe80::d8ff:2ff:fe00:1504]%dom2-br:55224              [::]:*                   users:(("yanic",pid=1483,fd=18))
udp    UNCONN     0      0      [fe80::d8ff:14ff:fe00:1504]%dom14-br:48576              [::]:*                   users:(("yanic",pid=1483,fd=11))
udp    UNCONN     0      0      [fe80::d8ff:1ff:fe00:1504]%dom1-br:35272              [::]:*                   users:(("yanic",pid=1483,fd=17))
udp    UNCONN     0      0      [fe80::d8ff:16ff:fe00:1504]%dom16-br:40478              [::]:*                   users:(("yanic",pid=1483,fd=9))
udp    UNCONN     0      0      [fe80::d8ff:12ff:fe00:1504]%dom12-br:43038              [::]:*                   users:(("yanic",pid=1483,fd=5))
udp    UNCONN     0      0      [fe80::d8ff:3ff:fe00:1504]%dom3-br:55347              [::]:*                   users:(("yanic",pid=1483,fd=19))
udp    UNCONN     0      0      [fe80::d8ff:9ff:fe00:1504]%dom9-br:40508              [::]:*                   users:(("yanic",pid=1483,fd=14))
udp    UNCONN     0      0      [fe80::d8ff:17ff:fe00:1504]%dom17-br:52284              [::]:*                   users:(("yanic",pid=1483,fd=10))
udp    UNCONN     0      0      [fe80::d8ff:8ff:fe00:1504]%dom8-br:44609              [::]:*                   users:(("yanic",pid=1483,fd=13))
udp    UNCONN     0      0      [fe80::d8ff:ff:fe00:1504]%dom0-br:42086              [::]:*                   users:(("yanic",pid=1483,fd=16))
yanic[1483]: 2018/05/08 17:03:21 nodes.go:217: loaded 1086 nodes
yanic[1483]: 2018/05/08 17:03:21 serve.go:53: delaying 38.4 seconds
yanic[1483]: 2018/05/08 17:04:00 collector.go:165: sending multicasts
yanic[1483]: 2018/05/08 17:04:30 collector.go:200: sending 0 unicast pkg for 0 nodes
yanic[1483]: 2018/05/08 17:05:05 database.go:151: saving 82 points
yanic[1483]: 2018/05/08 17:05:30 collector.go:165: sending multicasts
yanic[1483]: 2018/05/08 17:06:00 collector.go:200: sending 0 unicast pkg for 0 nodes
yanic[1483]: 2018/05/08 17:06:05 database.go:151: saving 82 points
yanic[1483]: 2018/05/08 17:06:30 collector.go:165: sending multicasts

This is a batman-adv-only setup at this time.

genofire commented 6 years ago

@mweinelt - that looks good

yanic loaded 1086 nodes but saved only 82 points. Could you check how many unicast requests were necessary before updating yanic?

mweinelt commented 6 years ago

I'm saying:

Edit: I think it could be related to the port now being floating.

genofire commented 6 years ago

@mweinelt irc? #meshviewer hackint?

mweinelt commented 6 years ago

The last bit was related to the firewall, my bad.
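
If the firewall only admits respondd replies on a known port, pinning the listening port instead of leaving the choice to the kernel may keep the rules stable. A hedged sketch, assuming the installed yanic version supports a per-interface `port` key (this thread does not confirm that) and using 10001 purely as a placeholder:

```toml
[[respondd.interfaces]]
ifname     = "dom1-br"
ip_address = "fe80::d8ff:1ff:fe00:1504"
# assumption: `port` is an unconfirmed key; 10001 is a placeholder value
port       = 10001
```

The firewall would then need to accept UDP traffic to that port on each bridge interface.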