Juniper / open-nti

Open Network Telemetry Collector built with open source tools
Apache License 2.0

DB is empty #194

Closed door7302 closed 6 months ago

door7302 commented 6 years ago

Hello

The host receives the streams correctly:

xxxxxxxxx:~$ sudo tcpdump -n -i ens160 udp port 50000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on ens160, link-type EN10MB (Ethernet), capture size 262144 bytes
15:53:35.376488 IP 193.251.127.49.1000 > 193.252.147.55.50000: UDP, length 347
15:53:35.376993 IP 193.251.127.49.1000 > 193.252.147.55.50000: UDP, length 1003
15:53:35.389436 IP 193.251.127.49.1000 > 193.252.147.55.50000: UDP, length 1021
15:53:37.108930 IP 193.251.127.49.1000 > 193.252.147.55.50000: UDP, length 449
15:53:37.109874 IP 193.251.127.49.1000 > 193.252.147.55.50000: UDP, length 1502
15:53:37.122460 IP 193.251.127.49.1000 > 193.252.147.55.50000: UDP, length 1047

The streams are correctly forwarded to the open-nti container:

xxxxxxxxx:~$ sudo docker ps
CONTAINER ID  IMAGE  COMMAND  CREATED  STATUS  PORTS  NAMES
6cdb593ebffa  juniper/open-nti-input-jti:latest  "/bin/sh -c /home/..."  6 hours ago  Up 6 hours  5140/tcp, 24224/tcp, 0.0.0.0:50000->50000/udp, 24284/tcp, 0.0.0.0:50020->50020/udp  opennti_input_jti
5eb3f0533a60  opennti_input-internal  "/source/start-inp..."  6 hours ago  Up 6 hours  opennti_input_internal
812964c10395  juniper/open-nti-input-syslog:latest  "/bin/sh -c /home/..."  6 hours ago  Up 6 hours  5140/tcp, 24220/tcp, 24224/tcp, 0.0.0.0:6000->6000/udp  opennti_input_syslog
17ad5ac288ae  opennti_input-snmp  "/source/start-inp..."  6 hours ago  Up 6 hours  0.0.0.0:162->162/udp  opennti_input_snmp
01a616d37842  juniper/open-nti:latest  "/sbin/my_init"  6 hours ago  Up 6 hours  0.0.0.0:80->80/tcp, 0.0.0.0:3000->3000/tcp, 0.0.0.0:8083->8083/tcp, 0.0.0.0:8086->8086/tcp, 0.0.0.0:8125->8125/udp  opennti_con

xxxxxxxxx:~$ sudo docker exec -i -t 6cdb593ebffa /bin/bash

bash-4.3$ sudo tcpdump -n -i eth0 udp port 50000
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
15:55:31.840339 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 264
15:55:31.840705 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 616
15:55:31.843955 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 427
15:55:31.849741 IP 172.17.0.1.57931 > 172.17.0.6.50000: UDP, length 656
15:55:31.851842 IP 172.17.0.1.57931 > 172.17.0.6.50000: UDP, length 2538
15:55:31.853600 IP 172.17.0.1.57931 > 172.17.0.6.50000: UDP, length 352
15:55:31.891920 IP 172.17.0.1.52406 > 172.17.0.6.50000: UDP, length 810
15:55:31.895362 IP 172.17.0.1.52406 > 172.17.0.6.50000: UDP, length 3398
15:55:31.910804 IP 172.17.0.1.52406 > 172.17.0.6.50000: UDP, length 2982
15:55:31.969255 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 2970
15:55:31.972879 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 3350
15:55:31.975477 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 2905
15:55:31.978896 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 407
15:55:32.008557 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 1073
15:55:32.010519 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 2506
15:55:32.014715 IP 172.17.0.1.38599 > 172.17.0.6.50000: UDP, length 476
15:55:32.674373 IP 172.17.0.1.44608 > 172.17.0.6.50000: UDP, length 547
15:55:32.675256 IP 172.17.0.1.44608 > 172.17.0.6.50000: UDP, length 1880
15:55:32.689090 IP 172.17.0.1.44608 > 172.17.0.6.50000: UDP, length 3291
15:55:32.979041 IP 172.17.0.1.44608 > 172.17.0.6.50000: UDP, length 671
15:55:32.980645 IP 172.17.0.1.44608 > 172.17.0.6.50000: UDP, bad length 1476 > 1472
15:55:32.996452 IP 172.17.0.1.44608 > 172.17.0.6.50000: UDP, length 2688
15:55:33.321255 IP 172.17.0.1.57931 > 172.17.0.6.50000: UDP, length 366
15:55:33.321827 IP 172.17.0.1.57931 > 172.17.0.6.50000: UDP, length 1079
15:55:33.323468 IP 172.17.0.1.57931 > 172.17.0.6.50000: UDP, length 379

But unfortunately the DB is empty.

The DB logs seem fine:

2017-11-26 09:38:58 +0100 [info]: reading config file path="/tmp/fluent.conf"
2017-11-26 09:38:58 +0100 [info]: starting fluentd-0.12.29
2017-11-26 09:38:59 +0100 [info]: gem 'fluent-plugin-juniper-telemetry' version '0.3.0'
2017-11-26 09:38:59 +0100 [info]: gem 'fluentd' version '0.12.29'
2017-11-26 09:38:59 +0100 [info]: adding match pattern="jnpr.**" type="copy"
2017-11-26 09:38:59 +0100 [info]: adding match pattern="debug.**" type="stdout"
2017-11-26 09:38:59 +0100 [info]: adding match pattern="fluent.**" type="stdout"
2017-11-26 09:38:59 +0100 [info]: adding source type="forward"
2017-11-26 09:38:59 +0100 [info]: adding source type="udp"
2017-11-26 09:38:59 +0100 [info]: adding source type="udp"
2017-11-26 09:38:59 +0100 [info]: adding source type="monitor_agent"
2017-11-26 09:38:59 +0100 [info]: adding source type="debug_agent"
2017-11-26 09:38:59 +0100 [info]: using configuration file:

<source>
  @type forward
  @id forward_input
</source>

<source>
  @type udp
  tag jnpr.jvision
  format juniper_jti
  port 50000
  bind 0.0.0.0
</source>

<source>
  @type udp
  tag jnpr.analyticsd
  format juniper_analyticsd
  port 50020
  bind 0.0.0.0
</source>

<match jnpr.**>
  type copy
  <store>
    type influxdb
    host opennti
    port 8086
    dbname juniper
    user juniper
    password xxxxxx
    value_keys ["value"]
    buffer_type memory
    flush_interval 2
  </store>
</match>

<source>
  @type monitor_agent
  @id monitor_agent_input
  port 24220
</source>

<source>
  @type debug_agent
  @id debug_agent_input
  bind 127.0.0.1
  port 24230
</source>

<match debug.**>
  @type stdout
  @id stdout_output
</match>

<match fluent.**>
  @type stdout
</match>

2017-11-26 09:38:59 +0100 [info]: listening fluent socket on 0.0.0.0:24224
2017-11-26 09:38:59 +0100 [info]: listening udp socket on 0.0.0.0:50000
2017-11-26 09:38:59 +0100 [info]: listening udp socket on 0.0.0.0:50020
2017-11-26 09:38:59 +0100 [info]: listening dRuby uri="druby://127.0.0.1:24230" object="Engine"

Any help is welcome. Thanks, David

3fr61n commented 6 years ago

Hi David

Thanks for the logs. Could you please send the fluentd output to stdout as well?

In your docker-compose file, just set the OUTPUT_STDOUT flag to true, as follows:

root@node1:~/open-nti# cat docker-compose.yml 

input-jti:
  image: $INPUT_JTI_IMAGE_NAME:$IMAGE_TAG
  container_name: $INPUT_JTI_CONTAINER_NAME
  environment:
   - "INFLUXDB_ADDR=opennti"
   - "OUTPUT_INFLUXDB=true"
   - "OUTPUT_STDOUT=true"  # <------- set this flag
  ports:
   - "$LOCAL_PORT_JTI:50000/udp"
   - "$LOCAL_PORT_ANALYTICSD:50020/udp"
  volumes:
   - /etc/localtime:/etc/localtime
  links:
    - opennti

After bringing the container up, please check its logs; you will see the measurements sent to influxdb in JSON format.

I would like to check the format and content of the measurements that are causing the error.

Regards

door7302 commented 6 years ago

Hello,

Here is the log: trace-opennti.txt

3fr61n commented 6 years ago

So, from your logs it seems there is no parsing error (I don't see any unsupported sensors), so the next thing to check is influxdb.

1.- Are you using other collectors such as netconf or snmp? If so, check whether they are writing to the database properly.

2.- Check influxdb logs from inside the opennti container.... execute

3.- Check basic influxdb connectivity:

juniper@rspjmco1-02:~/scripts/open-nti$ curl -sl -I  http://localhost:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: bd7309f5-d434-11e7-8427-000000000000
X-Influxdb-Version: 1.2.0
Date: Tue, 28 Nov 2017 12:07:43 GMT

juniper@rspjmco1-02:~/scripts/open-nti$ 
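The same connectivity check can be scripted. The following is only a minimal sketch in Python, assuming InfluxDB listens on localhost:8086 as in the curl call above; the function name `influx_ping` is hypothetical and not part of open-nti.

```python
import http.client


def influx_ping(host="localhost", port=8086, timeout=5.0):
    """Send HEAD /ping and return (HTTP status, X-Influxdb-Version header).

    Assumption: host and port match the published 8086/tcp port of the
    opennti container; adjust them for other setups.
    """
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", "/ping")
        resp = conn.getresponse()
        resp.read()  # a HEAD response has no body; this just completes the exchange
        return resp.status, resp.getheader("X-Influxdb-Version")
    finally:
        conn.close()
```

A status of 204, as in the curl output above, means the HTTP endpoint is reachable; calling `influx_ping()` on the Docker host would be the equivalent of the curl command.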

Regards Efrain

door7302 commented 6 years ago

Thank you for your quick feedback. Here is the output for the requested URL:

/opt/open-nti$ curl -sl -I http://localhost:8086/ping
HTTP/1.1 204 No Content
Content-Type: application/json
Request-Id: cc02f766-d37d-11e7-8068-000000000000
X-Influxdb-Version: 1.2.0
Date: Mon, 27 Nov 2017 14:18:10 GMT

The influxdb log is attached: influxdb.log

3fr61n commented 6 years ago

Hi

Each time the following log entry appears, it means data was inserted (a 'write'), in this case into the juniper db:

[httpd] 172.17.0.5 - juniper [27/Nov/2017:15:15:55 +0100] "POST /write?db=juniper&p=%5BREDACTED%5D&precision=s&u=juniper HTTP/1.1" 204 0 "-" "Ruby" 7bbf2946-d37d-11e7-8021-000000000000 24849

And this log entry is the creation of the juniper db:

[httpd] 127.0.0.1 - juniper [27/Nov/2017:15:15:12 +0100] "POST /query?chunked=true&db=&epoch=ns&q=create+database+juniper HTTP/1.1" 200 67 "-" "InfluxDBShell/1.2.0" 61c48f75-d37d-11e7-8006-000000000000 3940

or of the telegraf db:

[httpd] 127.0.0.1 - - [27/Nov/2017:15:15:09 +0100] "POST /query?db=&q=CREATE+DATABASE+%22telegraf%22 HTTP/1.1" 200 62 "-" "InfluxDBClient" 5feecc08-d37d-11e7-8001-000000000000 2171

Based on these logs, it seems that the db should have some values.

Could you please connect through the influx CLI and run some basic queries like....
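To scan a long influxdb.log for these entries, a small script can help. This is a hedged sketch in Python: the regular expression is inferred only from the three sample lines quoted above, and the function name `classify` is hypothetical. It relies on the fact that InfluxDB answers a successful write with status 204.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Pattern inferred only from the sample [httpd] lines quoted above;
# other InfluxDB versions may log a different layout.
HTTPD_LINE = re.compile(
    r'\[httpd\] (?P<client>\S+) .*?"(?P<method>[A-Z]+) (?P<target>\S+) '
    r'HTTP/[\d.]+" (?P<status>\d{3}) '
)


def classify(line):
    """Describe one [httpd] log line, or return None if it does not match."""
    match = HTTPD_LINE.search(line)
    if match is None:
        return None
    url = urlsplit(match.group("target"))
    params = parse_qs(url.query, keep_blank_values=True)
    db = params.get("db", [""])[0]
    kind = "other"
    if url.path == "/write":
        kind = "write"
    elif url.path == "/query":
        words = params.get("q", [""])[0].split()
        if len(words) >= 3 and [w.lower() for w in words[:2]] == ["create", "database"]:
            kind = "create_database"
            db = words[2].strip('"')  # the database name may be double-quoted
        else:
            kind = "query"
    return {
        "kind": kind,
        "db": db,
        "status": int(match.group("status")),
        "client": match.group("client"),
    }


sample = (
    '[httpd] 172.17.0.5 - juniper [27/Nov/2017:15:15:55 +0100] '
    '"POST /write?db=juniper&p=%5BREDACTED%5D&precision=s&u=juniper HTTP/1.1" '
    '204 0 "-" "Ruby" 7bbf2946-d37d-11e7-8021-000000000000 24849'
)
print(classify(sample))
```

Counting the entries with `kind == "write"` and `status == 204` per database gives a quick indication of whether the collectors are inserting data.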

Regards Efrain

door7302 commented 6 years ago

Here are the outputs:

root@574155367ce1:/# influx -database juniper -execute 'show measurements'
name: measurements
name
----
jnpr.jvision

root@574155367ce1:/# influx -database docker_internal -execute 'show measurements'
name: measurements
name
----
docker
docker_container_blkio
docker_container_cpu
docker_container_mem
docker_container_net

root@574155367ce1:/# influx -database telegraf -execute 'show measurements'
root@574155367ce1:/#

Thanks David

3fr61n commented 6 years ago

Could you please also execute:

root@1cea636c464a:/# influx -database juniper -execute 'select * from "jnpr.jvision" limit 10'
name: jnpr.jvision
time                device        filter_name             filter_timestamp type              value
----                ------        -----------             ---------------- ----              -----
1511883109000000000 R1:172.16.0.1 __default_arp_policer__ 1511536072       memory_usage.HEAP 1576
1511883109000000000 R1:172.16.0.1 test                    1511800028       memory_usage.HEAP 1688
1511883109000000000 R1:172.16.0.1 __default_bpdu_filter__ 1511800027       memory_usage.HEAP 2468
1511883119000000000 R1:172.16.0.1 test                    1511800028       memory_usage.HEAP 1688
1511883119000000000 R1:172.16.0.1 __default_bpdu_filter__ 1511800027       memory_usage.HEAP 2468
1511883119000000000 R1:172.16.0.1 __default_arp_policer__ 1511536072       memory_usage.HEAP 1576

door7302 commented 6 years ago

Here is the output:

root@574155367ce1:/# influx -database juniper -execute 'select * from "jnpr.jvision" limit 10'
name: jnpr.jvision
time                device                   egress_queue filter_counter_name filter_name filter_timestamp interface interface_parent type                                    value
----                ------                   ------------ ------------------- ----------- ---------------- --------- ---------------- ----                                    -----
1511792113000000000 ncbor101:193.251.127.153 0                                                             et-10/1/0 ae40             egress_queue_info.rl_drop_bytes         0
1511792113000000000 ncbor101:193.251.127.153 0                                                             xe-10/0/0 ae111            egress_queue_info.peak_buffer_occupancy 65986
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/5  ae15             egress_queue_info.bytes                 2098794628
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/5  ae15             egress_queue_info.avg_buffer_occupancy  0
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/5  ae15             egress_queue_info.allocated_buffer_size 6389760
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/4  ae14             egress_queue_info.tail_drop_packets     0
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/4  ae14             egress_queue_info.rl_drop_packets       0
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/4  ae14             egress_queue_info.rl_drop_bytes         0
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/4  ae14             egress_queue_info.red_drop_packets      0
1511792113000000000 ncbor101:193.251.127.153 3                                                             xe-8/1/4  ae14             egress_queue_info.red_drop_bytes        0

door7302 commented 6 years ago

Moreover, in Grafana I get an error when ALL interfaces are selected. I assume that is expected given the number of interfaces I have? Another point: I don't see my "ae" interfaces in the list. Do you know why?

[Screenshot: grafana]

3fr61n commented 6 years ago

So the good news is that your DB is NOT empty :), and I'm not sure there was ever a problem there.

Now let's focus on grafana.

1.- How many interfaces do you have (physical and aggregated)?

2.- In your previous output you can see a field called 'interface_parent'; every physical interface that belongs to an ae should have that field set. If you want to include aeX interfaces in your templating, I suggest editing the query or adding a new templating variable that contains only aeX interfaces.
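A templating variable restricted to aeX interfaces could look roughly like the sketch below. It rests on assumptions: that 'interface_parent' is stored as a tag (with `value_keys ["value"]` the remaining keys are expected to be tags, but this was not verified here), and that the pattern `^ae\d+$` is placed in the variable's Regex field. The names `AE_PARENT_QUERY`, `AE_ONLY` and `keep_ae` are hypothetical; the Python part only demonstrates what the regex keeps, using parent names taken from the output above.

```python
import re

# Assumption: interface_parent is a tag of the "jnpr.jvision" measurement.
# This string would go into the Query field of a new Grafana variable.
AE_PARENT_QUERY = 'SHOW TAG VALUES FROM "jnpr.jvision" WITH KEY = "interface_parent"'

# Hypothetical pattern for the variable's Regex field: keep only aeX names.
AE_ONLY = re.compile(r"^ae\d+$")


def keep_ae(values):
    """Return the de-duplicated aeX names, ordered by their numeric suffix."""
    unique = {v for v in values if AE_ONLY.match(v)}
    return sorted(unique, key=lambda name: int(name[2:]))


# Parent names seen in the query output above, plus non-matching values.
sample = ["ae40", "ae111", "ae15", "ae14", "", "xe-8/1/5", "ae15"]
print(AE_PARENT_QUERY)
print(keep_ae(sample))  # ['ae14', 'ae15', 'ae40', 'ae111']
```

Interfaces without a parent carry an empty value, which the regex drops along with physical interface names.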

door7302 commented 6 years ago

Hello

I have around 1000 interfaces (physical + logical) per router.

I have 4 routers with telemetry enabled

door7302 commented 6 years ago

Hello, any news?

cc. @3fr61n

3fr61n commented 6 years ago

That's weird: I have a templating variable in production with 700+ entries and I have not seen that kind of error.

Could you please check if your code contains this fix?

https://github.com/Juniper/open-nti/commit/bff2ea100c2a400665db717d9bb88ab669884897

Regards Efrain

3fr61n commented 6 years ago

Does this ring a bell?

https://github.com/grafana/grafana/issues/8117

door7302 commented 6 years ago

First, I already have the right value there:

max-row-limit = 0

Regarding the second point: does it mean I have to modify datasource.js directly? If yes, where is the file located?

Thanks

3fr61n commented 6 years ago

TBH I don't know, but it seems there is a hardcoded limit somewhere in grafana.

I'm not sure which datasource.js you are talking about.

Another workaround is to narrow the templating query so that it returns fewer than 1000 entries.

For example..... one approach could be to use multiple templating queries that are related in a hierarchy, like

So in the end the fourth template (which I assume is the one with 1000+ entries) will be narrowed down.

You have a discussion about it here https://github.com/grafana/grafana/issues/2214
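The hierarchy idea can be sketched as a chain of templating queries in which each variable is filtered by the ones before it. The chain below (device, then interface_parent, then interface) is only an illustrative assumption, not the list from the original comment, and it also assumes these keys are stored as tags in "jnpr.jvision". The helper names are hypothetical.

```python
def tag_values_query(measurement, key, parent_vars=()):
    """Build an InfluxQL SHOW TAG VALUES statement for a Grafana variable.

    parent_vars lists tag keys whose Grafana variables ($<key>) restrict
    the result, so each level only returns values under the selected parents.
    """
    query = f'SHOW TAG VALUES FROM "{measurement}" WITH KEY = "{key}"'
    if parent_vars:
        conditions = " AND ".join(
            f'"{name}" =~ /^${name}$/' for name in parent_vars
        )
        query += f" WHERE {conditions}"
    return query


def chained_queries(measurement, keys):
    """Return one query per key; key i is filtered by keys 0..i-1."""
    return {
        key: tag_values_query(measurement, key, keys[:i])
        for i, key in enumerate(keys)
    }


# Hypothetical hierarchy; the variable names equal the tag keys.
HIERARCHY = ["device", "interface_parent", "interface"]
for name, query in chained_queries("jnpr.jvision", HIERARCHY).items():
    print(f"${name}: {query}")
```

With such a chain, the last variable only lists the interfaces of the selected device and parent, which should keep it well below the problematic number of entries.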

door7302 commented 6 years ago

Awesome... It works now with only one interface selected. So my last question: how do I create a template for 'ae' interfaces only? Currently I see only physical and logical interfaces.