WhistMo closed this issue 6 years ago
Hi @WhistMo
A few questions,
1.- Are you using other sensors, and are they working OK?
2.- Are you using the netconf collector, and is it working OK?
3.- Did you keep the InfluxDB database during your upgrade, or is this a fresh install? Could you please try deleting the database (via the InfluxDB web UI) and then creating it again?
Regards Efrain
Hello Efrain,
Please see my answers below:
1: The other sensors used to work, but after I upgraded to the latest open-nti, it became very unstable. For example, if I add new sensors on the routers, the GUI stops showing any data, and I have to restart the open-nti services to make it work again.
2: The netconf collector is working. It defines a cron job that runs several commands to pull data from the routers and decode it. It works as before.
3: Deleting the InfluxDB database doesn't help. (Also, I don't start and stop open-nti in persistent mode, so I think the data is deleted whenever I stop open-nti anyway.)
Looking forward to your feedback and help!
Many thanks
Hi WhistMo,
2: Does the RPM parser work for you? I tried the latest version and the RPM parser does not work for me.
thx Al
Hi Guys,
Sorry for the late answer. I was trying to reproduce the issue, and so far I am having issues with Docker itself: somehow the sensor information is not reaching the jti container.
Could you please try the following steps to see whether we are facing the same issue?
1.- Open a terminal inside the jti container
docker exec -i -t opennti_input_jti /bin/bash
2.- Capture jti packets using tcpdump
sudo tcpdump -i eth0 -nn -v host
3.- Wait and see whether any jti packets arrive
4.- Repeat steps 2 and 3 on your host (where the containers are running)
The main idea is to check whether Docker port forwarding is working properly; in my setup either no packets arrive or NAT is not done properly.
PS: If packets do arrive, the next step is to enable debugging in fluentd.
regards Efrain
Hi Efrain,
Yes, the packets are arriving.
bash-4.3$ sudo tcpdump -i eth0 -nn -v host 172.20.98.234
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
12:10:38.312445 IP (tos 0xc0, ttl 254, id 8969, offset 0, flags [+], proto UDP (17), length 1500)
    172.20.98.234.1000 > 172.17.0.4.50000: UDP, bad length 3199 > 1472
12:10:38.312447 IP (tos 0xc0, ttl 254, id 8969, offset 1480, flags [+], proto UDP (17), length 1500)
    172.20.98.234 > 172.17.0.4: ip-proto-17
12:10:38.312449 IP (tos 0xc0, ttl 254, id 8969, offset 2960, flags [none], proto UDP (17), length 267)
    172.20.98.234 > 172.17.0.4: ip-proto-17
12:10:38.315608 IP (tos 0xc0, ttl 254, id 8970, offset 0, flags [+], proto UDP (17), length 1500)
    172.20.98.234.1000 > 172.17.0.4.50000: UDP, bad length 3199 > 1472
12:10:38.315610 IP (tos 0xc0, ttl 254, id 8970, offset 1480, flags [+], proto UDP (17), length 1500)
    172.20.98.234 > 172.17.0.4: ip-proto-17
12:10:38.315612 IP (tos 0xc0, ttl 254, id 8970, offset 2960, flags [none], proto UDP (17), length 267)
    172.20.98.234 > 172.17.0.4: ip-proto-17
Could you please tell me how to enable debugging in fluentd?
Many thanks.
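A note on the capture above: the `bad length 3199 > 1472` message most likely does not indicate corruption. tcpdump prints it when the UDP header announces more data than the first IP fragment carries, which is what happens when a 3199-byte telemetry datagram is split to fit a 1500-byte MTU. The sketch below redoes that arithmetic, assuming a 20-byte IPv4 header without options and an 8-byte UDP header; the function name `ip_fragments` is made up for illustration.

```python
def ip_fragments(udp_data_len, mtu=1500, ip_header=20, udp_header=8):
    """Return (offset, ip_total_length, more_fragments) for each IPv4 fragment.

    Assumptions (not taken from the thread): IPv4 header without options
    (20 bytes) and a standard 8-byte UDP header.
    """
    # The IP payload is the UDP header plus the UDP data.
    payload = udp_data_len + udp_header
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_fragment = (mtu - ip_header) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(per_fragment, payload - offset)
        more = offset + size < payload
        fragments.append((offset, size + ip_header, more))
        offset += size
    return fragments


if __name__ == "__main__":
    # 3199 is the UDP data length reported by tcpdump in the capture.
    for offset, length, more in ip_fragments(3199):
        flags = "+" if more else "none"
        print(f"offset {offset}, flags [{flags}], length {length}")
```

The result matches the three fragments in the capture (offsets 0, 1480 and 2960; lengths 1500, 1500 and 267), which suggests the datagrams reach the container complete at the IP level. Whether fluentd then decodes them is a separate question.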
1.- Modify the docker-compose.yml file you're using: change the OUTPUT_STDOUT flag from false to true.
....
input-jti:
  image: $INPUT_JTI_IMAGE_NAME:$IMAGE_TAG
  container_name: $INPUT_JTI_CONTAINER_NAME
  environment:
    - "INFLUXDB_ADDR=opennti"
    - "OUTPUT_INFLUXDB=true"
    - "OUTPUT_STDOUT=true" <---------------
  ports:
    - "$LOCAL_PORT_JTI:50000/udp"
    - "$LOCAL_PORT_ANALYTICSD:50020/udp"
  volumes:
    - /etc/localtime:/etc/localtime
  links:
    - opennti
.....
2.- Rebuild the container
juniper@naraku:~/scripts/open-nti$ make build
3.- Restart the container(s)
juniper@naraku:~/scripts/open-nti$ make start
echo "Use docker compose file : docker-compose.yml"
Use docker compose file : docker-compose.yml
IMAGE_TAG=consul docker-compose -f docker-compose.yml up -d
Starting opennti_con
Starting opennti_input_syslog
Starting opennti_input_jti
juniper@naraku:~/scripts/open-nti$
4.- Check the logs from the container console and wait for the sensor logs.
juniper@naraku:~/scripts/open-nti$ docker logs opennti_input_jti -f
...
2017-02-07 12:23:36 +0100 [info]: reading config file path="/tmp/fluent.conf"
2017-02-07 12:23:36 +0100 [info]: starting fluentd-0.12.29
2017-02-07 12:23:36 +0100 [info]: gem 'fluent-plugin-juniper-telemetry' version '0.3.0'
2017-02-07 12:23:36 +0100 [info]: gem 'fluentd' version '0.12.29'
2017-02-07 12:23:36 +0100 [info]: adding match pattern="jnpr.**" type="copy"
2017-02-07 12:23:36 +0100 [info]: adding match pattern="debug.**" type="stdout"
2017-02-07 12:23:36 +0100 [info]: adding match pattern="fluent.**" type="stdout"
2017-02-07 12:23:36 +0100 [info]: adding source type="forward"
2017-02-07 12:23:36 +0100 [info]: adding source type="udp"
2017-02-07 12:23:36 +0100 [info]: adding source type="udp"
2017-02-07 12:23:36 +0100 [info]: adding source type="monitor_agent"
2017-02-07 12:23:36 +0100 [info]: adding source type="debug_agent"
2017-02-07 12:23:37 +0100 [info]: using configuration file: <ROOT>
  <source>
    @type forward
    @id forward_input
  </source>
  <source>
    @type udp
    tag jnpr.jvision
    format juniper_jti
    port 50000
    bind 0.0.0.0
  </source>
  <source>
    @type udp
    tag jnpr.analyticsd
    format juniper_analyticsd
    port 50020
    bind 0.0.0.0
  </source>
  <match jnpr.**>
    type copy
    <store>
      @type stdout
      @id stdout_output
    </store>
    <store>
      type influxdb
      host opennti
      port 8086
      dbname juniper
      user juniper
      password xxxxxx
      value_keys ["value"]
      buffer_type memory
      flush_interval 2
    </store>
  </match>
  <source>
    @type monitor_agent
    @id monitor_agent_input
    port 24220
  </source>
  <source>
    @type debug_agent
    @id debug_agent_input
    bind 127.0.0.1
    port 24230
  </source>
  <match debug.**>
    @type stdout
    @id stdout_output
  </match>
  <match fluent.**>
    @type stdout
  </match>
</ROOT>
2017-02-07 12:23:37 +0100 [info]: listening fluent socket on 0.0.0.0:24224
2017-02-07 12:23:37 +0100 [info]: listening udp socket on 0.0.0.0:50000
2017-02-07 12:23:37 +0100 [info]: listening udp socket on 0.0.0.0:50020
2017-02-07 12:23:37 +0100 [info]: listening dRuby uri="druby://127.0.0.1:24230" object="Engine"
5.- Sensor logs look like this:
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.rl_drop_bytes","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.red_drop_packets","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.red_drop_bytes","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.avg_buffer_occupancy","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.cur_buffer_occupancy","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.peak_buffer_occupancy","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":6,"type":"egress_queue_info.allocated_buffer_size","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.packets","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.bytes","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.tail_drop_packets","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.rl_drop_packets","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.rl_drop_bytes","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.red_drop_packets","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.red_drop_bytes","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.avg_buffer_occupancy","value":0}
2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1","interface":"ge-0/0/8","egress_queue":7,"type":"egress_queue_info.cur_buffer_occupancy","value":0}
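To check quickly whether such records show up, the container output can be filtered for them. Below is a minimal sketch, assuming the line format shown above (timestamp, tag, then a JSON record); the function names `parse_sensor_line` and `count_by_interface` are hypothetical and not part of open-nti.

```python
import json
import re
from collections import Counter

# Matches lines such as:
# 2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1", ...}
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} [+-]\d{4}) (jnpr\.\S+): (\{.*\})\s*$"
)


def parse_sensor_line(line):
    """Return (timestamp, tag, record) for a sensor line, or None otherwise."""
    match = LINE_RE.match(line)
    if match is None:
        return None  # e.g. fluentd [info] lines or config dump lines
    timestamp, tag, payload = match.groups()
    try:
        record = json.loads(payload)
    except json.JSONDecodeError:
        return None
    return timestamp, tag, record


def count_by_interface(lines):
    """Count sensor records per (tag, device, interface)."""
    counts = Counter()
    for line in lines:
        parsed = parse_sensor_line(line)
        if parsed is None:
            continue
        _, tag, record = parsed
        counts[(tag, record.get("device"), record.get("interface"))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        '2017-02-07 12:23:37 +0100 [info]: listening udp socket on 0.0.0.0:50000',
        '2017-02-02 18:03:25 +0100 jnpr.jvision: {"device":"R1:1.1.1.1",'
        '"interface":"ge-0/0/8","egress_queue":7,'
        '"type":"egress_queue_info.packets","value":0}',
    ]
    for key, n in count_by_interface(sample).items():
        print(n, key)
```

Feeding the lines of `docker logs opennti_input_jti` to `count_by_interface` gives a per-interface record count. An empty result while tcpdump still shows packets arriving would suggest the data is lost somewhere between the UDP socket and the fluentd parser, though this sketch cannot tell where.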
Hi Efrain,
Thanks for the support. Below is the output; none of the expected logs were seen.
BTW, the latest open-nti behaves very strangely; most of its functions don't work for me (I also have issue #129). The only workaround that makes open-nti (including the RPM probe parser) work again is to use the old open-nti, and I can't use it directly: just starting and stopping the old open-nti won't work. I need to start the latest one first, stop it, and then start the old open-nti. After I stop the old open-nti, I have to repeat the whole sequence (start the latest one, stop it, start the old one) to make it work again. -_- Very, very strange. I tried on two servers and got the same problem. Tcpdump shows that the packets have arrived at the server.
But with this workaround, the sub-interfaces don't work. -_-
2017-02-07 11:36:34 -0800 [info]: using configuration file:
Hi @WhistMo
Thanks for doing this troubleshooting with us, I have a few questions.
1.- Which release/branch are you using?
2.- Regarding #129, do you have the problem with all commands or just with RPM-related measurements?
3.- In the logs you mentioned, I did not see sensor (jti) packets arriving at the jti container, so it seems they are dropped somewhere.
4.- Can we have a live troubleshooting session?
Hi @WhistMo
Could you try to test this branch and give any feedback?
Hi Efrain,
Thank you very much for your effort.
debugging-jti is working (at least sometimes, but it is still very unstable). The logical interfaces work on this release. But it may have some other issues: for example, after I modify the sensor on the router (switching from logical interfaces to interfaces or vice versa), it stops working and I need to stop and restart open-nti. Also, it does not work every time, and it may stop working very quickly.
Yes, a live troubleshooting session is welcome. Just ping me when you are free.
Many thanks!
Hi @WhistMo
I sent you an email yesterday regarding the troubleshooting session. Meanwhile, when you mention the 'latest' version, do you really mean the 'consul' branch?
Regards
I upgraded open-nti to the latest version, but the logical interface sensor doesn't work any more. It used to produce GUI output, but with the latest open-nti no data is shown in the Grafana GUI. tcpdump shows that the packets from the routers reach the open-nti server. If I change the sensor to the interface level, the Grafana GUI shows data.
I tried the methods in issue #106 (No entries in influxdb), but they don't seem to work because there is no devel tag (only master and consul). I tried changing it to consul in the Makefile, but the logical interfaces don't work either.
Software versions:
16.1R2.11 (logical interface usage used to work with the older version of open-nti)
16.2R1.6 (latest version of the Junos software, but no luck)
Please help me out.
Many thanks