Juniper / open-nti

Open Network Telemetry Collector build with open source tools
Apache License 2.0

InfluxDB Output Error: Response Error: Status Code [503], expected [204], [<nil>] #197

Closed mnanduri closed 6 years ago

mnanduri commented 6 years ago

removed

3fr61n commented 6 years ago

Hi

Does this problem happen with every sensor, or just with a specific one?

Regards

mnanduri commented 6 years ago

removed

3fr61n commented 6 years ago

Could you please avoid using /junos/system/cpu/memory (choose another sensor, such as interfaces or firewall) and test again?

mnanduri commented 6 years ago

removed

3fr61n commented 6 years ago

Questions...

1.- Is your InfluxDB OK? Do you have any other input plugin working properly?
2.- Could you please attach your input-oc configuration file (the Telegraf config)?

mnanduri commented 6 years ago

Yes, I can see streaming data and jnpr.jvision in the database. I think an earlier email had the config, but I will send it again.

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

Hi @mnanduri

Do you know if there is a proxy somewhere in your setup? 503 means Service Unavailable.

Regards

Pablo

mnanduri commented 6 years ago

Yes, but the proxy is being used for outside connectivity. I have parsers and streaming working on this server; the only thing not working is OC.

Cheers, -Mohan

psagrera commented 6 years ago

Ok

Would it be possible to bypass the proxy, to rule that out? I know of some issues between gRPC and nginx (it cannot be proxied).

Regards

mnanduri commented 6 years ago

removed

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

Could you try the following commands from inside the input-oc container?

1) curl -sL -I opennti:8086/ping --connect-timeout 5 -vvv
2) netstat -tunp | grep 8086

Regards

Pablo

psagrera commented 6 years ago

Hi @mnanduri ,

I've replicated your issue using a proxy between input-oc and the InfluxDB database.

My setup: input-oc (.4) --- proxy (.2) --- influxdb (.3)
Telegraf config file pointing to the proxy (.2): urls = ["http://172.17.0.2:8086"]

Dummy proxy (nginx) with the following config:

user root;
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    upstream opennti {
        server 172.17.0.3:8086;
    }
    server {
        listen 8086;
        return 503;
    }
}

And here is the result:

2017-12-27T15:13:21Z I! Starting Telegraf v1.5.0~2d3856f
2017-12-27T15:13:21Z I! Loaded outputs: influxdb
2017-12-27T15:13:21Z I! Loaded inputs: inputs.jti_openconfig_telemetry
2017-12-27T15:13:21Z I! Tags enabled: host=d462805b75b5
2017-12-27T15:13:21Z I! Agent Config: Interval:10s, Quiet:false, Hostname:"d462805b75b5", Flush Interval:5s
2017-12-27T15:13:21Z I! Started JTI OpenConfig Telemetry plugin
2017-12-27T15:13:35Z E! InfluxDB Output Error: Response Error: Status Code [503], expected [204], []
2017-12-27T15:13:35Z E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
2017-12-27T15:13:40Z E! InfluxDB Output Error: Response Error: Status Code [503], expected [204], []
2017-12-27T15:13:40Z E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
2017-12-27T15:13:45Z E! InfluxDB Output Error: Response Error: Status Code [503], expected [204], []
2017-12-27T15:13:45Z E! Error writing to output [influxdb]: Could not write to any InfluxDB server in cluster
2017-12-27T15:13:50Z E! InfluxDB Output Error: Response Error: Status Code [503], expected [204], []

If I configure the proxy like this:

user root;
worker_processes auto;
events {
    worker_connections 1024;
}
http {
    upstream opennti {
        server 172.17.0.3:8086;
    }
    server {
        listen 8086;
        location / {
            proxy_pass http://opennti;
        }
    }
}

everything works as expected

See the tcpdump excerpt below:

Proxy side:

14:32:20.301635 IP 172.17.0.4.51392 > 172.17.0.2.8086: Flags [.], seq 4096:11336, ack 1, win 689, options [nop,nop,TS val 303516228 ecr 303514979], length 7240

InfluxDB side:

06:32:20.303122 IP 172.17.0.2.49208 > 172.17.0.3.8086: Flags [.], seq 1:7241, ack 1, win 229, options [nop,nop,TS val 303516228 ecr 303516228], length 7240

and data is inserted properly into the database.
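The 503 behaviour can also be reproduced without nginx. The following is a minimal Python sketch, not part of open-nti: `Always503`, `write_status` and `demo` are hypothetical names, and the server is only a stand-in for the dummy proxy's `return 503`. It shows that a client writing to an endpoint that answers 503 sees exactly that status, whereas InfluxDB itself would answer 204.

```python
import threading
import urllib.error
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class Always503(BaseHTTPRequestHandler):
    """Stand-in for the dummy nginx proxy: every request gets a 503."""

    def _reply(self):
        # Drain any request body so the connection is left in a clean state.
        length = int(self.headers.get("Content-Length", 0))
        if length:
            self.rfile.read(length)
        self.send_response(503)
        self.send_header("Content-Length", "0")
        self.end_headers()

    do_GET = do_HEAD = do_POST = _reply

    def log_message(self, *args):
        pass  # keep the demo quiet


def write_status(url, payload=b"demo value=1"):
    """POST one line-protocol point to `url` and return the HTTP status code."""
    # An empty ProxyHandler makes urllib ignore http_proxy/no_proxy,
    # so the request goes straight to `url`.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    request = urllib.request.Request(url, data=payload, method="POST")
    try:
        with opener.open(request, timeout=5) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


def demo():
    """Start the fake proxy on a free local port and try one write through it."""
    server = HTTPServer(("127.0.0.1", 0), Always503)  # port 0: pick a free port
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        port = server.server_address[1]
        return write_status(f"http://127.0.0.1:{port}/write?db=juniper")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print("status:", demo(), "(InfluxDB itself would answer 204)")
```

Pointing `write_status` at the real InfluxDB URL instead of the fake server is a quick way to see which of the two answers a given path produces.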

Regards

Pablo

mnanduri commented 6 years ago

Ah, very cool, thanks for reproducing it. When I try to add the lines to plugins/input-oc/telegraf.tmpl, the input-oc container does not start. I am probably adding them either in the wrong place or with the wrong syntax. Would you mind sharing your file?

psagrera commented 6 years ago

Which lines are you trying to add? Would you mind pasting or sharing your telegraf.tmpl?

Regards

mnanduri commented 6 years ago

root@lvstelemetry-1609572:/home/open-nti# docker inspect -f '{{.Name}} - {{range .NetworkSettings.Networks}}{{.IPAddress}}{{end}}' $(docker ps -aq)
/opennti_input_syslog - 172.17.0.7
/opennti_input_snmp - 172.17.0.6
/opennti_input_internal - 172.17.0.5
/opennti_input_jti - 172.17.0.4
/opennti_input_oc - 172.17.0.3
/opennti_con - 172.17.0.2

Attached is the file. When I add the lines, the input-oc container does not start. I don't think I have the correct syntax or location.

# Telegraf Configuration
#
# Telegraf is entirely plugin driven. All metrics are gathered from the
# declared inputs, and sent to the declared outputs.
#
# Plugins must be declared in here to be active.
# To deactivate a plugin, comment out the name and any variables.
#
# Use 'telegraf -config telegraf.conf -test' to see what metrics a config
# file would generate.
#
# Environment variables can be used anywhere in this config file, simply prepend
# them with $. For strings the variable must be within quotes (ie, "$STR_VAR"),
# for numbers and booleans they should be plain (ie, $INT_VAR, $BOOL_VAR)

# Global tags can be specified here in key="value" format.
[global_tags]
dc = "us-east-1" # will tag all metrics with dc=us-east-1
rack = "1a"
# Environment variables can be used as tags, and throughout the config file
user = "$USER"

# Configuration for telegraf agent
[agent]
# Default data collection interval for all inputs
interval = "10s"
# Rounds collection interval to 'interval'
# ie, if interval="10s" then always collect on :00, :10, :20, etc.
round_interval = true
# Telegraf will send metrics to outputs in batches of at
# most metric_batch_size metrics.
metric_batch_size = 1000
# For failed writes, telegraf will cache metric_buffer_limit metrics for each
# output, and will flush this buffer on a successful write. Oldest metrics
# are dropped first when this buffer fills.
metric_buffer_limit = 10000
# Collection jitter is used to jitter the collection by a random amount.
# Each plugin will sleep for a random time within jitter before collecting.
# This can be used to avoid many plugins querying things like sysfs at the
# same time, which can have a measurable effect on the system.
collection_jitter = "0s"
# Default flushing interval for all outputs. You shouldn't set this below
# interval. Maximum flush_interval will be flush_interval + flush_jitter
flush_interval = "5s"
# Jitter the flush interval by a random amount. This is primarily to avoid
# large write spikes for users running a large number of telegraf instances.
# ie, a jitter of 5s and interval 10s means flushes will happen every 10-15s
flush_jitter = "0s"
# Run telegraf in debug mode
debug = false
# Run telegraf in quiet mode
quiet = false
# Override default hostname, if empty use os.Hostname()
hostname = ""
# If set to true, do no set the "host" tag in the telegraf agent.
omit_hostname = false

###############################################################################
#                            OUTPUT PLUGINS                                   #
###############################################################################

# Configuration for influxdb server to send metrics to
[[outputs.influxdb]]
# The full HTTP or UDP endpoint URL for your InfluxDB instance.
# Multiple urls can be specified as part of the same cluster,
# this means that only ONE of the urls will be written to each interval.
urls = ["udp://localhost:8089"] # UDP endpoint example
urls = ["http://opennti:8086"]
urls = ["http://172.17.0.3:8086"]
[http { upstream opennti { server 172.17.0.3:8086; } server { listen 8086; location / { proxy_pass http://opennti; } } } ]
# The target database for metrics (telegraf will create it if not exists).
database = "juniper" # required
# Precision of writes, valid values are "ns", "us" (or "µs"), "ms", "s", "m", "h".
# note: using "s" precision greatly improves InfluxDB compression.
precision = "s"
# Retention policy to write to.
retention_policy = "default"
# Write consistency (clusters only), can be: "any", "one", "quorom", "all"
write_consistency = "any"
# Write timeout (for the InfluxDB client), formatted as a string.
# If not provided, will default to 5s. 0s means no timeout (not recommended).
timeout = "5s"
username = "juniper"
password = "juniper"
# Set the user agent for HTTP POSTs (can be useful for log differentiation)
user_agent = "telegraf"
# Set UDP payload size, defaults to InfluxDB UDP Client default (512 bytes)
udp_payload = 512

###############################################################################
#                            INPUT PLUGINS                                    #
###############################################################################

# Read OpenConfig Telemetry from listed sensors
[[inputs.jti_openconfig_telemetry]]
server = "10.255.32.154:50051"
# Frequency to get data in millisecond
sampleFrequency = 15000
# Sensors to subscribe for
# A identifier for each sensor can be provided in path by separating with space
# Else sensor path will be used as identifier
sensors = [ "/junos/system/linecard/interface/", ]
# Each data format has it's own unique set of configuration options, read
# more about them here:
# https://github.com/influxdata/telegraf/blob/master/docs/DATA_FORMATS_INPUT.md
data_format = "influx"

# Not working on Dec 1st 2016
# "oc-bgp-neighbors /bgp/neighbors/neighbor/",
# "/junos/system/linecard/interface/",
# "/junos/system/cpu/memory/",
# "/junos/system/linecard/packet/usage/",
# "/junos/system/linecard/interface/",
# "/interfaces/interface/subinterfaces/",
# "/junos/system/linecard/optics/",
# "/junos/system/linecard/cpu/memory/",
# "/junos/system/linecard/services/inline-jflow/",
# "/components/",
# "/interfaces/interface[name='fxp0']/",
# "/interfaces/interface[name='em0']/",
# "/interfaces/interface[name='em1']/",
# "/interfaces/interface[name='em0']/",
# "/interfaces/interface[name='ixlv0']/",
# "/interfaces/interface[name='ixlv1']/",
# "/interfaces/interface[name='em0']/",
# "/interfaces/interface[name='ixgbe0']/",
# "/interfaces/interface[name='ixgbe1’]/",
# "/junos/rsvp-interface-information/",
# "/bgp/neighbors/neighbor/",
# "/junos/task-memory-information/",
# "/arp-information/",
# "/nd6-information/",
# "/ipv6-ra/",
# "/lldp/",
# "/lacp/",
# "/junos/system/linecard/fabric/"
# "oc-interfaces /interfaces/interface/",
# "oc-components /components/",
# ]

root@lvstelemetry-1609572:/home/open-nti#

3fr61n commented 6 years ago

Hi @mnanduri

Questions

1.- Is there any proxy configured on your host to access the internet?
2.- Could you please access a running container (e.g. input-oc) and run ping opennti from it, to check whether it resolves the name?

Regards

psagrera commented 6 years ago

Hi @mnanduri

Your input-oc container is not starting because you are using nginx proxy syntax inside the Telegraf file. Please remove that part from telegraf.tmpl. What I wanted to show you is that if traffic is diverted to a proxy for some reason (one could be the reason Efrain mentioned in the previous comment), then you will get the "503 Service Unavailable" error. To confirm that theory, I put a proxy (nginx) in my setup and was able to reproduce the 503 error.

As Efrain mentioned, I would ping opennti from inside the input-oc container to see whether it resolves that name. If you have access to your proxy, sniff the traffic (via tcpdump) to see whether those packets are arriving there (if possible, paste the results here).

Another command that you can run from inside your container is:

curl -sL -I opennti:8086/ping --connect-timeout 5 -vvv

and

curl -sL -I 'ip_address_of_influxdb':8086/ping --connect-timeout 5 -vvv

and paste the results here.

Regards.

mnanduri commented 6 years ago

removed

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

Hi @mnanduri

It's clear that your problem is the proxy. You need to bypass it for opennti on your host/server.

Some examples for Linux systems:

Ubuntu:

sudo vi /etc/environment

http_proxy="http://proxy.com:8000"
no_proxy="127.0.0.1, localhost, *.cnn.com, 192.168.1.10, domain.com:8080"

CentOS:

sudo vi /etc/profile.d/proxy.sh

export http_proxy="http://proxy.com:8000"
export no_proxy="127.0.0.1, localhost, *.cnn.com, 192.168.1.10, domain.com:8080"

In your case you need to add opennti:8086 to the "no_proxy" line.

Regards

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

Could you add 172.17.0.2:8086 and try again, please? Then paste the result of the curl command here.

Regards

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

Please add 172.17.0.2 (without the port).

psagrera commented 6 years ago

If that doesn't work, run the following line inside the input-oc container:

root@14afb35c53b7:/source# export no_proxy=172.17.0.2
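In this thread the entries that carried a port (opennti:8086, 172.17.0.2:8086) did not take effect, while the bare IP did. Whether a port inside no_proxy is honoured depends on the client, so the sketch below simply assumes host-only entries are the safe form. It is an illustration only; `strip_port` and `build_no_proxy` are hypothetical helper names, not part of open-nti or curl.

```python
def strip_port(entry):
    """Return `entry` without a trailing ':<port>' (IPv6 literals are left alone)."""
    entry = entry.strip()
    host, sep, port = entry.rpartition(":")
    # Only treat the suffix as a port when it is numeric and the remaining
    # part has no further colon (which would indicate an IPv6 address).
    if sep and port.isdigit() and ":" not in host:
        return host
    return entry


def build_no_proxy(entries):
    """Join host-only entries into a comma-separated no_proxy value, dropping duplicates."""
    hosts = []
    for entry in entries:
        host = strip_port(entry)
        if host and host not in hosts:
            hosts.append(host)
    return ",".join(hosts)


if __name__ == "__main__":
    # Assumed example values; replace with the address of your own opennti_con.
    print("export no_proxy=" + build_no_proxy(["127.0.0.1", "localhost", "172.17.0.2:8086"]))
```

The printed line can then be pasted into the container shell or into the file that sets the proxy variables.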

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

HTTP/1.1 204 No Content means it's OK (i.e. the proxy is being bypassed). Could you check whether data is being inserted into the database?

1) Log into opennti_con

docker exec -it opennti_con /bin/bash

2) Set terminal settings

stty rows 300 cols 300

3) Log into the influxdb console

root@70bf779673c2:/# influx
Connected to http://localhost:8086 version 1.2.0
InfluxDB shell version: 1.2.0

4) use juniper
5) show MEASUREMENTS

Regards
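The same check can be scripted against the InfluxDB 1.x HTTP API instead of the influx console. This is a rough sketch under a few assumptions: the base URL and the database name "juniper" are taken from this thread, the function names are hypothetical, and if authentication is enabled on your instance the request would also need credentials, which are not handled here.

```python
import json
import urllib.parse
import urllib.request


def parse_measurements(payload):
    """Extract measurement names from an InfluxDB 1.x /query JSON response."""
    names = []
    for result in payload.get("results", []):
        # A result without a "series" key means the query returned no rows.
        for series in result.get("series", []):
            names.extend(row[0] for row in series.get("values", []))
    return names


def show_measurements(base_url="http://localhost:8086", database="juniper"):
    """Run SHOW MEASUREMENTS over HTTP, ignoring any proxy environment variables."""
    query = urllib.parse.urlencode({"db": database, "q": "SHOW MEASUREMENTS"})
    # An empty ProxyHandler keeps http_proxy/no_proxy out of the picture.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"{base_url}/query?{query}", timeout=5) as response:
        return parse_measurements(json.load(response))


if __name__ == "__main__":
    for name in show_measurements():
        print(name)
```

An empty list means the database is reachable but nothing has been written to it yet.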

mnanduri commented 6 years ago

removed

psagrera commented 6 years ago

For example, in my setup:

I've set the proxy env variables like this (inside the container):

root@ff363d1bb278:/source# env |grep -i proxy
http_proxy=172.17.0.2:80
no_proxy=172.17.0.3

Then curl to influxdb:8086 works OK:

root@ff363d1bb278:/source# curl -sL -I 172.17.0.3:8086/ping --connect-timeout 5 -vvv

* Hostname was NOT found in DNS cache
* Trying 172.17.0.3...
* Connected to 172.17.0.3 (172.17.0.3) port 8086 (#0)

> HEAD /ping HTTP/1.1
> User-Agent: curl/7.38.0
> Host: 172.17.0.3:8086
> Accept: */*

< HTTP/1.1 204 No Content
HTTP/1.1 204 No Content

and I get measurements:

show MEASUREMENTS
name: measurements
name

/bgp
/junos/system/linecard/interface/

If I run curl against an IP other than influxdb's, it fails:

root@ff363d1bb278:/source# curl -sL -I 172.17.0.5:8086/ping --connect-timeout 5 -vvv

* Hostname was NOT found in DNS cache
* Trying 172.17.0.2...
* Connected to 172.17.0.2 (172.17.0.2) port 80 (#0)

> HEAD HTTP://172.17.0.5:8086/ping HTTP/1.1
> User-Agent: curl/7.38.0
> Host: 172.17.0.5:8086
> Accept: */*
> Proxy-Connection: Keep-Alive

< HTTP/1.1 503 Service Temporarily Unavailable
HTTP/1.1 503 Service Temporarily Unavailable

You can do a quick test by including the env variable in the docker-compose.yml file:

[.....]
input-oc:
  image: $INPUT_OC_IMAGE_NAME:$IMAGE_TAG
  build: $INPUT_OC_DIR
  container_name: $INPUT_OC_CONTAINER_NAME
  environment:
    - "no_proxy=172.17.0.3" <--- put ip_address of your opennti_con
[.....]

To do that, you will have to stop and start openNTI (make stop / make start).

Pablo

mnanduri commented 6 years ago

Yes sir, that did the trick. Whenever I do a make update I will add this line. I will add the rest of the sensors and look again.

Thanks a ton for all your help on this.

Connected to http://localhost:8086 version 1.2.0
InfluxDB shell version: 1.2.0

use juniper
Using database juniper
show measurements
name: measurements
name

/junos/system/linecard/interface/
jnpr.jvision

psagrera commented 6 years ago

Glad to hear that! It's been a pleasure to help you.

Regards

Pablo