cargomedia / bipbip

UNMAINTAINED. Gather services data and store in CopperEgg/IDERA
MIT License

Make sure we don't fail because of missing network on startup #165

Closed. njam closed 8 years ago

njam commented 8 years ago

Currently when starting without Internet access the program fails. Instead it should log an error and retry indefinitely.

[..]
I, [2016-07-12T16:30:03.811278 #1255]  INFO -- : Startup...
I, [2016-07-12T16:30:03.811498 #1255]  INFO -- : Setting up plugin postfix for storage copperegg
I, [2016-07-12T16:30:03.811531 #1255]  INFO -- copperegg: Loading metric groups
F, [2016-07-12T16:30:03.812457 #1255] FATAL -- : getaddrinfo: Temporary failure in name resolution
    /usr/lib/ruby/2.1.0/net/http.rb:879:in `initialize'
    /usr/lib/ruby/2.1.0/net/http.rb:879:in `open'
    /usr/lib/ruby/2.1.0/net/http.rb:879:in `block in connect'
    /usr/lib/ruby/2.1.0/timeout.rb:76:in `timeout'
    /usr/lib/ruby/2.1.0/net/http.rb:878:in `connect'
    /usr/lib/ruby/2.1.0/net/http.rb:863:in `do_start'
    /usr/lib/ruby/2.1.0/net/http.rb:852:in `start'
    /usr/lib/ruby/2.1.0/net/http.rb:1369:in `request'
    /var/lib/gems/2.1.0/gems/copperegg-revealmetrics-0.8.1/lib/copperegg/revealmetrics/mixins/persistence.rb:58:in `request'
    /var/lib/gems/2.1.0/gems/copperegg-revealmetrics-0.8.1/lib/copperegg/revealmetrics/mixins/persistence.rb:20:in `find'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/storage/copperegg.rb:67:in `_load_metric_groups'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/storage/copperegg.rb:9:in `setup_plugin'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/agent.rb:29:in `block (2 levels) in run'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/agent.rb:27:in `each'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/agent.rb:27:in `block in run'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/agent.rb:26:in `each'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/agent.rb:26:in `run'
    /var/lib/gems/2.1.0/gems/bipbip-0.6.22/bin/bipbip:44:in `<top (required)>'
    /usr/local/bin/bipbip:23:in `load'
    /usr/local/bin/bipbip:23:in `<main>'
I, [2016-07-12T16:30:04.448995 #1403]  INFO -- : Startup...
[..]
kris-lab commented 8 years ago

@njam we have basically 2 ways to go:

wdyt?

njam commented 8 years ago

I'm not sure I follow. You mean to implement a function that tests if we have Internet connectivity? If we don't, how would that information be used?

The problem is that setup_plugin for CopperEgg fails if there's no Internet connection. Either we can skip it and start collecting data nevertheless, or we can retry forever.

/var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/storage/copperegg.rb:67:in `_load_metric_groups'
/var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/storage/copperegg.rb:9:in `setup_plugin'
/var/lib/gems/2.1.0/gems/bipbip-0.6.22/lib/bipbip/agent.rb:29:in `block (2 levels) in run'

I would vote for retrying forever. This will simplify things, because we don't need to think about any invalid state if the setup could not succeed. So how about we retry all operations in Storage::Copperegg forever, if they failed due to network problems?
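To make the "retry forever" idea concrete, here is a minimal sketch of what such a wrapper could look like. It is an illustration only, not the code that ended up in bipbip: the helper name `retry_on_network_error`, the `NETWORK_ERRORS` list, and the `delay` and `sleeper` parameters are all assumptions made up for this example. The only grounded detail is that the `getaddrinfo` failure in the log above surfaces in Ruby as a `SocketError`.

```ruby
require 'logger'
require 'socket'
require 'timeout'

# Hypothetical list of errors treated as "network problems" (an assumption,
# not taken from bipbip). SocketError covers the getaddrinfo failure above.
NETWORK_ERRORS = [
  SocketError,
  Timeout::Error,
  Errno::ECONNREFUSED,
  Errno::ECONNRESET,
  Errno::EHOSTUNREACH,
  Errno::ENETUNREACH
].freeze

# Hypothetical helper: runs the block, and on a network error logs it,
# waits `delay` seconds, and tries again without any upper limit.
# `sleeper` is injectable only so the waiting can be stubbed in tests.
def retry_on_network_error(logger, delay: 5, sleeper: Kernel.method(:sleep))
  attempt = 0
  begin
    attempt += 1
    yield
  rescue *NETWORK_ERRORS => e
    logger.error("Network error (attempt #{attempt}): #{e.message}; retrying in #{delay}s")
    sleeper.call(delay)
    retry
  end
end
```

Each operation in Storage::Copperegg that talks to the API (for example the call made from `_load_metric_groups`) could then be wrapped in `retry_on_network_error(logger) { ... }`. Errors that are not in the list still propagate, so a genuine bug would not be retried silently.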

kris-lab commented 8 years ago

Ok, I am not sure if I follow now!

I had not investigated this problem that deeply! From what you and Philipp said, I was pretty sure that this issue appears only on system reboot? That would mean it never happens during normal operation? I guess there might be moments when the Internet connection has a problem and CopperEgg is unreachable, but that doesn't crash bipbip then, agree?

So should we implement crash protection deep in e.g. setup_plugin then?

njam commented 8 years ago

As discussed:

kris-lab commented 8 years ago

@njam wdyt?

kris-lab commented 8 years ago

@njam please re-review

kris-lab commented 8 years ago

released