bosun-monitor / bosun

Time Series Alerting Framework
http://bosun.org
MIT License
3.4k stars 493 forks source link

Support influxdb #618

Closed fire closed 9 years ago

fire commented 9 years ago

Would help if bosun supported influxdb. I didn't find a bug tracking this, so here it is.

Since I have multiple datasources sending data to influxdb. (collectd, statsite). It would keep my dependencies low if I could use the influxdb for bosun rather than change the entire system to openTSDB.

fire commented 9 years ago

There's a thread about graphite support. Is it possible to latch on that but port it to influxdb?

kylebrandt commented 9 years ago

I suggest waiting until the graphite PR gets merged. Once that is done this should be easier to do since that will be the first non opentsdb data source so the code will be updated to reflect that. On Dec 26, 2014 6:03 PM, "K. S. Ernest (iFire) Lee" < notifications@github.com> wrote:

There's a thread about graphite support. Is it possible to latch on that but port it to influxdb?

— Reply to this email directly or view it on GitHub https://github.com/bosun-monitor/bosun/issues/618#issuecomment-68162048.

BarthV commented 9 years ago

+1 for InfluxDB ! I'll watch this thread and most likely test very early stages.

elvarb commented 9 years ago

Would love this feature!

fire commented 9 years ago

Initial graphite support seems to have merged.

HakShak commented 9 years ago

For those interested, this was the merge: https://github.com/bosun-monitor/bosun/pull/632

Footprint looks relatively small for integrating influxdb.

maddyblue commented 9 years ago

We need someone who actually uses influxdb to work on this. We don't use it and so probably won't work on it. If you need any help at all please come ask in our chat room:

http://chat.stackoverflow.com/rooms/67157/bosun

rishid commented 9 years ago

In influxdb v0.9 -- which is coming in weeks will have support for "tags" similar to OpenTSDB. It might make porting even easier.

maddyblue commented 9 years ago

I agree. I would not work on this until 0.9 comes out.

rishid commented 9 years ago

The v0.9 docs are out if anyone cares to take a stab at this http://influxdb.com/docs/v0.9/concepts/reading_and_writing_data.html

inthecloud247 commented 9 years ago

+1

blezek commented 9 years ago

My team is interested in InfluxDB, and are using it heavily. Can someone point me to the places in the Bosun code where we would need to look to implement?

maddyblue commented 9 years ago

Check out the GraphiteQuery func in funcs.go:

https://github.com/bosun-monitor/bosun/blob/bd21d2f8907d3089502ca4eb79432f8b09fb4319/cmd/bosun/expr/funcs.go#L556

You'll need to define an Influx var similar to this one:

https://github.com/bosun-monitor/bosun/blob/bd21d2f8907d3089502ca4eb79432f8b09fb4319/cmd/bosun/expr/funcs.go#L85

And then add a influxHost conf directive like:

https://github.com/bosun-monitor/bosun/blob/bd21d2f8907d3089502ca4eb79432f8b09fb4319/cmd/bosun/conf/conf.go#L412

And then enable the influx functions if that directive is present:

https://github.com/bosun-monitor/bosun/blob/bd21d2f8907d3089502ca4eb79432f8b09fb4319/cmd/bosun/conf/conf.go#L1243

blezek commented 9 years ago

@mjibson Have an initial (ugly) implementation @ https://github.com/dblezek/bosun/tree/influxdb Could someone help by taking a look?

This branch adds a new function influxdb into the bosun query language. It can be used as:

influxDBHost = http://dewey-splunk.mayo.edu:8086

alert example.influxdb {
   crit = avg(influxdb("select value from Sorter_association_rate", "5m", "", "")) > 1
}

Some caveats

maddyblue commented 9 years ago

@dblezek Your initial work there appears to be going in the correct direction, minus tag support. Adding this to the items tab requires sending data to the /api/index endpoint in OpenTSDB format. We have a new command that does that for OpenTSDB. You could write one for influx if you needed.

Getting tags working is the next important part in your code.

blezek commented 9 years ago

@mjibson OK, figured out that tags only come across the wire with a group by * clause is added to the query (https://groups.google.com/forum/#!topic/influxdb/rALoIZbJw4o). Added in latest branch, so tags are supported, which is kind of neat.

While working on the 0.9 branch of InfluxDB, some of our code stopped working as expected, so we may switch over to OpenTSDB and Bosun for alerts. I can wrap this branch up, but I doubt if I'll have time to add item support via the /api/index.

Once you are happy, I'll move the InfluxDB support code into it's proper place, remove the debug logging statements and issue a formal pull request.

Thanks for the help, -dan

maddyblue commented 9 years ago

Glad to hear it. There's no need to add /api/index support now. Neither Graphite and Logstash have it. Send a PR once you think things are ready to go.

vishh commented 9 years ago

+1

haomingz commented 9 years ago

+1

justin8 commented 9 years ago

+1

ecables commented 9 years ago

+1

maddyblue commented 9 years ago

I have a working branch with InfluxDB support: https://github.com/bosun-monitor/bosun/compare/influxdb-query

However, I encountered what appears to be a bug in InfluxDB that causes the returned tags to be incorrect, and worse, change every query: https://github.com/influxdb/influxdb/issues/3059

Thus, until that bug is fixed or explained and worked around, we cannot merge this into Bosun.

Until then, though, I would appreciate testing of this branch with people who have real InfluxDB servers.

BarthV commented 9 years ago

nice ! thanks @mjibson

maddyblue commented 9 years ago

InfluxDB will fix the listed bug in their next release. I think the current code is ready for testing by someone who has a running 0.9.0 InfluxDB installation. We will be happy to merge this code if someone who is running such a cluster in production runs this code as a test. We are hesitant to merge this otherwise because it will then have no real testers.

To test, compile the above branch and add influxHost to your conf file, then use the influx function.

pdf commented 9 years ago

@mjibson am I doing something wrong? If I do the following:

It looks like the config option is not getting compiled into the binary, but appears to be in the source :confused:

I was doing something wrong.

pdf commented 9 years ago

Nevermind, I worked it out, need to use the bosun.org source path. Will do some testing in the coming days.

albertocsm commented 9 years ago

Great stuff! W'll try it out this week.

/cc @ilidiomartins

dpochopsky commented 9 years ago

I'm also trying to test Bosun/InfluxDB. I've downloaded the zip'd version of bosun-influxdb-query and followed the installation procedures and it appears that the compiled version does not support the influxDBHost config option. Here's the error I'm getting:

2015/07/14 17:21:26 conf: dev.conf:3:0: at : unknown key influxDBHost

I'm new to 'go' so its possible I built it incorrectly. Would appreciate any help you can provide.

Thanks in advance, Dave

dpochopsky commented 9 years ago

I've managed to get the bosun-influxdb-query branch running. Does anyone have any example alerts which use the influx function?

I believe this is the function: influx(db string, query string, startDuration string, endDuration string, format string)

Would be great if someone could provide some examples with all fields.

Regards, Dave

kylebrandt commented 9 years ago

I haven't tried it, but any alert where you see q() in the example, you should be able to use influx instead since they both hae the same return type - SeriesSet

On Mon, Jul 20, 2015 at 2:18 PM, dpochopsky notifications@github.com wrote:

I've managed to get the bosun-influxdb-query branch running. Does anyone have any example alerts which use the influx function?

I believe this is the function: influx(db string, query string, startDuration string, endDuration string, format string)

Would be great if someone could provide some examples with all fields.

Regards, Dave

— Reply to this email directly or view it on GitHub https://github.com/bosun-monitor/bosun/issues/618#issuecomment-122972786 .

dpochopsky commented 9 years ago

I've been pretty much guessing on the format of the alert, at the moment I'm getting the following error:

too many segments in "sam"."solace"."appliance".avgEgressByteRate at line 1, char 1

Here's my alert:

crit = avg(influx("sam", "select * from sam.solace.appliance.avgEgressByteRate", "5m", "", "'")) > 1

database name is: sam series name is: solace.appliance.avgEgressByteRate which has one tag named "appliance"

Initially my alert did not include the "sam." prefix, as I assumed the first parameter specified the database name. However, if I exclude the "sam." prefix, I get the following error:

database not found: solace

Some sample alerts which use influxdb would really help, has anyone used the bosun-influx-query branch that could send me some sample configuration files?

Regards, Dave

pdf commented 9 years ago

@dpochopsky that's not a valid query - try it in Influx, or in the bosun expression editor, and look over the influxdb documentation (such as it is).

You won't be able to use meaningful queries from influx with bosun yet though, because it requires the use of both single- and double-quotes, but there's no way to escape them in bosun at this stage, so you can only use one or the other.

kylebrandt commented 9 years ago

@pdf can you expand on the quoting issue?

kylebrandt commented 9 years ago

Oh nevermind I found https://github.com/bosun-monitor/bosun/pull/1122#issuecomment-117736332

@mjibson What about triple double quotes like in python? That would probably read cleaner than escaping anyways....

pdf commented 9 years ago

Triple quotes might not be a terrible idea - TOML also uses triple single-quotes for multi-line or string literals containing single quotes.

dpochopsky commented 9 years ago

@pdf So if I understand correctly, I won't be able to execute queries against my time series because the time series names contains '.', and influxdb requires it to be wrapped in double quotes which cannot be done with bosun until it supports escaping quotes, or the like.

pdf commented 9 years ago

@dpochopsky you can use double-quotes if you wrap the query in single-quotes, but if you need to filter the results, that won't work right now, because string-literals in influx queries need to be enclosed by single-quotes (which you've already used to wrap your query). You also can't currently use GROUP BY time() in bosun because of the way it constructs the query's time parameters, so that's also going to stop you doing a lot of useful stuff.

dpochopsky commented 9 years ago

@pdf I tried using single quotes as follows:

avg(influx("sam", 'select * from "solace.appliance.avgEgressByteRate" ', "5m", "", "")) < 1

but I get: expr: invalid syntax

pdf commented 9 years ago

Oh yeah, that happens in the expression editor too, see #1122 - I offered backticks as an option to fix that, but they're also used for multi-line in the configs.

dpochopsky commented 9 years ago

@pdf Thanks for all your help so far.

Regarding your statement about GROUP BY time() not being supported with influxdb... Is there an issue opened to track this. I'd like to understand more about this.

Regards, Dave

pdf commented 9 years ago

@dpochopsky there's not an issue specifically tracking that problem - again, I raised that concern in #1122. Basically, because bosun deals with the time component separately, and needs to insert it into the query, currently it sends the query through some influx code to return a query object that can be analyzed to find the right place to put the time component, however because GROUP BY time() is not valid in influx when there's no time component to the query, the parser throws an error that the query is invalid, and so bosun can't determine where to put the time. It's chicken and egg, but basically I think it just means bosun has to deviate a little more from the way it works with opentsdb in this area.

kylebrandt commented 9 years ago

@pdf Not familiar with influx, but maybe there could be some sort of placeholders in the query string, and bosun could replace those with the arguments provided to the function? Something like that feasible?

On Tue, Jul 21, 2015 at 6:04 PM, Peter Fern notifications@github.com wrote:

@dpochopsky https://github.com/dpochopsky there's not an issue specifically tracking that problem - again, I raised that concern in #1122 https://github.com/bosun-monitor/bosun/pull/1122. Basically, because bosun deals with the time component separately, and needs to insert it into the query, currently it sends the query through some influx code to return a query object that can be analyzed to find the right place to put the time component, however because GROUP BY time() is not valid in influx when there's no time component to the query, the parser throws an error that the query is invalid, and so bosun can't determine where to put the time. It's chicken and egg, but basically I think it just means bosun has to deviate a little more from the way it works with opentsdb in this area.

— Reply to this email directly or view it on GitHub https://github.com/bosun-monitor/bosun/issues/618#issuecomment-123490917 .

pdf commented 9 years ago

@kylebrandt honestly, I haven't looked at this part of the problem much - I've put influx back on the shelf for the time being, because it's lacking a few fundamental query constructs that make it unsuitable for our purposes at this time, and I have some pressure to keep our project moving.

If you're proposing requiring the user to enter some template vars in the query that could be replaced by string manipulation, that would surely work, though it's a usability hit and would need good documentation and examples.

kylebrandt commented 9 years ago

My main thought is that parsing Influx queries would be very difficult to keep up with, but now that I think about it - influx is in Go, so it might not be bad if it could just be imported. Off topic, but wondering if you ended up going with a different tsdb, and if so, which.

On Tue, Jul 21, 2015 at 6:37 PM, Peter Fern notifications@github.com wrote:

@kylebrandt https://github.com/kylebrandt honestly, I haven't looked at this part of the problem much - I've put influx back on the shelf for the time being, because it's lacking a few fundamental query constructs that make it unsuitable for our purposes at this time, and I have some pressure to keep our project moving.

If you're proposing requiring the user to enter some template vars in the query that could be replaced by string manipulation, that would surely work, though it's a usability hit and would need good documentation and examples.

— Reply to this email directly or view it on GitHub https://github.com/bosun-monitor/bosun/issues/618#issuecomment-123499721 .

pdf commented 9 years ago

@kylebrandt the way it's being done in the influx branch is using the native influx query parser, so you're right - it's not too hard to pull in. But the parser barfs if you pass it an invalid query, which is why we have the current issue.

OT: I found Bosun really nice to work with, so I looked at deploying OpenTSDB, but the poor deployment story, performance characteristics, and high overhead from hadoop/hbase was going to make it cost-prohibitive to deploy for our small ~60 node cluster. So, I think I'm going to have to go with one or two Prometheus nodes to meet our timelines, which is not ideal for a few reasons: there's no horizontal scaling for future growth; pull-only adds some complexity; the alerting templates are less expressive; and alertmanager is still young and lacks features. But, I've got to cut something that will get the job done.

nathanielc commented 9 years ago

@mjibson What is the motivation behind the format parameter? Since the InfluxDB query already requires you to specify the tags in the group by statement what value does the format parameter add? Is it just so you can squelch results that are missing tags?

Thanks

maddyblue commented 9 years ago

@nathanielc Yes, I think that's what it was for, since influx could return varying tag sets for a query. Maybe. I forget, it's been a while.

giganteous commented 9 years ago

@nathanielc write me down for early testing, and helping out anywhere you need help with. The incentive of not having to maintain the huge java stack is high for me.

nathanielc commented 9 years ago

I have a new PR ready that starts with the work from mjibson here https://github.com/bosun-monitor/bosun/pull/1291 and should address the few outstanding issues.

nathanielc commented 9 years ago

@giganteous Check out this branch and let me know how it works for you. https://github.com/influxdb/bosun/tree/squashed_influxdb-query

Should be pretty straight forward to use:

avg(
    influx(
        "db_name", 
         '''select value from "my_rp"."my_measurement" where "key1" = 'value1' group by *''', 
         "1h",
         ""
    )
)