brutasse / graphite-api

Graphite-web, without the interface. Just the rendering HTTP API.
https://graphite-api.readthedocs.io
Apache License 2.0

maxDataPoints hint for CustomFinders? #109

Open swissdave opened 9 years ago

swissdave commented 9 years ago

Hi there,

I'm going to try to implement a CustomFinder to obtain data from KDB in order to feed it to Grafana. The data I'm going to query isn't stored at fixed time intervals, so the KDB query will need to time-bucket the data.

Grafana makes use of Graphite's maxDataPoints parameter to ensure that huge amounts of data aren't fed back to the browser, which makes things a lot more efficient.

Given that my query will have to time-bucket the data, it would be handy to receive any maxDataPoints parameter on the query interface - it would allow me to choose a suitable interval for the time buckets.
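For example (just a sketch, the helper below is made up), the finder could derive the bucket size directly from maxDataPoints:

    import math

    def choose_bucket_interval(start_time, end_time, max_data_points,
                               default_interval=60):
        """Pick a KDB time-bucket size (in seconds) so the query returns
        at most max_data_points points for the requested range."""
        if not max_data_points:
            return default_interval
        span = end_time - start_time
        return max(1, int(math.ceil(span / float(max_data_points))))

    # e.g. a 1-hour range capped at 300 points -> 12-second buckets
    # choose_bucket_interval(0, 3600, 300) == 12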

What do you think? Something worth considering?

brutasse commented 9 years ago

Yes, absolutely. It'd be worth passing the info to the finders.

Dieterbe commented 8 years ago

I was able to do maxDataPoints easily (perhaps not in the most elegant way), see https://github.com/raintank/graphite-api/pull/5. Very related, however, is consolidateBy(). One of the limitations of standard Graphite is that you can only roll up using one function (say max) while consolidateBy can be something else (like avg), meaning your data gets processed through two different functions and the results don't make sense. I have a backend that supports multiple rollups/consolidations (logically speaking they're really the same thing), so it makes a lot more sense to pass consolidateBy through so the backend can return the data already consolidated using the right function.

But this seems pretty hard. I tried to follow the code where it parses the request and sets pathExpression and consolidateFunc on the TimeSeries object, but that object isn't used in fetch_multi, so the finder isn't aware of the consolidation functions for the nodes it's getting the data for. Any help on how to address this would be greatly appreciated; I'd be happy to contribute this feature if only I knew how to go about it.
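To make it concrete, the shape I'd find useful is roughly this (purely a sketch: the extra request_context argument does not exist in graphite-api today, and _query stands in for the real backend call; find_nodes is omitted):

    class ConsolidatingFinder(object):
        __fetch_multi__ = 'mybackend'

        def fetch_multi(self, nodes, start_time, end_time,
                        request_context=None):
            # hypothetical extra argument carrying the per-request hints
            ctx = request_context or {}
            max_points = ctx.get('maxDataPoints')
            consolidate_by = ctx.get('consolidateBy', 'average')
            step = 60
            if max_points:
                step = max(step, (end_time - start_time) // max_points)
            # the backend rolls the data up with the requested function
            # itself, so it doesn't get consolidated twice with two
            # different functions
            series = dict(
                (node.path,
                 self._query(node.path, start_time, end_time, step,
                             consolidate_by))
                for node in nodes)
            return (start_time, end_time, step), series

        def _query(self, path, start, end, step, func):
            # stand-in for the actual backend call
            return [None] * max(1, (end - start) // step)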

zeph commented 8 years ago

@Dieterbe I see your solution leverages the "g" global variable of Flask, while @mbell697 preferred to break the Finder interface and change its signature to accept such a parameter... I don't know, I have some time to spend on this, but I'd like to produce something that we all like and that @brutasse will gladly merge into the main repo.

thoughts?

mbell697 commented 8 years ago

I haven't had time to look at this more, but another approach would be to place maxDataPoints and consolidateBy as attributes on Node. This would duplicate maxDataPoints unnecessarily, but consolidateBy can vary by node, and it may also make passing consolidateBy easier to implement. It would fix the fetch / fetch_multi arity-overloading ugliness.
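Roughly like this (just a sketch; the helper, attribute names and the backend object are made up):

    # Somewhere before the data fetch (fetchData, or storage.Store.find),
    # the hints get stamped onto each leaf node:
    def attach_hints(nodes, max_data_points, consolidate_funcs):
        for node in nodes:
            node.maxDataPoints = max_data_points
            # consolidateBy can vary per node/path, so look it up per node
            node.consolidateBy = consolidate_funcs.get(node.path, 'average')

    # ...and a reader that keeps a reference to its node can read them back
    # without touching the fetch()/fetch_multi() signatures at all:
    class BucketedReader(object):
        def __init__(self, node, backend):
            self.node = node
            self.backend = backend

        def fetch(self, start_time, end_time):
            func = getattr(self.node, 'consolidateBy', 'average')
            max_points = getattr(self.node, 'maxDataPoints', None) or 1000
            step = max(1, (end_time - start_time) // max_points)
            # backend.query is a stand-in for the actual data source call
            values = self.backend.query(self.node.path, start_time,
                                        end_time, step, func)
            return (start_time, end_time, step), values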

zeph commented 8 years ago

@mbell697 that doesn't work for me, as the change has to happen after a node object has been found/instantiated but before the actual data fetch is executed... which at this stage happens in fetchData():


    # Group nodes that support multiple fetches
    # (multi_nodes and path_to_exprs are defaultdict(list), single_nodes a
    # plain list, all defined just above this loop in fetchData)
    for pathExpr in pathExprs:
        for node in app.store.find(pathExpr, startTime, endTime):
            if not node.is_leaf:
                continue
            if node.path not in path_to_exprs:
                if hasattr(node, '__fetch_multi__'):
                    multi_nodes[node.__fetch_multi__].append(node)
                else:
                    single_nodes.append(node)
            path_to_exprs[node.path].append(pathExpr)

In my situation I wanted to reflect the Finder-computed "interval" (the so-called "data granularity") as part of the "path" (to make it visible to the user), so I'd end up with different keys in the multi_nodes and single_nodes collections.

The maxDataPoints value has to be available to the Finder before the "app.store.find(pathExpr, startTime, endTime)" statement in the piece of code above gets executed... I was tempted to go in the direction of the Flask g global variable.
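Something along these lines (just a sketch, the function names are made up):

    from flask import g

    # in the /render view, while the request parameters are being parsed
    # (i.e. before fetchData and therefore before app.store.find runs):
    def remember_request_hints(request_options):
        g.maxDataPoints = request_options.get('maxDataPoints')

    # in the custom finder, when building the time-bucketed query:
    def current_max_data_points(default=None):
        return getattr(g, 'maxDataPoints', default)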

mbell697 commented 8 years ago

Not sure I see the issue; there are multiple places in the linked code and elsewhere in fetchData where maxDataPoints and consolidateBy could be set on the nodes. It could also happen in storage#find (which already iterates the nodes for a few reasons).
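E.g. in storage.py, something like this (sketch only; the extra keyword arguments do not exist today, and _existing_lookup stands in for the current find logic):

    # Store.find already iterates every matching node, so it could stamp
    # the hints on leaf nodes while doing so, leaving fetchData and the
    # finder fetch signatures untouched.
    def find(self, pattern, startTime=None, endTime=None,
             maxDataPoints=None, consolidateBy=None):
        for node in self._existing_lookup(pattern, startTime, endTime):
            if node.is_leaf:
                node.maxDataPoints = maxDataPoints
                node.consolidateBy = consolidateBy
            yield node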

zeph commented 8 years ago

uhmm... @mbell697 let me look at that, thanks for the tip :)

zeph commented 8 years ago

Uhmmm, @brutasse what were you trying to do here? https://github.com/brutasse/graphite-api/blob/master/graphite_api/storage.py#L41 Maybe it is related... In my setup/finder I'll basically make the interval an extra node level in the Graphite-style query language (my node "paths"), with .interval:15m for example... Is that something you were looking for, or by "interval" do you mean the startTime/endTime window?

brutasse commented 8 years ago

@zeph this piece of code is from graphite-web; it was added in https://github.com/graphite-project/graphite-web/commit/f18bb3c711bd566debf3edcc6b895be9fb6210b0

I don't know what the original intent was…

zeph commented 8 years ago

...and the TODO has also been refactored but is still in the graphite-web project xD ...OK, never mind: https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/storage.py#L76

pkittenis commented 8 years ago

If I may comment, though a bit late :)

The intention of the TODO at https://github.com/brutasse/graphite-api/blob/master/graphite_api/storage.py#L41 is to get nodes (paths) that existed in the requested date/time range, but not outside it. I.e. if a node was newly added at time T but the request is for T-2 to T-1, the response should not include that path, as it did not exist in the requested date/time range.
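In other words the check would have to look something like this (a sketch, assuming each leaf node could report the intervals its data actually covers):

    def nodes_existing_in_range(nodes, start_time, end_time):
        """Keep only nodes whose data overlaps the requested window."""
        for node in nodes:
            intervals = getattr(node, 'intervals', None) or []
            if any(interval.start < end_time and interval.end > start_time
                   for interval in intervals):
                yield node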

AFAIK this is not possible in any storage backend, including whisper, so I do not think it is applicable in this instance.

This still happens on all Graphite installations; you can see it by requesting old data with a wildcard pattern after inserting a bunch of new nodes. The nodes show up in the legend but without any data, which in many cases can make the graph unreadable because of the huge legend. Ditto for nodes that stop sending data: they will continue to show up in the legend/path finding unless they are deleted on the backend.

BTW, would also love a maxDataPoints hint for use in a custom finder :+1:

zeph commented 8 years ago

You need to check with @mb-m whether he has time... I left #gaikai; he is now the one on the project.