swissdave opened 9 years ago
Yes, absolutely. It'd be worth passing the info to the finders.
I was able to support maxDataPoints fairly easily (perhaps not in the most elegant way), see https://github.com/raintank/graphite-api/pull/5. Very related, however, is consolidateBy(). One of the limitations of standard graphite is that you can only roll up using one function (say max), while consolidateBy can be something else (like avg), meaning your data gets processed through two different functions and the results don't make sense. I have a backend that supports multiple rollups/consolidations (logically speaking they're really the same thing), so it makes a lot more sense to pass the consolidateBy function down so the backend can return the data already consolidated with the right function.
But this seems pretty hard. I was trying to follow the code where it parses the request and sets the pathExpression and consolidateFunc on the TimeSeries object, but this object isn't used in fetch_multi, so the finder isn't aware of the consolidation functions for the nodes it's getting the data for. Any help on how to address this would be greatly appreciated. I'd be happy to contribute this feature if only I knew how to go about it.
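To illustrate the mismatch described above, here is a hypothetical sketch (not graphite-api code, all names invented) showing how rolling up with `max` and then consolidating with `avg` distorts the data, whereas consolidating the raw data directly with one function keeps the spike visible:

```python
def rollup(points, size, func):
    """Aggregate raw points into buckets of `size` using `func`."""
    return [func(points[i:i + size]) for i in range(0, len(points), size)]

# Raw series: mostly low values with one spike.
raw = [1, 1, 1, 9, 1, 1, 1, 1]

# Backend rollup uses max (keeps the spike visible).
rolled = rollup(raw, 2, max)                                    # [1, 9, 1, 1]

# Render-time consolidation then uses avg, diluting the spike.
consolidated = rollup(rolled, 2, lambda xs: sum(xs) / len(xs))  # [5.0, 1.0]

# Consolidating the raw data with a single function would keep it intact.
direct = rollup(raw, 4, max)                                    # [9, 1]
```

The two-stage result reports 5.0 where the true maximum over that window is 9, which is the "results don't make sense" problem.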
@Dieterbe I see your solution leverages the "g" global variable of Flask, while @mbell697 preferred to break the Finder interface and extend its signature with such a parameter... I don't know. I have some time to spend on this, but I'd like to produce something that we all like and that @brutasse will gladly merge into the main repo.
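For reference, the "g"-style approach boils down to request-scoped storage: stash the render hints once per request, and any finder running inside that request can read them without a signature change. This stdlib sketch mimics the pattern with a thread-local (which is roughly what Flask's `g` provides); `set_render_hints` and `MyFinder` are hypothetical names, not graphite-api API:

```python
import threading

# Per-request (here: per-thread) storage for render hints.
_hints = threading.local()

def set_render_hints(max_data_points=None, consolidate_by=None):
    # Would be called once per request, e.g. from a Flask before_request
    # hook or at the top of the render view.
    _hints.max_data_points = max_data_points
    _hints.consolidate_by = consolidate_by

class MyFinder:
    def fetch_multi(self, nodes, start_time, end_time):
        # The finder reads the hints without any change to the
        # fetch_multi() signature.
        max_points = getattr(_hints, 'max_data_points', None)
        return {'maxDataPoints': max_points}

set_render_hints(max_data_points=500)
result = MyFinder().fetch_multi([], 0, 60)
print(result)  # {'maxDataPoints': 500}
```

The trade-off versus @mbell697's approach: no interface break, but the data flow becomes implicit (the finder depends on ambient request state).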
thoughts?
I haven't had time to look at this more, but another approach that could be taken is to place `maxDataPoints` and `consolidateBy` as attributes on `Node`. This would duplicate `maxDataPoints` unnecessarily, but `consolidateBy` can vary by node. It may also make the `consolidateBy` passing implementation easier. This would fix the `fetch`/`fetch_multi` arity overloading ugliness.
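As a minimal sketch of that idea (hypothetical stand-in class, not the real `graphite_api.node.LeafNode`), the hints would simply ride along on the node objects the finder already receives:

```python
class LeafNode:
    """Stand-in for graphite_api's leaf node, extended with render hints."""
    def __init__(self, path):
        self.path = path
        self.is_leaf = True
        # Hints default to None; the render view would fill them in
        # after find() and before fetch()/fetch_multi().
        self.max_data_points = None
        self.consolidate_by = None

node = LeafNode('servers.web1.cpu')
node.max_data_points = 800    # duplicated across all nodes in a request
node.consolidate_by = 'avg'   # but this one can vary per node/target
```

A finder's `fetch_multi(nodes, start, end)` could then inspect `node.consolidate_by` per node with no signature change.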
@mbell697 that doesn't work for me, as the change has to happen once a node object has been fetched/instantiated, but before you execute the fetch of the data... which at this stage happens in fetchData():
```python
# Group nodes that support multiple fetches
for pathExpr in pathExprs:
    for node in app.store.find(pathExpr, startTime, endTime):
        if not node.is_leaf:
            continue
        if node.path not in path_to_exprs:
            if hasattr(node, '__fetch_multi__'):
                multi_nodes[node.__fetch_multi__].append(node)
            else:
                single_nodes.append(node)
        path_to_exprs[node.path].append(pathExpr)
```
In my situation I wanted to reflect the Finder-computed "interval" (or so-called "data granularity") as part of the "path" (to make it visible to the user), so I'd end up with different keys in the multi_nodes and single_nodes dictionaries.
The maxDataPoints variable has to be available to the Finder before the `app.store.find(pathExpr, startTime, endTime)` statement in the above piece of code gets executed... I was tempted to go in the direction of the Flask `g` global variable.
Not sure I see the issue; there are multiple places, in the linked code and elsewhere in `fetchData`, where `maxDataPoints` and `consolidateBy` could be set on the nodes. It could also happen in `storage#find` (which already iterates the nodes for a few reasons).
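A sketch of that `storage#find` option (all classes below are hypothetical stand-ins, not the real graphite_api ones): since `Store.find()` already iterates every node it yields, it could tag each one with the render hints on the way out, leaving the `fetch()`/`fetch_multi()` signatures untouched:

```python
class FakeNode:
    def __init__(self, path, is_leaf=True):
        self.path = path
        self.is_leaf = is_leaf

class FakeFinder:
    def find_nodes(self, query, start, end):
        yield FakeNode('servers.web1.cpu')

class Store:
    """Stand-in for graphite_api.storage.Store with extended find()."""
    def __init__(self, finders):
        self.finders = finders

    def find(self, query, start=None, end=None,
             max_data_points=None, consolidate_by=None):
        for finder in self.finders:
            for node in finder.find_nodes(query, start, end):
                # Tag the hints on the node as it leaves find();
                # fetch_multi() later reads them off the node.
                node.max_data_points = max_data_points
                node.consolidate_by = consolidate_by
                yield node

store = Store([FakeFinder()])
nodes = list(store.find('servers.*.cpu', 0, 60,
                        max_data_points=800, consolidate_by='max'))
print(nodes[0].max_data_points, nodes[0].consolidate_by)  # 800 max
```

This only moves the interface break from the finders to `Store.find()`, which is internal to graphite-api and therefore cheaper to change.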
uhmm... @mbell697 let me look at that, thanks for the tip :)
Uhmmm @brutasse, what were you trying to do here? https://github.com/brutasse/graphite-api/blob/master/graphite_api/storage.py#L41 Maybe it is related... In my setup/finder I'll basically make an extra node level, e.g. `.interval:15m`, part of the graphite-style query language (my node "paths"). Is that something you were looking for, or by "interval" do you mean the startTime/endTime window?
@zeph this piece of code is from graphite-web, it was added in https://github.com/graphite-project/graphite-web/commit/f18bb3c711bd566debf3edcc6b895be9fb6210b0
I don't know what the original intent was…
...and the TODO was refactored but is still in the graphite-web project xD ...OK, never mind: https://github.com/graphite-project/graphite-web/blob/master/webapp/graphite/storage.py#L76
If I may comment, though a bit late :)
The intention of the TODO at https://github.com/brutasse/graphite-api/blob/master/graphite_api/storage.py#L41 is to return nodes (paths) that existed in the requested date/time range, but not outside it. I.e. if a node was newly added at time T but the request is for T-2 <-> T-1, the response should not include that path, as it did not exist in the requested date/time range.
AFAIK this is not possible in any storage backend, including whisper, so I don't think it is applicable in this instance.
This still happens on all graphite installations; you can see it by requesting old data with a wildcard pattern after inserting a bunch of new nodes. The nodes show up in the legend but without any data, which in many cases makes the graph unreadable with a huge legend. Ditto for nodes that stop sending data: they will continue to show up in legend/path finding unless deleted on the backend.
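For the legend pollution specifically, a render-time workaround (it does not fix path finding itself) is to drop series whose fetched values are all None over the requested window, which is roughly what graphite's removeEmptySeries() render function does. A minimal sketch, with a hypothetical list-of-dicts series representation:

```python
def drop_empty_series(series_list):
    # Keep only series with at least one real datapoint in the requested
    # window; the all-None ones only bloat the legend.
    return [s for s in series_list
            if any(v is not None for v in s['values'])]

series = [
    {'name': 'old.node', 'values': [1, 2, None]},
    {'name': 'new.node', 'values': [None, None, None]},  # created after the window
]
print([s['name'] for s in drop_empty_series(series)])  # ['old.node']
```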
BTW, would also love a `maxDataPoints` hint for use in a custom finder :+1:
You need to check with @mb-m whether he has time... I left #gaikai; he is now the one on the project.
Hi there,
I'm going to try to implement a custom Finder to obtain data from KDB in order to feed it to Grafana. The data I'm going to query isn't stored at fixed time intervals, so the KDB query will need to time-bucket the data.
Grafana makes use of Graphite's maxDataPoints to ensure that huge amounts of data aren't fed back to the browser and hence make things a lot more efficient.
Given that my query will have to time-bucket the data, it would be handy to receive any maxDataPoints parameter on the query interface - it would allow me to choose a suitable interval for the time buckets.
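As a sketch of how the hint would be used (`bucket_interval` is a hypothetical helper, not part of graphite-api), the finder can derive the smallest bucket width that keeps the returned point count at or under maxDataPoints:

```python
import math

def bucket_interval(start_time, end_time, max_data_points):
    # Smallest bucket width (in seconds) such that the number of buckets
    # over [start_time, end_time) does not exceed max_data_points.
    span = end_time - start_time
    return max(1, math.ceil(span / max_data_points))

# 6 hours of data, at most 800 points -> 27-second buckets (800 * 27 >= 21600).
print(bucket_interval(0, 6 * 3600, 800))  # 27
```

The KDB query would then group raw rows into those buckets server-side, so the browser never receives more than maxDataPoints values.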
What do you think? Something worth considering?