brutasse / graphite-cyanite

A plugin for using graphite-web with the cassandra-based Cyanite storage backend.
BSD 3-Clause "New" or "Revised" License

Data points being shifted when retrieving multiple series #11

Open aaronm54 opened 9 years ago

aaronm54 commented 9 years ago

I am seeing an issue where the data points for one series are being shifted to match the start time of another series that was requested.

For example, suppose I request from Graphite the sum of series x.y.z.* for time frame A to D. Let's assume that Cyanite returns two matching series: x.y.z.foo and x.y.z.bar. The graphite-cyanite plugin then loops through these two different paths for that time frame.

Now suppose that Cassandra only has data points for x.y.z.foo for a subset of that time. Cyanite will respond with data points from an adjusted time frame of B to D.

Next the plugin requests the data points for x.y.z.bar. This time Cassandra only has data points for time frame C to D.

The graphite-cyanite plugin remembers only the adjusted time frame from the first path requested. As a result, the data points for the second path are shifted to the first path's start time (B) instead of being padded with null values to cover the larger time frame.
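
A minimal sketch of the symptom, using made-up timestamps and values (A..D is taken as 0..40 with a 10-second step; none of these numbers come from a real Cyanite response). It shows what happens when the second path's values are read with the first path's time frame:

```python
# Hypothetical numbers: request window A..D is 0..40 with a 10-second step.
STEP = 10

# What the backend is assumed to return per path: (from, to, step) plus values.
foo = ((10, 40, STEP), [1, 1, 1])   # data starts at B = 10
bar = ((20, 40, STEP), [5, 5])      # data starts at C = 20


def timestamps(time_info, values):
    """Pair each value with the timestamp implied by time_info."""
    start, _end, step = time_info
    return [(start + i * step, v) for i, v in enumerate(values)]


# Correct: bar's points sit at 20 and 30.
correct = timestamps(bar[0], bar[1])

# Buggy: bar's values read with foo's time frame land at 10 and 20.
shifted = timestamps(foo[0], bar[1])

print(correct)  # [(20, 5), (30, 5)]
print(shifted)  # [(10, 5), (20, 5)]
```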

I think the plugin needs to cache the adjusted start time for each series. Once all of the data has been requested, it should pick the earliest start time and pad each series with null values accordingly.
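
Roughly, that idea could look like the sketch below. `align_series` is a hypothetical helper, not something that exists in the plugin, and it assumes every series uses the same step and starts on the same step grid:

```python
def align_series(results):
    """Pad every series with None so that all share the earliest start.

    `results` maps a path to ((start, end, step), values). All series are
    assumed to use the same step and to start on the same step grid.
    """
    earliest = min(info[0] for info, _ in results.values())
    aligned = {}
    for path, ((start, end, step), values) in results.items():
        # Number of whole steps between the earliest start and this series.
        missing = (start - earliest) // step
        aligned[path] = ((earliest, end, step),
                         [None] * missing + list(values))
    return aligned


# Made-up example data: foo starts at 10, bar starts at 20.
results = {
    'x.y.z.foo': ((10, 40, 10), [1, 1, 1]),
    'x.y.z.bar': ((20, 40, 10), [5, 5]),
}
aligned = align_series(results)
print(aligned['x.y.z.bar'])  # ((10, 40, 10), [None, 5, 5])
```

After alignment, both series have the same start time and length, so an index-by-index sum no longer mixes points from different timestamps.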

aaronm54 commented 9 years ago

I have made some changes to the CyaniteReader class that seem to fix this issue. The change backfills the series list with null values back to the requested start time. Here is the updated fetch method from CyaniteReader:

    def fetch(self, start_time, end_time):
        data = requests.get(urls.metrics, params={'path': self.path,
                                                  'from': start_time,
                                                  'to': end_time}).json()
        if 'error' in data:
            return (start_time, end_time, end_time - start_time), []
        if len(data['series']) == 0:
            return

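        # Cyanite may report a later 'from' than requested when Cassandra has
        # no older points. Pad the gap with None, one entry per whole step,
        # so the series lines up with the requested start time.
        series = data['series'].get(self.path, [])
        missing = max(0, int((data['from'] - start_time) // data['step']))
        series = [None] * missing + series

        padded_from = data['from'] - missing * data['step']
        time_info = padded_from, data['to'], data['step']
        return time_info, series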
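
To check the padding logic without a running Cyanite, the same arithmetic can be pulled into a standalone function. `backfill` is a hypothetical helper written only for this illustration, and the inputs are made up:

```python
def backfill(data_from, step, start_time, values):
    """Return (new_from, padded_values), padding with None back to start_time."""
    padded = list(values)
    new_from = data_from
    # Step backwards one interval at a time while still inside the request.
    while new_from - step >= start_time:
        new_from -= step
        padded.insert(0, None)
    return new_from, padded


print(backfill(30, 10, 0, [7, 8]))  # (0, [None, None, None, 7, 8])
print(backfill(30, 10, 5, [7, 8]))  # (10, [None, None, 7, 8])
```

The second call shows that when the requested start does not fall on the step grid, the padded series starts at the first grid point at or after it.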