RedisTimeSeries / NRedisTimeSeries

.Net Client for RedisTimeSeries
https://redistimeseries.io
BSD 3-Clause "New" or "Revised" License
28 stars 11 forks source link

Improve performance and reduce allocations when parsing RedisResults #33

Open shaunsales opened 4 years ago

shaunsales commented 4 years ago

Currently when a RedisResult[] is returned from a TS.RANGE or similar query, we enumerate the entire result and allocate it to a new IReadOnlyCollection. This feels unnecessary, and iteration should be left up to the user.

private static TimeStamp ParseTimeStamp(RedisResult result)
{
    if (result.Type == ResultType.None) return default;
    return new TimeStamp((long)result);
}

private static TimeSeriesTuple ParseTimeSeriesTuple(RedisResult result)
{
    RedisResult[] redisResults = (RedisResult[])result;
    if (redisResults.Length == 0) return null;
    return new TimeSeriesTuple(ParseTimeStamp(redisResults[0]), (double)redisResults[1]);
}

private static IReadOnlyList<TimeSeriesTuple> ParseTimeSeriesTupleArray(RedisResult result)
{
    RedisResult[] redisResults = (RedisResult[])result;
    var list = new List<TimeSeriesTuple>(redisResults.Length);
    if (redisResults.Length == 0) return list;
    Array.ForEach(redisResults, tuple => list.Add(ParseTimeSeriesTuple(tuple)));
    return list;
}

I propose we wrap up the RedisResult into a TsTimeSeriesCollection object, something like;

public class TsTimeSeriesCollection
{
    private RedisResult[] _redisResults;

    public TimeSeriesCollection(RedisResult redisResult)
    {
        _redisResults = (RedisResult[])redisResult;
    }

    public (long TimeStamp, double Value) this[int index] => ((long)((RedisResult[]) _redisResults[index])[0], (double) ((RedisResult[]) _redisResults[index])[1]);

    public int Count => _redisResults.Length;
}

The above will only cast the samples RedisResult when it's accessed, which should help with performance when large sample sets are returned. I haven't extended this to implement IEnumerable<(long,double)> or started optimizing, but the general idea is to create a wrapper around the RedisResult[] and make time series arrays less allocatey and easier to work with.

Since this is a breaking change, it might make sense to do prior to the next release. Feedback welcome!

DvirDukhan commented 4 years ago

@shaunsales great input, as always.

Let's iterate over the design as I would like to have the IEnumerable also implemented. This will cause it to be an implementation of IReadOnlyList for an example

using System;
using System.Collections;
using System.Collections.Generic;

namespace NRedisTimeSeries.DataTypes
{
    public class TimeSeriesCollection : IReadOnlyList<(long, double)>
    {
        public TimeSeriesCollection()
        {
        }

        public (long, double) this[int index] => throw new NotImplementedException();

        public int Count => throw new NotImplementedException();

        public IEnumerator<(long, double)> GetEnumerator()
        {
            throw new NotImplementedException();
        }

        IEnumerator IEnumerable.GetEnumerator()
        {
            throw new NotImplementedException();
        }
    }
}

What do you think? Thanks for your inputs

shaunsales commented 4 years ago

Here's a first cut of a timeseries collection class that I think would meet our requirements;


public class TsCollection : IEnumerator<(long TimeStamp, double Value)>
{
    private int _index = -1;

    private RedisResult[] _redisResults;

    public TsCollection(RedisResult redisResult)
    {
        _redisResults = (RedisResult[])redisResult;
    }

    public (long TimeStamp, double Value) this[int index]
    {
        get
        {
            var item = (RedisResult[])_redisResults[index];
            return ((long)item[0], (double)item[1]);
        }
    }

    public int Count => _redisResults.Length;

    public IEnumerator<(long TimeStamp, double Value)> GetEnumerator() => this;

    public (long TimeStamp, double Value) Current => (_index > -1 && _index <= _redisResults.Length) ? this[_index] : throw new IndexOutOfRangeException();

    object IEnumerator.Current => Current;

    public bool MoveNext() => ++_index < _redisResults.Length;

    public void Dispose() => Reset();

    public void Reset() => _index = -1;
}

I've not implemented it as a readonly or immutable collection as that adds quite a bit of overhead that I didn't feel was necessary. If there's a good reason to make the collection immutable, I suggest we use ImmutableArray<T> or ImmutableList<T> but given the underlying RedisResult[] is not immutable or readonly it seems better to follow the same pattern.

DvirDukhan commented 4 years ago

@shaunsales Let's split the logic between the enumerator and the direct access functionality something like

using System;
using System.Collections;
using System.Collections.Generic;
using StackExchange.Redis;

namespace NRedisTimeSeries.DataTypes
{

    public class TsCollection : IReadOnlyList<(long TimeStamp, double Value)>
    {

        private static (long, double) ResultAsTuple(RedisResult result)
        {
            var item = (RedisResult[])result;
            return ((long)item[0], (double)item[1]);
        }

        private class TsCollectionEnumerator : IEnumerator<(long TimeStamp, double Value)>
        {
            private int _index = -1;
            RedisResult[] _results;
            public TsCollectionEnumerator(TsCollection collection)
            {
                _results = collection._redisResults;
            }

            public (long TimeStamp, double Value) Current => (_index > -1 && _index <= _results.Length) ? ResultAsTuple(_results[_index]) : throw new IndexOutOfRangeException();

            object IEnumerator.Current => Current;

            public bool MoveNext() => ++_index < _results.Length;

            public void Reset() => _index = -1;

            public void Dispose() => Reset();

        }

        private RedisResult[] _redisResults;

        public TsCollection(RedisResult redisResult)
        {
            _redisResults = (RedisResult[])redisResult;
        }

        public (long, double) this[int index]
        {
            get => ResultAsTuple(_redisResults[index]);

        }

        public int Count => _redisResults.Length;

        public IEnumerator<(long, double)> GetEnumerator() => new TsCollectionEnumerator(this);

        IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
    }
}

WDYT?

shaunsales commented 4 years ago

Looking good. I've made a few (quick and dirty) updates to include MRANGE as a collection of ranges with Name and Label support.

public class TsMRangeResults : IReadOnlyList<TsMRangeResult>
{
    private class TsMRangeEnumerator : IEnumerator<TsMRangeResult>
    {
        private int _index = -1;

        private IReadOnlyList<TsMRangeResult> _tsMRangeResults;

        public TsMRangeEnumerator(TsMRangeResults collection) => _tsMRangeResults = collection._tsMRangeResults;

        public TsMRangeResult Current => (_index > -1 && _index <= _tsMRangeResults.Count) ? _tsMRangeResults[_index] : throw new IndexOutOfRangeException();

        object IEnumerator.Current => Current;

        public bool MoveNext() => ++_index < _tsMRangeResults.Count;

        public void Reset() => _index = -1;

        public void Dispose() => Reset();
    }

    private IReadOnlyList<TsMRangeResult> _tsMRangeResults;

    public int Count { get; }

    public TsMRangeResults(RedisResult redisResult)
    {
        var redisResults = (RedisResult[])redisResult;

        Count = redisResults.Length;

        if (redisResults.Length > 0)
        {
            var list = new List<UserQuery.TsMRangeResult>(redisResults.Length);
            for (int i = 0; i < redisResults.Length; i++)
            {
                list.Add(new TsMRangeResult(redisResults[i]));
            }
            _tsMRangeResults = list;
        }
    }

    public TsMRangeResult this[int index] => _tsMRangeResults[index];

    public IEnumerator<TsMRangeResult> GetEnumerator() => new TsMRangeEnumerator(this);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

public class TsMRangeResult : TsRangeResult
{
    public string Name { get; }

    public IList<(string, string)> Labels { get; }

    public TsMRangeResult(RedisResult redisResult) : base((RedisResult[])redisResult)
    {
        var redisResults = (RedisResult[])redisResult;

        Name = (string)redisResults[0];

        var labels = (RedisResult[])redisResults[1];

        if (labels.Length > 0)
        {
            Labels = new List<(string, string)>(labels.Length);

            for (int i = 0; i < labels.Length; i++)
            {
                var labelValue = (RedisResult[])labels[i];
                Labels.Add(((string)labelValue[0], (string)labelValue[1]));
            }
        }
    }
}

public class TsRangeResult : IReadOnlyList<(long TimeStamp, double Value)>
{
    private static (long TimeStamp, double Value) ResultAsTuple(RedisResult result)
    {
        var item = (RedisResult[])result;
        return ((long)item[0], (double)item[1]);
    }

    private class TsRangeResultEnumerator : IEnumerator<(long TimeStamp, double Value)>
    {
        private int _index = -1;

        RedisResult[] _redisResults;

        public TsRangeResultEnumerator(TsRangeResult collection) => _redisResults = collection._redisResults;

        public (long TimeStamp, double Value) Current => (_index > -1 && _index <= _redisResults.Length) ? ResultAsTuple(_redisResults[_index]) : throw new IndexOutOfRangeException();

        object IEnumerator.Current => Current;

        public bool MoveNext() => ++_index < _redisResults.Length;

        public void Reset() => _index = -1;

        public void Dispose() => Reset();
    }

    private RedisResult[] _redisResults;

    public int Count { get; }

    public TsRangeResult(RedisResult redisResult)
    {
        // TODO: Add some data shape checks
        _redisResults = (RedisResult[])redisResult;

        Count = _redisResults.Length;
    }

    protected TsRangeResult(RedisResult[] redisResults)
    {
        _redisResults = (RedisResult[])redisResults[2];
    }

    public (long TimeStamp, double Value) this[int index] => ResultAsTuple(_redisResults[index]);

    public IEnumerator<(long TimeStamp, double Value)> GetEnumerator() => new TsRangeResultEnumerator(this);

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

This will allow users to access the timeseries results with fairly simply syntax. Here's a couple of examples;

var tsMRangeResults = new TsMRangeResults(db.Execute("TS.MRANGE", new[] { "-", "+", "WITHLABELS", "FILTER", "lbl=abc" }));

foreach (var mRange in tsMRangeResults)
{
    foreach (var item in mRange)
    {
        $"{mRange.Name} {item.TimeStamp}:{item.Value} {mRange.Labels.Count}".Dump();
    }
}

var tsRange = new TsRangeResult(db.Execute("TS.RANGE", new[] { key, "-", "+" }));

foreach (var item in tsRange)
{
    $"{item.TimeStamp}:{item.Value}".Dump();
}
DvirDukhan commented 4 years ago

Nice approach I think that you can apply the iterator approach everywhere here so you will not have to allocate lists at all Let's first finish with PR #35 and continue with this

shaunsales commented 4 years ago

Nice approach

I think that you can apply the iterator approach everywhere here so you will not have to allocate lists at all

Let's first finish with PR #35 and continue with this

Agreed - the labels implementation was a bit quick and dirty. I think we can optimise and improve this once we turn it into a PR.