fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License

Suspicious result of AggregateRowsBy #375

Closed (goerch closed this issue 5 years ago)

goerch commented 6 years ago

The following program

using Deedle;
using System;
using System.Collections.Generic;
using System.Linq;

namespace Bug1
{
    class Program
    {
        static IEnumerable<DateTime> dateRange(DateTime first, int count)
        {
            return from days in Enumerable.Range(0, count) select first.AddDays(days);
        }
        static IEnumerable<double> rand(int count)
        {
            var rnd = new Random();
            return from i in Enumerable.Range(0, count) select rnd.NextDouble();
        }
        static void Main(string[] args)
        {
            var dates = new DateTime[] { new DateTime(2013, 1, 1), new DateTime(2013, 1, 4), new DateTime(2013, 1, 8)};
            var values = new double[] { 10.0, 20.0, 30.0 };
            var first = new Series<DateTime, double>(dates, values);
            var second = new Series<DateTime, double>(dateRange(new DateTime(2013, 1, 1), 10), rand(10));
            var third = new Series<DateTime, double>(dateRange(new DateTime(2013, 1, 1), 10),
                new double[] { 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0 });
            var df1 = Frame.FromColumns(
                new KeyValuePair<string, Series<DateTime, double>>[] {
                    new KeyValuePair<string, Series<DateTime, double>>("first", first),
                    new KeyValuePair<string, Series<DateTime, double>>("second", second),
                    new KeyValuePair<string, Series<DateTime, double>>("third", third) });
            df1.Print();
            // df1.AggregateRowsBy<double, double>(new string[] { "first" }, new string[] { "second" }, Stats.mean).Print();
            // df1.AggregateRowsBy<double, double>(new string[] { "third" }, new string[] { "second" }, Stats.mean).Print();
            df1.AggregateRowsBy<double, double>(new string[] { "first", "third" }, new string[] { "second" }, Stats.mean).Print();
        }
    }
}

results in

                       first     second            third
01.01.2013 00:00:00 -> 10        0,917007029017902 1
02.01.2013 00:00:00 -> <missing> 0,221830189331356 2
03.01.2013 00:00:00 -> <missing> 0,522494232991009 3
04.01.2013 00:00:00 -> 20        0,529293298502124 4
05.01.2013 00:00:00 -> <missing> 0,173366170457269 5
06.01.2013 00:00:00 -> <missing> 0,292806196162853 6
07.01.2013 00:00:00 -> <missing> 0,599101097601979 7
08.01.2013 00:00:00 -> 30        0,35385006729227  8
09.01.2013 00:00:00 -> <missing> 0,357920941597745 9
10.01.2013 00:00:00 -> <missing> 0,16586133705725  10

     first third     second
0 -> 10    1         0,917007029017902
1 -> 2     <missing> 0,221830189331356
2 -> 3     <missing> 0,522494232991009
3 -> 20    4         0,529293298502124
4 -> 5     <missing> 0,173366170457269
5 -> 6     <missing> 0,292806196162853
6 -> 7     <missing> 0,599101097601979
7 -> 30    8         0,35385006729227
8 -> 9     <missing> 0,357920941597745
9 -> 10    <missing> 0,16586133705725

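It looks like the first and third series are mixed up: wherever first is missing, the value of third appears in the first column and third is reported as <missing>. Is this expected behaviour?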

goerch commented 6 years ago

Could this be related to #253?

zyzhu commented 5 years ago

This issue should now be fixed. It was caused by using Series.values to get the group labels, which excludes missing values.
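To illustrate how dropping missing values could produce the shifted labels seen in the output above, here is a minimal sketch that does not use Deedle at all. It models one row's group-key values as a `double?[]` and compares two ways of turning them into labels. The class and method names (`GroupLabelSketch`, `LabelsDroppingMissing`, `LabelsKeepingMissing`) are hypothetical and are not Deedle's internals. The only assumption is that labels are matched to the key columns by position.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical model of the problem; this is NOT Deedle's actual implementation.
static class GroupLabelSketch
{
    // Mimics reading a row's key values while skipping missing ones:
    // the surviving values shift left and the tail is padded with missing.
    public static double?[] LabelsDroppingMissing(double?[] row)
    {
        var present = row.Where(v => v.HasValue).ToArray();
        return Enumerable.Range(0, row.Length)
            .Select(i => i < present.Length ? present[i] : (double?)null)
            .ToArray();
    }

    // Keeps every position, so each label stays under its own column.
    public static double?[] LabelsKeepingMissing(double?[] row) =>
        (double?[])row.Clone();

    // Formats labels in the same style as the frame printout.
    public static string Show(double?[] labels) =>
        string.Join(", ", labels.Select(v =>
            v.HasValue ? v.Value.ToString(CultureInfo.InvariantCulture) : "<missing>"));

    static void Main()
    {
        // Key columns (first, third) for 02.01.2013: first is missing, third is 2.
        var row = new double?[] { null, 2.0 };
        Console.WriteLine(Show(LabelsDroppingMissing(row))); // 2, <missing>
        Console.WriteLine(Show(LabelsKeepingMissing(row)));  // <missing>, 2
    }
}
```

The first printed line matches row 1 of the reported result (first = 2, third = <missing>). The second line is what one would expect if missing key values kept their position.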