fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
924 stars 196 forks source link

Series.windowSize throws IndexOutOfRangeException when size is bigger than input series length #559

Open Choc13 opened 11 months ago

Choc13 commented 11 months ago

I've noticed that calling Series.windowSize (x, Boundary.AtBeginning) can result in an System.IndexOutOfRangeException: Index was outside the bounds of the array. error. The following fsx script shows this:

#r "nuget: Deedle"

open Deedle

// Create an empty series
let series: Series<_, int> = [] |> Series.ofValues

// This is fine with any size
series |> Series.window 100

// This is fine
series |> Series.windowSize (1, Boundary.AtBeginning)

// This will error with an IndexOutOfRangeException
series |> Series.windowSize (2, Boundary.AtBeginning)

I have observed that this always happens whenever size > series.Length + 1 and only when Boundary.AtBeginning is specified. The full stack trace is:

System.IndexOutOfRangeException: Index was outside the bounds of the array.
   at Deedle.Vectors.ArrayVector.ArrayVectorBuilder.Deedle.Vectors.IVectorBuilder.Build[T](IAddressingScheme scheme, VectorConstruction command, IVector`1[] arguments) in C:\Dev\fslab\Deedle\src\Deedle\Vectors\ArrayVector.fs:line 195
   at <StartupCode$Deedle>.$Series.Aggregate@792.Invoke(Tuple`2 tupledArg) in C:\Dev\fslab\Deedle\src\Deedle\Series.fs:line 794
   at Microsoft.FSharp.Collections.Internal.IEnumerator.map@99.DoMoveNext(b& curr) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 105
   at Microsoft.FSharp.Collections.Internal.IEnumerator.MapEnumerator`1.System.Collections.IEnumerator.MoveNext() in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 88
   at System.Collections.Generic.List`1..ctor(IEnumerable`1 collection)
   at Microsoft.FSharp.Collections.SeqModule.ToArray[T](IEnumerable`1 source) in D:\a\_work\1\s\src\FSharp.Core\seq.fs:line 997
   at Deedle.Indices.Linear.LinearIndexBuilder.Deedle.Indices.IIndexBuilder.Aggregate[K,TNewKey,R](IIndex`1 index, Aggregation`1 aggregation, VectorConstruction vector, FSharpFunc`2 selector) in C:\Dev\fslab\Deedle\src\Deedle\Indices\LinearIndex.fs:line 322
   at Deedle.Series`2.Aggregate[TNewKey,R](Aggregation`1 aggregation, Func`2 keySelector, Func`2 valueSelector) in C:\Dev\fslab\Deedle\src\Deedle\Series.fs:line 789
   at Deedle.SeriesModule.WindowSizeInto[K,T,R](Int32 bounds_0, Boundary bounds_1, FSharpFunc`2 f, Series`2 series) in C:\Dev\fslab\Deedle\src\Deedle\SeriesModule.fs:line 845
   at <StartupCode$FSI_0009>.$FSI_0009.main@() in /Users/matthewthornton/code/ops/test/Domain.FSharp.Tests/DeedleWindowSizeBug.fsx:line 49
   at System.RuntimeMethodHandle.InvokeMethod(Object target, Void** arguments, Signature sig, Boolean isConstructor)
   at System.Reflection.MethodInvoker.Invoke(Object obj, IntPtr* args, BindingFlags invokeAttr)
Stopped due to error

Having followed the code through from the stack trace of the error I believe the bug is in the Seq.windowRangesWithBounds function. Specifically I think that L140 should be taking into account the size of the input as well as the size of the window. So perhaps something more along these lines:

for i in 0L .. min (size - 2L) (length - 1L) do yield DataSegmentKind.Incomplete, 0L, i

(I think the same problem might exist on the upper end when Boundary.AtEnding is specified).

I'm happy to open a PR for this change if you also think it's a bug.

ingted commented 8 months ago

Seems like the maintainer is busy... T____T