fslaborg / FSharp.Stats

statistical testing, linear algebra, machine learning, fitting and signal processing in F#
https://fslab.org/FSharp.Stats/
Other
210 stars 56 forks source link

[BUG] Intervals.ofSeq does not work if nan is in sequence #208

Closed bvenn closed 2 years ago

bvenn commented 2 years ago

Describe the bug

Intervals.ofSeq should create an interval with the minimal and maximal value of a given collection.

Intervals.ofSeq [123.;nan;-12.]      //val it: Intervals.Interval<float> = ClosedInterval (-12.0, -12.0)
Intervals.ofSeqBy id [123.;nan;-12.] //val it: Intervals.Interval<float> = ClosedInterval (-12.0, 123.0)

Expected behavior It is not clear whether there is a valid interval for a collection containing nan. While ofSeqBy gives a theoretically correct result, ofSeq is definitely incorrect.

bvenn commented 2 years ago

This is the problem:

min nan 1  //results in nan
max nan 1  //results in nan

1. < nan //false
1. > nan //false

I think the best option is to return a empty Interval if the collection contains a nan. This leads to an additional check for each element if it is nan, but I think this is a necessary step.

bvenn commented 2 years ago

After reevaluation we came to the conclusion that a nan in a collection should throw an exception, so the user has full control and notices that something is odd in the data.

The function should be updated to allow overloads with arguments that define what happens to nan values.

bvenn commented 2 years ago
Intervals.ofSeqBy snd (List.indexed [5;5;5;5])

results in ClosedInterval ((0, 5), (0, 5)). While not wrong by default, it would be beneficial if the Intergal maximum is the last occurrence within the collection. It can by fixed by modification of https://github.com/fslaborg/FSharp.Stats/blob/262f1acf2cbeeaf008c272774d008d6d462f1022/src/FSharp.Stats/Intervals.fs#L40 to

let mmax,mmaxV = if current >= maximum then current,e.Current else maximum,maximumV