fslaborg / Deedle

Easy to use .NET library for data and time series manipulation and for scientific programming
http://fslab.org/Deedle/
BSD 2-Clause "Simplified" License
924 stars 196 forks source link

SeriesExtension.FillMissing(double.NaN) does not work #518

Closed mtreske closed 3 years ago

mtreske commented 3 years ago

We are actually using Deedle 1.2.5 where we use the FillMissing() series extension method do fill the missing values with double.NaN. Since we move our libs to NetStandard we need to use version 2.x. I tried 2.03 up to 2.3.0 but the missing value will never be replaced with double.NaN.

any ideas?

thx and Regards Manuel

zyzhu commented 3 years ago

Could you provide a minimal repro of sample frame or series?

zyzhu commented 3 years ago

The following script works.

let sr = series ["A", 0.; "B", nan]
val it : Series<string,float> = 
A -> 0         
B -> <missing> 

sr.FillMissing(0.)
val it : Series<string,float> = 
A -> 0 
B -> 0 
mtreske commented 3 years ago

what happens if you call sr.FillMissing(nan) like we do sr.FillMissing(double.NaN) or sr.FillMissing(0d/0d)?

zyzhu commented 3 years ago

The following is the expected behavior. What is your sample series that does not behave?

series [1, 1.; 2, nan]
val it : Series<int,float> = 
1 -> 1         
2 -> <missing> 

(series [1, 1.; 2, nan]).FillMissing(nan)
val it : Series<int,float> = 
1 -> 1         
2 -> <missing> 
mtreske commented 3 years ago

we would expect a behaviour like this. this is like it behaved until Version 2.0.0.

val it : Series<int,float> = 
1 -> 1         
2 -> <missing> 

it.FillMissing(nan)
val it : Series<int,float> = 
1 -> 1         
2 -> nan

Since Version 2.0.1 it behaves like this

it.FillMissing(nan)
val it : Series<int,float> = 
1 -> 1         
2 -> <missing>

We are using FillMissing(nan) extensively. So at the moment this looks like a showstopper for us.

i found this https://github.com/fslaborg/Deedle/issues/446 issue where you mentioned that the FillMissing((x) => double.NaN) is the correct way to replace the missing values with NaN. Is this the way it should work?

zyzhu commented 3 years ago

That was indeed a bug in #446 that was fixed then. According to the original design notes https://fslab.org/Deedle/design.html, nan should be treated as missing value. The current version treats certain values as "missing" values, including Double.NaN (for numeric values) and null (for Nullable<'T> types and reference types). This means that when you create a series from Double.NaN, this is turned into a missing value and the value is skipped when doing aggregation such as Series.sum

I believe there must be a way to solve your problem. Could you share what are you trying to achieve after filling missing with nan?

mtreske commented 3 years ago

for example : our BL apps try to read the series and series[n] throws an MissingValueException when the value is equal to <missing>.

zyzhu commented 3 years ago

If you are writing C#, read the docs here https://fslab.org/Deedle/csharpseries.html#Lookup-by-key-and-index. You need to call TryGet or TryGetAt for series that has missing values.

mtreske commented 3 years ago

this is to much effort. We are not able to do all the changes at the moment. We are counting on FillMissing(nan).

zyzhu commented 3 years ago

Sorry that your production code relies on an old buggy behavior. As I said before, it's by design that nan shall be treated as missing value. Using TryGet or TryGetAt shall solve your problem.

A minimal change is to have a wrapper in your code to convert all the missing OptionalValue to nan. For instance, something like this

let tryGetConvertNan (n:'K) (sr:Series<'K, float>) =
  let value = sr.TryGet(n)
  if value.HasValue then value.Value else nan
let sr = series [1, 1.; 2, nan]
sr |> tryGetConvertNan 2
mtreske commented 3 years ago

i had a look at the design notes. As far as i understand the handling of nan as missing value is based in the BL in some aggregate functionslike series.sum. It seems that handling nan as a missing value was implemented to simplify the BL of these functions. For me there is a difference between nan and <missing>. The alternative mentioned in the design notes is to support both nan, and <missing> in the series.sum method without a clear solution...and i would prefer this solution.

zyzhu commented 3 years ago

The original design notes did offer an alternative opinion about treating nan differently from missing. But the current implementation of treating nan as missing was a conscious choice if you look at all the unit test cases involving nan such as these https://github.com/fslaborg/Deedle/blob/master/tests/Deedle.Tests/Series.fs#L40

Please take some time to modify your code to use TryGet or TryGetAt to avoid exceptions. That's a common cost of adopting an open source library when meeting breaking changes (in this case, a bug fixed) during upgrading versions. We've all been through that experience a lot. I can hear your pain points.

I'll close this issue as it's by design.