DataHaskell / dh-core

Functional data science
139 stars 23 forks source link

Floating point and approximate comparison #19

Open Shimuuar opened 5 years ago

Shimuuar commented 5 years ago

In #14 I have raised point that we need set of type classes for properties of numbers and for approximate comparison.

So what I think we need is type class for things like machine epsilon, maximal and minimal representable number, transfinite numbers, NaN handling etc.. And another type class for approximate comparison of values. Design space here is rather large so it would be good to collect current state of art and implementations in different languages and libraries

AdLucem commented 5 years ago

I'm interested in this, but I wouldn't know where to start looking for theory about approximate comparisions. Can you give me some pointers?

Edit: also, type classes for approximate comparision of other data types- particularly vectors and matrices- would be incredibly useful.

Shimuuar commented 5 years ago

I wish i knew. It's very much open problem and it seems not much research about it. I didn't look much though. I guess best introductory reading about floating point and its problems is "What every computer scientist should know about floating-point"

Problem with approximate equality is it's extremely context dependent and exact meaning of "approximately" depends on problem at hand. Relative equality |a - b| / max |a| |b| < ε forks fine save for problem of choosing ε. Absolute difference |a-b|<ε breaks down when value vary over many orders of magnitude. Once you go to more complex objects such as vectors/matrices there're much more ways to compare them. Thus type class with method approximately equal is impossible. Best we can hope is to provide tools and combinators for user

nstepp commented 2 years ago

Would it be reasonable to take the pytest approach and use both relative and absolute tests?

A set of utility functions could be provided along these lines, and it seems to me that providing type classes that are sufficiently constrained makes sense. For instance, providing instances only for field-level types and using tools for more complex objects. But I'm quite open to being wrong about that.