fslaborg / flips

Fsharp LInear Programming System
https://flipslibrary.com/#/
MIT License
251 stars 32 forks source link

Contract violation of Equals #141

Closed TysonMN closed 2 years ago

TysonMN commented 3 years ago

I do not recommend implementing Equals using tolerances like is being done in ReducedLinearExpression.

https://github.com/fslaborg/flips/blob/896f4db64e3671d53a6d669552f3f404e2720bbc/Flips/Types.fs#L113-L116

This violates the contract for Equals. Specifically, implementations must be transitive. The documentation doesn't use that word. Instead, it directly expresses transitivity by stating%20returns%20true%2C%20then%20x.Equals(z)%20returns%20true.)

If (x.Equals(y) && y.Equals(z)) returns true, then x.Equals(z) returns true.

Even though Equals is supposed to be transitive, maybe there is a sufficiently good reason to break this rule for this type. I have not looked into the code that deeply yet to find this out for myself.

Suppose the transitivity of Equals is regained by removing the custom implementations of Equals and GetHashCode. Then I think bug #132 would be fixed.

matthewcrews commented 3 years ago

The reason for tolerances is due to float equality being painful. Expression which you think would be equal do not end up being equal due to float math. If you have an alternative, I would love to hear it.

It was meant provide structural equality which is the expectation in this domain. Reference equality is not what we are after but I'm guessing you already knew that 😊

TysonMN commented 3 years ago

An approximate equals function is certainly useful. The issue is combining that useful function with the already existing Equals function.

The easy workaround is to create a new function (say) ApproxEquals. Since it a new, you are not constrained by an existing contract; you can define the behavior as you like. I recommend having it accept a tolerance parameter.

As I think about this, I think there should be a contract violation with GetHashCode as well. If Equals returns true because the two instances are approximately equal, then GetHashCode for both instances should be the same. However, it seems the intention in the code was to continue using the previous GetHashCode implementation that worked correctly for structural equality. That argument isn't sufficient to show the existence of such a contract violation, but it suggests that there should be.

The reason for tolerances is due to float equality being painful. Expression which you think would be equal do not end up being equal due to float math. If you have an alternative, I would love to hear it.

It was meant provide structural equality which is the expectation in this domain.

It depends on how equality is being used.

Is it being used by the solvers? My guess is no.

Is it only being used in the tests? If so, then make the code generic and use MathNet.Numerics.BigRational from MathNet.Numerics.FSharp in the tests with structural equality and use float in the production code for the solvers. I am doing this in my application at work.

matthewcrews commented 3 years ago

It's for testing. It is not used by the solvers. I'm not sure how you would go about making the code generic to enable swapping between MathNet.Numerics.BigRational and float. That feels like it would be a rather large refactor.

The key thing is that LinearExpression which may have been composed in different orders but are mathematically equivalent, should be equivalent.

For example:

let x1 = Decision.createContinuous "x1" 0.0 infinity
let x2 = Decision.createContinuous "x2" 0.0 infinity

// This should be true
(x1 + x2) = (x2 + x1)

// This should also be true
(1.0 * x1 + 2.0 * x1) = (2.0 * x1 + 1.0 * x1)

This is being driven by the domain. Someone who has worked in the field of Mathematical Planning would expect these things to hold.

TysonMN commented 3 years ago

It's for testing. It is not used by the solvers.

Excellent. Just as I hoped :D

I'm not sure how you would go about making the code generic to enable swapping between MathNet.Numerics.BigRational and float. That feels like it would be a rather large refactor.

I don't know all the changes that would be involved. Even if many lines of code are changed, they would all be completely syntactic changes, which is conceptually simple.

It starts by replacing https://github.com/fslaborg/flips/blob/896f4db64e3671d53a6d669552f3f404e2720bbc/Flips/Types.fs#L17-L20

with

type DecisionType<'a> =
    | Boolean
    | Integer of LowerBound: 'a * UpperBound: 'a
    | Continuous of LowerBound: 'a * UpperBound: 'a

The keyword inline will be added to any function that works for any 'a.

matthewcrews commented 3 years ago

So this is interesting and it has me thinking. I know that @smoothdeveloper was wondering about using decimal instead of float. I'm pondering what the utility would be and how it would interact with the rest of Flips.

All Solvers that I am aware of think in float. Most of the research around the underlying algorithms presume the use of float for the sake of computation and the detection of drift. This means that the Solvers would need to know how to convert 'a to float.

I'm also wondering how this will interact with Units of Measure. UoM only works well with a subset of F# primitives.

TysonMN commented 3 years ago

This means that the Solvers would need to know how to convert 'a to float.

No. 'a would be float in code related to the solvers. No conversion is necessary.

I'm also wondering how this will interact with Units of Measure.

I don't think there is any impact on units of measure.

UoM only works well with a subset of F# primitives.

FSharp.UMX enlarges that subset.

TysonMN commented 3 years ago

The keyword inline will be added to any function that works for any 'a.

This means that the Solvers would need to know how to convert 'a to float.

No. 'a would be float in code related to the solvers. No conversion is necessary.

...because 'a would be a statically resolved type parameter in those functions.

matthewcrews commented 3 years ago

No. 'a would be float in code related to the solvers. No conversion is necessary.

Okay, the follow up question then is how to let the user know they need to use float without them spending a bunch of time formulating a problem before trying to call a solver only to find out they need to change the type? The goal is to guide the user toward success by constraining the options.

To be clear, I like the idea of allowing the code to be generic. I'm just balancing that with the new developer experience.

I too believe it will work with UoM. It just may require some additional refactoring.

TysonMN commented 3 years ago

I know that @smoothdeveloper was wondering about using decimal instead of float.

decimal still has finite precision, so if the property-based tests are any good, then they will find those cases that exceed the precision and cause structural equality to return false. In contrast, MathNet.Numerics.BigRational from MathNet.Numerics.FSharp has infinite precision.

I'm pondering what the utility would be and how it would interact with the rest of Flips.

The utility is that the property-based tests can verify all these properties like associativity and commutativity using exact / structural equality. At the same time, no performance is lost in production. The only downside is slightly more verbose code in some places. For example, some occurrences of 0.0 and 1.0 will be replaced with LanguagePrimitives.GenericZero and LanguagePrimitives.GenericOne respectively.

the follow up question then is how to let the user know they need to use float without them spending a bunch of time formulating a problem before trying to call a solver only to find out they need to change the type? The goal is to guide the user toward success by constraining the options.

There should be no impact on the developer experience either. Types like DecisionType are public, so changing it by making it generic might be a breaking change, but I can't think of code that would compile now that wouldn't compile afterwards.

Conceptually, the code would be grouped as follows.

matthewcrews commented 3 years ago

From the sound of that description, I like that approach. We've been working on pulling pieces apart to make them more modular and easier to evolve going forward.

smoothdeveloper commented 3 years ago

@TysonMN, @matthewcrews I think it should be interesting to see if float can be abstracted out without inducing too much pain in the library implementation and retaining the overall "ease of use" for client code, for the sake of exploring that space.

I'm personally comfortable with the FSharp.UMX extensions and the few primitives we would need to use in FSharp.Core to make this work rudimentarily.

On the other end, I'm not 100% clear if it brings much change for thing that can easily be dealt with in client code, because as I've seen discussed, the solvers all work with same float representation for input and output, we could leave it to client specific use case to move from that to something more specific to them.

When I was mentioning decimal, it was for those aspects:

I feel the other aspects that need to be ironed for the library evolution are more related to having the solver backend split and some possibilities to change representation of decisions in the model, would be my priority and safer area to invest time in terms of bang for the buck.

Regarding paket, it is a power tool, I'm personally more at ease having it but it won't affect my contributions if the repository doesn't make use of it.

For the sake of equality in DU, we could also have our own type Wrap(v: float) which has its own semantics and makes it possible to rely on the default implementation of it, so long we have the float comparison logic baked in Wrap.

@TysonMN I wouldn't mind if you could share a bit of how having support for generic numeric right now would change and make client code better, through code examples or things you've seen?

matthewcrews commented 3 years ago

In this domain, float is the dominant representation of numbers. I see the value in possibly abstracting this out but I also want to constrain Flips to focusing on the domain of LP and MIP. It should be easy and "intuitive" for those domains first and foremost.

I have difficulty working with Paket but I can be argued either way. My experience so far has mostly been pain but I also don't know what it is possibly protecting me against which could be worse. I don't have firm opinions either way. That is just my inexperience as a developer.

I perceive the biggest value adds at the moment are:

  1. Fixing the issue with GetHashCode
  2. Making equality behave correctly for the domain
  3. Adjust the types for Flips 3.0 to allow for the adding of features without breaking client code
TysonMN commented 3 years ago

I wouldn't mind if you could share a bit of how having support for generic numeric right now would change and make client code better, through code examples or things you've seen?

It won't make the client code better in any way (that I can foresee). It will make the test code better. Currently the test code is mostly comprised of beautifully designed property-based tests verifying that mathematical concepts like associativity and commutativity hold. However, the type of equality used in those tests to check that two expressions are the same uses a hardcode tolerance on floats. This is not ideal.

Instead, I am suggesting that the core code could probably be generalized without significant changes to client code (probably none) and no performance loss "in production" while allowing the test code to use structural equality to verify its properties.

I am doing this in my application at work right now. Part of our application includes a drawing editor that we implemented using SkiaSharp. Its API uses float32 everywhere. In some of my code, I needed to find the point at the intersection of two lines. To make sure that my code to compute this intersection is correct, I have a property-based test that computes the intersection of a horizontal and vertical line. That test uses MathNet.Numerics.BigRational from MathNet.Numerics.FSharp and structural equality to verify that the property holds. At the same time, the code "in production" that calls that functions does so using float32.

I have difficulty working with Paket but I can be argued either way. My experience so far has mostly been pain but I also don't know what it is possibly protecting me against which could be worse. I don't have firm opinions either way. That is just my inexperience as a developer.

Version 1.0.0 of Paket was released in 2015. That is also when I began my career as a software engineer. I was less knowledgeable then, so it is a bit difficult for me to elaborate, but the standard way to do NuGet packages management back then with the non-SDK-style project format was rather painful. My guess is that using Paket back then was better. In contrast, I find NuGet package management with the SDK-style project format to be easy.

smoothdeveloper commented 3 years ago

@TysonMN thanks, your experience with similar approach to replace float and make generic numeric operations is encouraging, I think it make a sense to try what such refactor would look like.

TysonMN commented 3 years ago

I have been trying for the last few hours. It is significantly harder that I expected. Comparing with my code at work, the main difference is that none of my operators used LanguagePrimitives.GenericZero or LanguagePrimitives.GenericOne. I don't see why that should matter though. I will try again another day.