gcanti / fp-ts

Functional programming in TypeScript
https://gcanti.github.io/fp-ts/
MIT License
10.84k stars 503 forks source link

Unlawful monad instance for Either. getValidation #1150

Open lepsa opened 4 years ago

lepsa commented 4 years ago

🐛 Bug report

Current Behavior

Currently the chain method for getValidation uses the Either instance, and this leads to conflicting semantics between chain and ap

Expected behavior

If there is a monad instance, the chain method must agree with the underlying applicative.

Reproducible example

import {
  getSemigroup,
  nonEmptyArray,
  NonEmptyArray
} from "fp-ts/lib/NonEmptyArray";
import { left, Either, getValidation } from "fp-ts/lib/Either";

const validation = getValidation(getSemigroup<number>());

// applicative `ap` and an `ap` derived from `chain` should agree.

const ap_ = (
  mab: Either<NonEmptyArray<number>, (_: string) => number>,
  ma: Either<NonEmptyArray<number>, string>
): Either<NonEmptyArray<number>, number> =>
  validation.chain(mab, ab => validation.chain(ma, a => validation.of(ab(a))));

const a: Either<NonEmptyArray<number>, (_: string) => number> = left(
  nonEmptyArray.of(1)
);
const b: Either<NonEmptyArray<number>, string> = left(nonEmptyArray.of(2));

validation.ap(a, b) == ap_(a, b); // These should be equal, but are not.

Suggested solution(s)

Remove the monad instance from validation, leaving it as an applicative with an alt/alternate instance. The current chain method can be exposed as a useful function, but should not be called a monad as it does not agree with the underlying applicative.

Additional context

Your environment

Which versions of fp-ts are affected by this issue? Did this work in previous versions of fp-ts?

The following is based on library source comments. I assume that this was not an issue in prior versions of fp-ts since the functionality wasn't added.

Software Version(s)
fp-ts >= 2.0.0
TypeScript >= 3.7.4
gwils commented 4 years ago

I'm a maintainer of Haskell's validation library which many use for this type.

I agree with this issue - validation is not a lawful monad because its chain does not agree with its ap. (see (<*>) = ap here)

This is a law violation and can lead to confusing behaviour.

gcanti commented 4 years ago

This is a law violation

This is debatable, why (<*>) = ap should be a law?

There's no 1:1 relation between monads and applicatives. Indeed you can derive two Applicative instances from a Monad instance:

  1. ap: (fab, fa) => M.chain(fab, f => M.map(fa, f)) (classic)
  2. ap: (fab, fa) => M.chain(fa, a => M.map(fab, f => f(a))) (actions are performed in reversed order)

So (<*>) = ap seems more like a convention in Haskell rather than a proper law, keep in mind that in fp-ts there's no 1:1 relation between data types and type class instances

can lead to confusing behaviour

Could you please provide an example?

endgame commented 4 years ago

Here is an example in Haskell. Note that when using ap to combine, the unlawful monad instance forces you to never consider some of the errors you could otherwise accumulate:

{-# LANGUAGE DeriveFunctor #-}

import           Control.Monad (ap)
import           Data.List.NonEmpty (NonEmpty)
import qualified Data.List.NonEmpty as NE

data Validation e a = Err e | OK a deriving (Functor, Show)

instance Semigroup e => Applicative (Validation e) where
  pure = OK
  Err e1 <*> Err e2 = Err (e1 <> e2)
  Err e1 <*> OK _ = Err e1
  OK _ <*> Err e2 = Err e2
  OK f <*> OK a = OK (f a)

-- UNLAWFUL! --
instance Semigroup e => Monad (Validation e) where
  OK a >>= f = f a
  Err e >>= _ = Err e

-- Test data
f :: Validation (NonEmpty String) (Int -> Bool)
f = Err (pure "f is bad")

a :: Validation (NonEmpty String) Int
a = Err (pure "a is bad")

-- Combines applicatively using (<*>)
-- Err ("f is bad" :| ["a is bad"])
spaceshipped :: Validation (NonEmpty String) Bool
spaceshipped = f <*> a

-- Unlawfully monadically combines using `ap`
-- Err ("f is bad" :| [])
apped :: Validation (NonEmpty String) Bool
apped = f `ap` a
mikearnaldi commented 4 years ago

I get the confusion in haskell where instances are resolved by the compiler but how does this example apply to fp-ts?

endgame commented 4 years ago

Because you can construct the same code, and have the same problem, like in the OP of this isse. If your code needs to validate a function and validate an argument to that function, and then combine those, using chain will drop errors on the floor.

It's violating something like the Liskov Substitution Principle. If all Xs are Ys, and I implement a behaviour of Y (the supertype - here, applicative) using only features from X (the subtype - here, monad), then I should still get the same result as if I implemented the behaviour of Y directly.

To do otherwise would be like having the functor map operation do different things if implemented in terms of ap or chain.

lepsa commented 4 years ago

@mikearnaldi Haskell typeclasses can be thought of as the compiler handling the dictionary passing for you, rather than having to do it yourself as in fp-ts. Both systems have the same issue where the compilers do not check that the code written for those instances/dictionaries follow the mathematical laws of the objects they encode, it is something you have to do yourself. I'm am ok with having multiple instance dictionaries for a given data structure, languages like haskell do a similar thing via type annotations that separate them in the type system, but not at runtime. An example that comes to mind are the various monoids like Sum and Product that work over numbers. However, within those instances I expect them to be coherent to themselves. Sum shouldn't have some of Product's semantics, and vice-versa. Bringing it back to the topic of this ticket, validation is doing either things, and that causes a breakdown in semantics as @endgame pointed out. I can no longer rely on the abstractions agreeing with each other in a dictionary, so I have to refer back to the implementation to know what the semantics are. either is available throughout a code base without any requirements and operates on the same type as, so I have free access to it's chaining semantics whenever I need them, but it should be a conscious decision I have to make when switching between those two worlds.

mikearnaldi commented 4 years ago

@lepsa I do know the haskell way :) the point is precisely the one you made, as long as you are forced to switch world you don't have the confusion in the case of fp-ts switching world would be picking the instance manually.

I think the point here is slightly more general and shall be seen in the relashionship between applicative and monad, while in general you can derive an applicative from a monad (a minimum of 2 actually given the simmetry) the applicative & the monad are still separated things, I don't fully agree that there is a subtyping relashionship between the 2 tho the liskov principle is not applicable.

In general theory I wasn't able to find any reference of this argument except from the haskell interpretation, for example:

So my interpretation, and I stress "interpretation", is that per se there is nothing "buggy" about this and the confusion is never raised in practice (I do use this a lot) while on the other side I fully agree in haskell this give raise to confusion

ajmcmiddlin commented 4 years ago

Let's ignore Haskell and ask what are some desirable properties of ad-hoc polymorphism. I argue that each dictionary of implementations for the abstractions supported by a type should be internally consistent. That is to say, if any two abstractions I have implementations for are related, the implementations should agree. Furthermore, every implementation of an abstraction should maintain the semantics encoded by that dictionary. My reasoning being that if the semantics within a dictionary are not consistent, it becomes much harder to work with and reason about my program. The difficulty comes from my inability to freely choose different abstractions within my dictionary without fear of unexpected departures from the intended behaviour.

In this specific case, when I use the validation dictionary I expect that the semantics of validation are maintained by every implementation of every abstraction in that dictionary. So regardless of whether I'm using Functor, Applicative, Monad, or anything else, I expect errors to accumulate. By providing a Monad instance that does not maintain these semantics, my confidence in working with these types and abstractions is eroded, and I must work harder to compensate. This is counter to the purpose of abstraction.

This has nothing to do with Haskell, and nothing to do with whether or not a function with the same type as chain is useful. The issue, at least as I see it, is with user expectations and consistency. I expect dictionaries to be consistent such that I can use the abstractions within them without being concerned they will do something unexpected (read: contrary to their stated semantics). As I understand it, this is the most important reason for maintaining consistency, which in practice is often achieved through stating laws that all instances should comply with. In this specific case, consistency can be maintained by removing the Monad instance from validation, knowing that if users desire that behaviour they can use the either dictionary instead.

endgame commented 4 years ago

By chance, this question came up on /r/haskell again: https://www.reddit.com/r/haskell/comments/fkbhmk/announce_validationselective_lightweight_pure/fkt6xlv/?context=5

Ed Kmett points out:

Without that law, traverse and mapM are meaningfully different functions, and then EVERY combinator you go to define with Applicative had darn well have a Monadic counterpart for all time, lest you get the wrong meaning of (<*>) when you wanted ap or vice versa.

Applicative being a superclass of Monad only works if the functionality is actually related.

anttih commented 3 years ago

I just recently introduced this bug into our codebase by using apSecond of TaskEither where I should have used the sequential version (which to my understanding is not so easy to use because its pipeable version doesn't seem to be exported).

I get the confusion in haskell where instances are resolved by the compiler but how does this example apply to fp-ts?

I acknowledge that I have some presumptions as a Haskell user but still I think it might be surprising for other people as well that by default the ap is parallell. What happened is I was using chain, but realized that I don't do anything with the result from the first computation so I went for the default point-free version which in my case was apSecond. Not once did I think it would do those in parallel and change the semantics completely.