davegurnell / checklist

Validation library for Scala.
Apache License 2.0
47 stars 11 forks source link

Custom Checked monad #10

Open davegurnell opened 6 years ago

davegurnell commented 6 years ago

I think we should change the default semantics of Checked. This thread is intended for that discussion. To provide more detail...

The semantics of Ior allow the user to map on the right hand side, regardless of what values are on the left hand side. Most rules in checklist allow the user to signal errors and retain a value on the right. This gives slightly odd semantics where error handling doesn't fail fast when the programmer expects it to.

@Jacoby6000 provided a workaround for this in #9 by adding "strict mode" checks that return an Ior.left on failure instead of an Ior.both. However, I still think this is too fiddly. IMO a developer should be able to choose failure semantics up front and not have to worry about them afterwards. See #2 for thoughts on allowing abstraction of the Checked type.

Regardless of whether we can abstract over different implementations of Checked, we still need to choose a default failure semantics. These are the properties I believe most developers (myself included) will expect:

The problem with this is that flatMap and ap have inconsistent semantics. This is something that Cats and Scalaz fundamentally avoid. However, I believe that in the case of checklist, a non-rules-compliant custom Checked monad will be a reasonable thing to create.

What do people think? Is this an ok idea? Is it an ok idea if we allow users to swap out Checked for Either or Ior as suggested in #2?

Jacoby6000 commented 6 years ago

I've toyed with the idea of making a type class generalizing over Either, Xor, \/, etc; Wound up not being useful for the cats repo, but we might be able to use it in checklist to let users pick their effects.

https://github.com/typelevel/cats/pull/976 for reference. Could update it to support inclusive types as well.

ashleymercer commented 6 years ago

I've actually just bumped up against this exact problem. I wonder if Checked actually lives somewhere between Validated and Ior? On its own, Validated is too weak since you can't have both messages and a value; but Ior is too strong since it enforces fully monadic behaviour.

Would it be fair to say that Checked is really something like this:

type Checked[T] = NonEmptyList[WarningMessage] Ior Validated[NonEmptyList[ErrorMessage], T]

I think the above problem above stems from the fact that, by smooshing the hard and soft failures together, you can't really enforce at the type level whether the messages on the left are a hard failure (therefore throw the value T away) or only warnings (so it's safe to use the value).

By separating these out you can keep the existing monadic behaviour of Ior, although you might provide some convenience methods on Checked otherwise you're going to end up with a lot of code like:

val foo: Checked[T] = ...
val bar = foo map { _ map { t: T => ... }}

since we need to map through both the Ior and the Validated

ashleymercer commented 6 years ago

Actually, a couple of other nice benefits fall out of this: it's no longer necessary to have separate ErrorMessage and WarningMessage types, since the status of a message as a hard or soft failure is signalled by its position in the type.

This then makes it trivial to implement custom message types (#2):

type CheckedCustom[M, T] = NonEmptyList[M] Ior Validated[NonEmptyList[M], T]

// Obviously Message is a bit more complex than this...
case class Message(text: String, path: Path)
type Checked[T] = CheckedCustom[Message, T]

(Although you'd need some way of passing in a converter (String, Path) => M, possibly repurposing the existing ToMessage trait?

Jacoby6000 commented 6 years ago

@ashleymercer Your observation is mind-bogglingly obvious in hindsight.

I think this is a great idea.

Jacoby6000 commented 6 years ago

I think it would be good for this to be it's own type, with conveniences for converting in to an Ior/Validated composition. As it's own type, it'll be less cumbersome and we'll be able to define semigroups/monad instances without much hassle from overlap.

sealed trait Checked[E, W, A]
case class Errored[E, W, A](errorMessages: NonEmptyList[E], warnings: List[W]) extends Checked[E, W, A]
case class Warned[E, W, A](warnings: NonEmptyList[W], result: A) extends Checked[E, W, A]
case class Succeeded[E, W, A](result: A) extends Checked[E, W, A]

Maybe throw in some variance to get rid of some type params, maybe not.

From here you just build typeclass instances to behave how you'd expect an A Ior (Validated[B, C]) to behave.

Maybe E and W should be unified, but I think it's useful to separate them out. Might have to make a trifunctor and tritraversable to make them easier to work with.

Jacoby6000 commented 6 years ago

If we do wind up with trifunctor, tritraverse, then #3 will require us to add a module for scalaz support, since shims will not cover our custom typeclasses.

ashleymercer commented 6 years ago

I think it would be good for this to be it's own type

Now that I think about it more, you're right - removing the risk of clashing with existing typeclasses is very useful, and we can easily map back to the "vanilla" cats types if needed.

Maybe E and W should be unified, but I think it's useful to separate them out

Again, I agree - the obvious example is that you're going to want your error type to hold some exception or stack trace, but the warning type is probably just a String.

Maybe throw in some variance to get rid of some type params, maybe not.

I think at least +A should be covariant (similar to Ior). Message types I'm on the fence about - I'd probably declare a single error type upfront for my whole system; but I can see that others might want an ADT of message types (ErrorString, ErrorWithException etc)

ashleymercer commented 6 years ago

Abstractions on abstractions: had a bit more of a think about this and maybe it makes sense to introduce a Trior[A, B, C] type (Tri + Ior - geddit?) which is equivalent to

type Trior[A, B, C] =:= Either[Either[A, (A, B)], Either[(B, C), C]]

Trior could be defined independently from Checked, but with the semantics we want (right-bias for values but with accumulation of warnings in B). Checked is then simply:

type Checked[T] = Trior[NonEmptyList[Message], NonEmptyList[Message], T]
Jacoby6000 commented 6 years ago

@ashleymercer There's actually a ton of these isomorphisms, I'm discovering.


  def eithers: Either[Either[E, (E, W)], Either[(W, R), R]] =
    fold(
      (e, w) => w.fold(Left(Left(e))(Left(Right((e, w))))),
      (w, result) => Right(Left((w, result)))
      result => Right(Right(result))
    )

  def eitherOption: (Option[W], Either[E, R]) =
    fold(
      (err, warn) => (warn, Left(err)),
      (warn, result) => (Some(warn), Right(result)),
      result => (None, result)
    )

// this one isn't an iso, but whatever. it could be nice
  def options: (Option[E], Option[W], Option[R]) =
    fold(
      (err, warn) => (Some(err), Some(warn), None),
      (warn, result) => (None, Some(warn), Some(result)),
      result => (None, None, Some(result))
    )
davegurnell commented 6 years ago

I spent a little time on an alternative encoding of Rule based directly on Kleisli. The code is in the feature/kleisli branch.

In this codebase, the checklist.core package is error-handling-monad-independent. Users can integrate it with any monad via a type class (confusingly called Checked). This type class specifies the warning/failure/prefixing behaviour we need to model soft errors and error locations.

I'm interested in trying different encodings, for example on top of @ashleymercer's trior type above, to see if they create useful alternative sets of semantics.

ashleymercer commented 5 years ago

Revisited this again (!) and realised there's an alternative, even simpler encoding we might use:

sealed trait Checked[E, W, A]
case class Failed[E, W](errors: E, warnings: Option[W]) extends Checked[E, W, Nothing]
case class Succeeded[W, A](warnings: Option[W], value: A) extends Checked[Nothing, W, A]

This encoding has several advantages:

On the last point: with a (very simplified) Ior-based implementation I found myself doing a lot of stuff like this whenever I wanted to get out of a "checked" context:

checked match {
  case Ior.Left(errors) =>
    errors.foreach(log.error)
    System.exit(1)

  case Ior.Both(warnings, value) =>
    warnings.foreach(log.warn)
    businessLogic(value)

  case Ior.Right(value) =>
    businessLogic(value)
}

so you end up repeating business logic in both branches (or else factoring out to a separate method, but you still have to call it in both branches). With a simple two-case match it looks much cleaner:

checked match {
  case Failed(errors, warnings) =>
    errors.foreach(log.error)
    warnings.foreach(_ foreach log.warn)
    Sys.exit(1)

  case Succeeded(warnings, value) =>
    warnings.foreach(_ foreach log.warn)
    businessLogic(value)
}

I've only got a very preliminary prototype so far, will do a bit more digging and maybe throw up a gist or something.

ashleymercer commented 5 years ago

Aaand of course I've hit an immediate snag: using cats.Semigroup[Option] to accumulate warnings doesn't work, because the default behaviour is to only combine if both values are Some - i.e. if I only have warnings in one of the objects, then the combination throws the warnings away.

Asking people to use a custom Semigroup[Option] instance is almost certainly a non-starter. Maybe we just remove all knowledge of the possible emptiness of W altogether (this is similar to what cats.Validated does) and just have:

case class Failed[E, W](errors: E, warnings: W) extends Checked[E, W, Nothing]
case class Succeeded[W, A](warnings: W, value: A) extends Checked[Nothing, W, A]

with the caveat that they should use a possibly-empty-but-accumulating type like List for warnings?

davegurnell commented 5 years ago

Hi! Thanks for working on this. My first inclination (without giving it too much thought) is along the same lines. The types should be just E and W. If the user wants to make either an Option, List, NonEmptyVector, or whatever, they can. They just need to provide a Monoid. That way we avoid getting into these problems.

And I like your encoding with Failed and Succeeded too. It's simple but captures the right semantics. I'd call them Pass and Fail but that's just 'cos I'm lazy and don't like typing :D

dispalt commented 5 years ago

If you do SemigroupK[Option] doesn't it behave more like first one set? Aka it doesn't rely on W itself being a monoid.

ashleymercer commented 5 years ago

Here's a very rough-and-ready implementation: https://gist.github.com/ashleymercer/08aeb8f1d6c9e0837c751ce992ffe65a

I haven't remotely touched any of the Rule stuff yet (my "validation" is some very simple matching), might have a go at this over the weekend.

As desired, warnings are generally preserved as long as possible; errors obviously can't be (m)ap'ped over, but can be combined.

Since Checked is entirely agnostic about message types, it doesn't contain any of the Path stuff from the current implementation of checklist: however, I think it might be considered an improvement to allow this to be optional.

Type inference seems to work pretty well (I've followed the rough layout of cats for implicit resolution etc) but since we only have two case classes and the warning type W must always be filled in, the compiler has a tendency to find W based on any available Monoid[W] (and worse, having more than one implicit Monoid available causes conflicts). You can work around this by specifying the types explicitly on the .pass() and .fail() methods:

NonEmptyList.one("This is a fatal error").fail[List[WarningType]]

Alternatively, you can specify the type of the final result, and let the compiler work downwards:

val res: Checked[NonEmptyList[String], List[String], Foo] = ...

I'm not sure there's much we can do to avoid this tax.