Consider changing how nullability is handled to more accurately model SQL

bitemyapp / esqueleto

New home of Esqueleto, please file issues so we can get things caught up!

BSD 3-Clause "New" or "Revised" License

374 stars 108 forks source link

Consider changing how nullability is handled to more accurately model SQL #165

Open tysonzero opened 4 years ago

tysonzero commented 4 years ago

I prefer the Haskell model of null, where Just Nothing and Just (Just Nothing) are different values, and where you use <$> and >>= to operate on the underlying value.

However with that said SQL does not take this approach, and thus trying to model nullability like the above is a very leaky abstraction. It also adds significant unneeded verbosity and a proliferation of functions that would otherwise be unnecessary.

One example downside is that calling functions/operators on nullable values can be quite painful, even when it would be quite painless in both SQL and regular Haskell:

a : boolean nullable
b : boolean nullable
--
c : boolean nullable
c = a or b

a :: Maybe Bool
b :: Maybe Bool
--
c :: Maybe Bool
c = (||) <$> a <*> b

The above cannot really be directly modeled in Esqueleto, so you typically have to do something weird like coalesce the Bool into non-nullable form and then operate on that, which is rather painful when the standard || semantics is desired.

Another example of the leakiness is that joinV is a no-op, which means even though isNothing and isNothing . joinV are very different in Haskell, they have the exact same semantics in Esqueleto.

One example of verbosity from this is the very common need to use joinV in situations where no extra syntax would be needed in SQL. Similarly just is also often needed in situations where no extra syntax would be needed in SQL.

My proposal to account for this is admittedly quite a large change, so it may be too backwards breaking to be accepted, but I do think it should be considered.

I think that SqlExpr should actually be parameterized by two types, the first representing nullability, and the second representing the actual type itself.

Functions that have defined semantics when interacting with null will then have their type generalized accordingly, for example (||.) :: SqlExpr n Bool -> SqlExpr m Bool -> SqlExpr (n \/ m) Bool and (^.) :: SqlExpr n (Entity v) -> EntityField m v t -> SqlExpr (n \/ m) (Value t).

Functions that crash completely when interacting with null can then simply require NotNullable at the type level. When serializing to and from Haskell, Nullable will naturally be mapped to and from Maybe.

parsonsmatt commented 4 years ago

I really like this idea but I'd want to see how it plays out in practice, and ensure that type inference etc still works out relatively nicely.

To elaborate a bit more, I really do agree with the premise here, and it's been quite annoying to do in a large codebase. I've been thinking about the "right" way to handle this, but all the solutions to automagically handle it seem a bit too magical to me.

belevy commented 4 years ago

I am not a huge fan of the extra type variable, I would prefer something that "just works" without breaking any old code. As it is most people probably have references to SqlExpr in their code bases (I know that we do) and so adding that extra type param would create giant friction for what may appear to be a small gain. Obviously we shouldn't paralyze the library for fear of changing the public interface but I feel like there is a happy medium between too much magic and too little.

Perhaps we could have type families

type family SqlNullableResult (inputs :: [*]) result where
  SqlNullableResult '[] r = r
  SqlNullableResult _ (Maybe r) = Maybe r
  SqlNullableResult ((Maybe _) ': _) r = Maybe r
  SqlNullableResult (_ ': inputs) r = SqlNullableResult inputs r

type family SqlNullableValue v where
  SqlNullableValue v = v
  SqlNullableValue (Maybe v) = Maybe v

With this we open up the possibility to write a function where the maybeness of the results are dependent on the inputs.

(&&.) :: (b1 ~ SqlNullableValue Bool, b2 ~ SqlNullableValue Bool)
      => SqlExpr (Value b1)
      -> SqlExpr (Value b2)
      -> SqlExpr (Value (SqlNullableResult '[b1,b2] Bool))

(?.) :: (PersistEntity val, PersistField typ) 
     => SqlExpr (Maybe (Entity val)) 
     -> EntityField val typ
     -> SqlExpr (Value (SqlNullableResult '[Maybe typ] typ))

This is of course a little more magical than passing around whether a thing is null but I feel that a library like esqueleto can bear this slightly more complex solution in order to prevent massive breaking changes.

All that said, I haven't done more than make sure that these types compile and pass the existing tests, it could prove that this approach is untenable but I personally feel that the cost of complexity is worth it.

belevy commented 4 years ago

After playing with this more a couple things I discovered.

(?.) would be served well by a type level Join that could be implemented as

type family Join (m :: * -> *) a where
   Join (m a) a = m a
   Join a a = a

Then we can just return SqlExpr (Value (Join Maybe (Maybe typ)))

Though it may just be nicer to make a Maybe specific type family

type family SqlMaybe v where
   SqlMaybe (Maybe v) = Maybe v
   SqlMaybe v = Maybe v

which would allow the return type to be SqlExpr (Value (SqlMaybe typ))

A SqlBaseType type family may be useful

type family SqlBaseType a where
   SqlBaseType (Maybe a) = a
   SqlBaseType a = a

This could be used in constraints to say Bool ~ SqlBaseType bool to allow for both Maybe and non maybe values. e.g.

(&&.) :: (Bool ~ SqlBaseType b1, Bool ~ SqlBaseType b2)
      => SqlExpr (Value b1)
      -> SqlExpr (Value b2)
      -> SqlExpr (Value (SqlNullableResult '[b1,b2] Bool))

I think the full dependent typing of the operators may be a bit overkill though, do we benefit by capturing the nullness of the results other than it being more accurate for modeling SQL?

The reason I ask is because during my experimenting I discovered that if we dependently type (==.) to potentially return a Maybe Bool you need to update a bunch of stuff including WhereClause and FromClause and all of the magic surrounding OnClause reordering. It seems like a whole bunch of work beyond just having the dependent type.

Final Thoughts: It seems to me that the main pain points are in dealing with types of Value (Maybe (Maybe a)) and with being forced to coalesce in order to use &&. and ||. Therefore I propose that rather than making everything behave according to the rules of SQL we just update any thing that returns a Value (Maybe typ) to instead return Value (SqlMaybe typ) which will prevent the nested maybe. and to update the operators that are specified in terms of SqlExpr (Value Bool) to instead be in terms of Bool ~ SqlBaseType bool => SqlExpr (Value bool) I feel like these two changes provide the maximum benefit without hurting type inference or being hugely time intensive to change. So ultimately the type signatures would be:

(&&.) :: (Bool ~ SqlBaseType bool)
      => SqlExpr (Value bool)
      -> SqlExpr (Value bool)
      -> SqlExpr (Value bool)

and

(?.) :: (PersistEntity val, PersistField typ) 
     => SqlExpr (Maybe (Entity val)) 
     -> EntityField val typ
     -> SqlExpr (Value (SqlMaybe typ))

tysonzero commented 4 years ago

There is a sqlType function in the PersistFieldSql class. Now it currently only works on the value level. However I think if it were lifted up into a type family and tagged with nullability it could be a lot more useful.

Then we would have:

(&&.) :: SqlType a ~ SqlBool n
      => SqlExpr (Value a)
      -> SqlExpr (Value a)
      -> SqlExpr (Value a)

This is pretty similar to what you have here, but could also be used for other purposes. It also would avoid any overlapping closed type families, which I personally prefer to avoid for extensibility purposes.

For example I would appreciate a function like:

cast :: SqlType a ~ SqlType b => SqlExpr (Value a) -> SqlExpr (Value b)

Currently I have to use veryUnsafeCoerceSqlExprValue for perfectly safe things like converting from a FooId to the underlying type.

belevy commented 4 years ago

So am I understanding correctly that your concern is that we will ultimately need a type level sqlType function anyways for something like cast to be typesafe and you don't want to have both that and this SqlBaseType?

tysonzero commented 4 years ago

More or less I suppose. I just think it's a nice solution to this problem and other related problems.

I also don't like the use of overlapping type families (even when closed).

belevy commented 4 years ago

Ok, so I hadn't heard the term "overlapping type families" before and thought you meant that the families would be duplicating logic. I think I understand now, I guess I am not familiar enough with type level programming to really see the downside of having the simple families that it allows. I was playing around with this and I don't think that lifting sqlType would actually be particularly helpful for cast, though it would allow a completely safe coerce function to be exposed(was this what you meant?). Here is a modified proposal that riffs on what you suggested while keeping the type families as simple as possible.

type IsSqlBool bool = 'SqlBool ~ SqlTypeLifted (Nullable bool)
(&&.) :: IsSqlBool a
      => SqlExpr (Value a)
      -> SqlExpr (Value a)
      -> SqlExpr (Value a)

The Nullable Type family is what was called SqlBaseType and then SqlTypeLifted is not aware of Maybe. I guess I am not fully understanding your nullable tag idea and how you would accomplish the implementation without the overlaps,I feel like this is a fairly happy medium where the Nullable type class should never have to be expanded(thereby alleviating concerns about maintaining it) since all it does is erase the Maybe for use in SqlTypeLifted.

EDIT: I wanted to see how a casting function could be made type safe. This is what I came up with:

type family IsMaybe t where
   IsMaybe (Maybe t) = 'True
   IsMaybe t = 'False

class SqlCastable (from :: SqlType) (to :: SqlType) where
instance SqlCastable 'SqlInt32 'SqlInt64 where
instance SqlCastable 'SqlInt32 'SqlReal where
instance SqlCastable 'SqlInt64 'SqlReal where

cast_ :: forall b a t t'.  ( t  ~ SqlTypeLifted (Nullable a)
                           , t' ~ SqlTypeLifted (Nullable b)
                           , IsMaybe a ~ IsMaybe b
                           , SqlCastable t t'
                           )
      => SqlExpr (Value a) -> SqlExpr (Value b)

Since the casts are on the sql side they are defined as legal in terms of the sql types.

Similarly coercion could be defined:

type SameSqlType a b = (SqlTypeLifted (Nullable a) ~ SqlTypeLifted (Nullable b), IsMaybe a ~ IsMaybe b)
coerce_ :: forall b a. SameSqlType a b => SqlExpr (Value a) -> SqlExpr (Value b)
coerce_ = veryUnsafeCoerceSqlExprValue

tysonzero commented 4 years ago

So we start with a library specifically designed around Sql and its type system, and not around Haskell. Both the nullability and the sql type are carried around and handled separately. The run function would of course call into an actual SQL DB rather than manually executing it.

module SqlEdsl where

import Data.Text (Text)

data Type
    = BoolT
    | IntT
    | TextT

data Nullability
    = Nullable
    | NotNull

data Expr (t :: Type) (n :: Nullability) where
    LitX :: Value t n -> Expr t n
    AndX :: Expr 'BoolT n -> Expr 'BoolT n -> Expr 'BoolT n
    EqX :: Expr a n -> Expr a n -> Expr 'BoolT n

data Value (t :: Type) (n :: Nullability) where
    NullV :: Value t 'Nullable
    BoolV :: Bool -> Value 'BoolT n
    IntV :: Int -> Value 'IntT n
    TextV :: Text -> Value 'TextT n

run :: Expr t n -> Value t n
run (LitX v) = v
run (AndX a b) = case (run a, run b) of
    (NullV, _) -> NullV
    (_, NullV) -> NullV
    (BoolV True, BoolV True) -> BoolV True
    (BoolV _, BoolV _) -> BoolV False
run (EqX a b) = case (run a, run b) of
    (NullV, _) -> NullV
    (_, NullV) -> NullV
    (BoolV x, BoolV y) -> BoolV (x == y)
    (IntV x, IntV y) -> BoolV (x == y)
    (TextV x, TextV y) -> BoolV (x == y)

justV :: Value t 'NotNull -> Value t 'Nullable
justV (BoolV x) = BoolV x
justV (IntV x) = IntV x
justV (TextV x) = TextV x

Next we provide a wrapper around that library to make it feel more like Esqueleto / Haskell, with things like Maybe instead of a nullability tag. Since it utilizes the previous library, we know it can't form ill-typed SQL queries.

module EsqlEdsl
    ( Sqlable, GetType, GetNullability
    , toValue, fromValue
    , Expr, CaseNullability
    , val, cast, (&&.), (==.), run
    , Type(BoolT, IntT, TextT)
    , Nullability(Nullable, NotNull)
    ) where

import Data.Text (Text)
import SqlEdsl
    ( Nullability(Nullable, NotNull)
    , Type(BoolT, IntT, TextT)
    , Value(NullV, BoolV, IntV, TextV)
    , justV
    )
import qualified SqlEdsl as S

class Sqlable a where
    type GetType a :: Type
    type GetNullability a :: Nullability
    toValue :: a -> Value (GetType a) (GetNullability a)
    fromValue :: Value (GetType a) (GetNullability a) -> a

instance Sqlable Bool where
    type GetType Bool = 'BoolT
    type GetNullability Bool = 'NotNull
    toValue = BoolV
    fromValue (BoolV x) = x

instance Sqlable Int where
    type GetType Int = 'IntT
    type GetNullability Int = 'NotNull
    toValue = IntV
    fromValue (IntV x) = x

instance Sqlable Text where
    type GetType Text = 'TextT
    type GetNullability Text = 'NotNull
    toValue = TextV
    fromValue (TextV x) = x

instance (Sqlable a, GetNullability a ~ 'NotNull) => Sqlable (Maybe a) where
    type GetType (Maybe a) = GetType a
    type GetNullability (Maybe a) = 'Nullable
    toValue Nothing = NullV
    toValue (Just x) = justV (toValue x)
    fromValue NullV = Nothing 
    fromValue (BoolV x) = Just (fromValue (BoolV x))
    fromValue (IntV x) = Just (fromValue (IntV x))
    fromValue (TextV x) = Just (fromValue (TextV x))

data Expr a where
    Expr :: S.Expr (GetType a) (GetNullability a) -> Expr a

type family CaseNullability (n :: Nullability) a b where
    CaseNullability 'Nullable a _ = a
    CaseNullability 'NotNull _ b = b

val :: Sqlable a => a -> Expr a
val = Expr . S.LitX . toValue

cast :: (GetType a ~ GetType b, GetNullability a ~ GetNullability b) => Expr a -> Expr b
cast (Expr x) = Expr x

(&&.) :: GetType a ~ 'BoolT => Expr a -> Expr a -> Expr a 
Expr x &&. Expr y = Expr (S.AndX x y)
infixr 3 &&.

-- constraints trivially true when type families not stuck
(==.) :: ( GetNullability (CaseNullability (GetNullability a) (Maybe Bool) Bool) ~ GetNullability a
         , GetType (CaseNullability (GetNullability a) (Maybe Bool) Bool) ~ 'BoolT
         )
      => Expr a
      -> Expr a
      -> Expr (CaseNullability (GetNullability a) (Maybe Bool) Bool)
Expr x ==. Expr y = Expr (S.EqX x y)
infixr 4 ==.

run :: Sqlable a => Expr a -> a
run (Expr x) = fromValue (S.run x)

Now we test it with some examples.

import EsqlEdsl

data Boolish = Falseish | Trueish

instance Sqlable Boolish where
    type GetType Boolish = 'BoolT
    type GetNullability Boolish = 'NotNull
    toValue Falseish = toValue False
    toValue Trueish = toValue True
    fromValue x = if fromValue x then Trueish else Falseish

main :: IO ()
main = do
    let n, m :: Int
        n = 2
        m = 7
    print . run $ val (Just n) ==. val (Just m)
    print . run $ val n ==. val n
    print . run $ val True &&. val False
    print . run $ val (Just True) &&. val Nothing
    print . run $ val True ==. cast (val Trueish)

Note how we have a working and safe cast function, as well as an ==. and an &&. that both handle NULL values in a reasonable way. The output we get is:

Just False
True
False
Nothing
True

Hopefully this gives you a bit of an idea of what I was trying to get at.

belevy commented 4 years ago

Ok I think I understand, the part I was missing was the underlying dsl vs public interface. Even still, it seems like an awful lot of work just to avoid the now 3 simple type families. I thought the whole point of closed type families was the overlapping bit.

In general it seems that one of the guiding principles of this library has been to really rely on Persistent as much as possible. The nullability tagged dsl feels like a departure from that heritage.

While my proposed approach is less rigorous and likely even less correct than the approach you outlined it feels to me more like a middle way. Of course I am incredibly biased since it's the approach I proposed :).

belevy commented 4 years ago

Here is my latest iteration, it's main benefit is that it minimizes the changes to esqueleto and the type level stuff can fit nicely into Persistent by merging the SqlTypeable(shown below) and PersistFieldSql classes. It still utilizes the overlapping type families but they really simplify everything and really bring down the code necessary as well as divergence from persistent in my opinion.

type family Nullable t where
  Nullable (Maybe t) = t
  Nullable t = t

type family IsMaybe a where
  IsMaybe (Maybe t) = 'True
  IsMaybe t = 'False

class PersistFieldSql t => SqlTypeable t where
  type SqlType' t :: SqlType
instance {-# OVERLAPPABLE #-} SqlTypeable t => SqlTypeable [t] where
  type SqlType' [t] = 'SqlString
instance SqlTypeable [Char] where
  type SqlType' [Char] = 'SqlString
instance SqlTypeable T.Text where
  type SqlType' T.Text = 'SqlString
instance SqlTypeable TL.Text where
  type SqlType' TL.Text = 'SqlString
instance SqlTypeable B.ByteString where
  type SqlType' B.ByteString = 'SqlString
instance SqlTypeable Html where
  type SqlType' Html = 'SqlString
instance SqlTypeable Bool where
  type SqlType' Bool = 'SqlBool
instance SqlTypeable Int where
  type SqlType' Int = 'SqlInt64
instance SqlTypeable Double where
  type SqlType' Double = 'SqlReal
-- The instances are the same as PersistFieldSql and the classes should probably be merged 

-- Constraint on runtime casting using the SQL CAST function (this would probably need to be defined seperately for each db)
class SqlCastable (from :: SqlType) (to :: SqlType) where
instance SqlCastable 'SqlInt64 'SqlInt32 where
instance SqlCastable 'SqlReal 'SqlInt32  where
instance SqlCastable 'SqlReal 'SqlInt64 where

-- Constraint for runtime coercion either due to using the same underlying type or having coercion supported without any issues in sql
class SqlCoercible (a :: SqlType) (b :: SqlType) where
instance SqlCoercible a a where
instance SqlCoercible 'SqlInt32 'SqlInt64 where
instance SqlCoercible 'SqlInt32 'SqlReal where
instance SqlCoercible 'SqlInt64 'SqlReal where

coerce_ :: forall b a t t'. ( t  ~ SqlType' (Nullable a)
                            , t' ~ SqlType' (Nullable b)
                            , IsMaybe a ~ IsMaybe b
                            , SqlCoercible t t'
                            )
        => SqlExpr (Value a) -> SqlExpr (Value b)
coerce_ = veryUnsafeCoerceSqlExprValue

cast_ :: forall b a t t'. ( t  ~ SqlType' (Nullable a)
                          , t' ~ SqlType' (Nullable b)
                          , IsMaybe a ~ IsMaybe b
                          , SqlCastable t t'
                          )
      => SqlExpr (Value a) -> SqlExpr (Value b)
cast_ = veryUnsafeCoerceSqlExprValue -- TODO: implement SQL CAST function here

tysonzero commented 4 years ago

Ok I think I understand, the part I was missing was the underlying dsl vs public interface. Even still, it seems like an awful lot of work just to avoid the now 3 simple type families.

So the underlying DSL doesn't actually need to exist. The important parts are the GetType and GetNullability type families. The underlying DSL is just a neat little proof of correctness.

I thought the whole point of closed type families was the overlapping bit.

When you are operating on Type or *, that is very often the usage yes. Which is why personally I avoid ever writing closed type families that operate on Type or *. I wish they weren't supported to be honest, similar to how we don't support runtime type introspection without a type class.

In general it seems that one of the guiding principles of this library has been to really rely on Persistent as much as possible. The nullability tagged dsl feels like a departure from that heritage.

As I mentioned the DSL was purely to help build out my proof of concept and prove its correctness. I think this could be done with a type family for nullability combined with making the existing sqlType function work on the type level.

While my proposed approach is less rigorous and likely even less correct than the approach you outlined it feels to me more like a middle way.

So the biggest correctness issue with the Maybe hardcoding stuff is that it will almost certainly have holes due to some non-Maybe types being able to output NULL.

It's similar to how you can have a perfectly reasonable ToJSON/FromJSON pairing on Foo, but if Foo can be serialized to null, the abstraction breaks when you start working with Maybe Foo.

This is why I went so far to build the underlying DSL. I really wanted to be confident that there weren't any holes in the approach.

belevy commented 4 years ago

So the biggest correctness issue with the Maybe hardcoding stuff is that it will almost certainly have holes due to some non-Maybe types being able to output NULL.

I don't understand what case that would be. All operations that can introduce NULL should output a maybe. Maybe is strictly the tag for this thing can be null now in the esqueleto EDSL as it stands today and has no runtime meaning. There is one hole today related to outer joins not enforcing the maybeness but #172 closes that hole. Actually the Maybe is really the default and without a maybe all we are saying is that nothing has occured to this value which could result in null. Maybe is the nullability flag in persistent and esqueleto as it stands and only serves that purpose in the language, if something can be null without being tagged Maybe that would be a bug since the assumption is already being made elsewhere.

All that said there is no reason why SqlTypeable can't have an associated Nullability type, it just seems superfluous to me. I'm sorry if my ignorance betrays me, I am rather new to type level programming having only started on this journey in December of last year.

I hope I'm not being obtuse, but what is the exact problem with closed families on *? It just seems like my approach utilizing Maybe directly instead of a nullable datakind is much more straightforward.

EDIT: Pushed changes based on this discussion so far to https://github.com/bitemyapp/esqueleto/pull/178

tysonzero commented 4 years ago

I don't understand what case that would be.

All operations that can introduce NULL should output a maybe.

Persistent doesn't enforce this though. There is nothing stopping a dev serializing to and from null for their custom type, and there may be good reason for doing so. Also as mentioned nested Maybe is already an example of the hole in incorrectly treating nullability as nestable.

I hope I'm not being obtuse, but what is the exact problem with closed families on *?

Specifically it's overlapping type families I dislike, and my dislike for it is the same reason I do like parametricity and purity.

When defining a custom data type I don't like it being prescribed meaning without me explicitly declaring as such. If I define data Option a = None | Some a for whatever reason, I don't expect it to magically work differently than Maybe because of a clever overlapping type family or type class that I wasn't aware existed.

It makes it harder for me to reason about my code, similar to how impurity and lack of parametricity make it harder for me to reason about my code.

belevy commented 4 years ago

Confusion around null has come up once again in #183. It seems that out joins enforcing nullability may exacerbate the poor handling of null. @tysonzero I fear that we are letting perfect get in the way of progress. @parsonsmatt do you have an opinion on my proposal to special case maybe with overlapping type families?

arthurxavierx commented 3 years ago

The approach used in rel8 looks quite promising for this: https://hackage.haskell.org/package/rel8-1.0.0.1/docs/src/Rel8.Schema.Null.html#Sql