Closed gelisam closed 9 years ago
Branch issue-106
adds a simple system of phantom types to ensure that the user expression is manipulated in a type-correct way. The approach is limited, however, by the fact that we don't know in advance what the type of the user expression will be: at first, it could be anything, then we inspect the command (Eval
, Apply
or Map
) and we learn a bit more, then we examine the InputFormat
and we learn the final details. I made a table (s
is ByteString
, a
is some instance of Rows
, and xx
means the combination is disallowed):
-d, | -D; | -D'' |
---|---|---|
-a | [[s]] -> a | [s] -> a |
-m | [s] -> a | xx |
-e | a | a |
-d'' | -D; | -D'' |
---|---|---|
-a | [s] -> a | s -> a |
-m | s -> a | xx |
-e | a | a |
But that's not how type inference works in Haskell: we can't gradually add constraints to the type of an argument as the values are examined, instead the argument has a particular concrete type, and each use of that argument adds zero or more constraints to that type. We cannot case-split on the flags to change the constraints we put on an argument, because all the branches of the case-split will add to the constraints, not just the one which ends up being followed.
I think there might still be a way to pull it off by using continuation-passing style...
I think there might still be a way to pull it off by using continuation-passing style...
I no longer think it's going to work in our case, but here's what I had in mind.
Consider a simpler case, with the following table of types:
--list | --tree | |
---|---|---|
--string | [String] | Tree String |
--int | [Int] | Tree Int |
For this simplified problem, we're not trying to apply a user expression to an input stream, but simply to verify if the user expression has any empty values, where the definition of "empty" varies depending on the table cell we're in. Let's start with an untyped version, similar to Hawk's current implementation:
checkExprEmptiness1 :: [String] -> String -> IO Bool
checkExprEmptiness1 flags expr = interpret' (printf "%s %s (%s)" _any _isEmpty expr) (as :: Bool)
where
_any | "--list" `elem` flags = "any"
_any | "--tree" `elem` flags = "anyLeaf"
_any | otherwise = error "need to use --list or --tree"
_isEmpty | "--string" `elem` flags = "null"
_isEmpty | "--int" `elem` flags = "(==0)"
_isEmpty | otherwise = error "need to use --string or --int"
If we try to use HaskellExpr to give a precise type to those strings representing haskell expressions, then we get a type error, even if we say that the user expression can have any type:
checkExprEmptiness2 :: [String] -> (forall e. HaskellExpr e) -> IO Bool
checkExprEmptiness2 flags eExpr = interpretExpr eCheckEmptiness
where
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = eAnyAnywhere `eAp` eIsEmpty `eAp` eExpr
eAnyAnywhere :: HaskellExpr ((a -> Bool) -> f a -> Bool)
eAnyAnywhere | "--list" `elem` flags = eAny -- HaskellExpr ((a -> Bool) -> [a] -> Bool)
eAnyAnywhere | "--tree" `elem` flags = eAnyLeaf -- HaskellExpr ((a -> Bool) -> Tree a -> Bool)
eAnyAnywhere | otherwise = error "need to use --list or --tree"
eIsEmpty :: HaskellExpr (a -> Bool)
eIsEmpty | "--string" `elem` flags = eNull -- HaskellExpr ([a] -> Bool)
eIsEmpty | "--int" `elem` flags = eEq `eAp` eZero -- HaskellExpr (Int -> Bool)
eIsEmpty | otherwise = error "need to use --string or --int"
We get type errors because the type signatures for eAnyAnywhere
and eIsEmpty
claim to work for any types f
and a
instead of saying that an appropriate f
and a
will be determined at runtime. It sounds like existential types should be the answer, but I had to jump through hoops and even after that I couldn't get them to work. Something about "skolem" type variables...
Instead, let's use continuation-passing-style to precisely distinguish the part of the type which is computed by eAnyAnywhere
and eIsEmpty
from the part of the type which, from their point of view, could be anything. Conversely, from the point of view of eCheckEmptiness
, the part of the type which is computed by the helper functions is the part which could be anything.
checkExprEmptiness3 :: [String] -> (forall e. HaskellExpr e) -> IO Bool
checkExprEmptiness3 flags eExpr = interpretExpr eCheckEmptiness
where
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = eAnyAnywhere $ \eAA ->
eIsEmpty $ \eIE ->
eAA `eAp` eIE `eAp` eExpr
eAnyAnywhere :: (forall f. (forall a. HaskellExpr ((a -> Bool) -> f a -> Bool))
-> r)
-> r
eAnyAnywhere cc | "--list" `elem` flags = cc eAny
| "--tree" `elem` flags = cc eAnyLeaf
| otherwise = error "need to use --list or --tree"
eIsEmpty :: (forall a. HaskellExpr (a -> Bool)
-> r)
-> r
eIsEmpty cc | "--string" `elem` flags = cc eNull
| "--int" `elem` flags = cc (eEq `eAp` eZero)
| otherwise = error "need to use --string or --int"
The types are getting quite hairy, but it works! The most tricky part is that eAnyAnywhere
needs two nested forall
s with different meanings: the first forall
means that the continuation must be willing to accept any f
, whereas the second forall
means that the value which eAnyAnywhere
sends to this continuation is polymorphic in a
.
The above example worked fine, but only because the type table was very structured. Let's consider a slightly more complicated example.
--list | --tree | ||
---|---|---|---|
--string | String | [String] | Tree String |
--int | Int | [Int] | Tree Int |
This table is more complicated because the entries no longer all have the form f a
. This was an important feature, because eAnyAnywhere
was only affecting the type variable f
, while eIsEmpty
was only affecting the type variable a
. Can we divide the work even if there isn't one type variable per axis?
In this particular case, we can, at the expense of making the types even more complicated.
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = eAnyAnywhere $ \eAA ePreprocess ->
eIsEmpty $ \eIE ->
eAA `eAp` eIE `eAp` (ePreprocess eExpr)
eAnyAnywhere :: (forall f. (forall a. HaskellExpr ((a -> Bool) -> f a -> Bool))
-> (forall a. (forall e. HaskellExpr e) -> HaskellExpr (f a))
-> r)
-> r
eAnyAnywhere cc | "--list" `elem` flags = cc eAny (\e -> e)
| "--tree" `elem` flags = cc eAnyLeaf (\e -> e)
| otherwise = cc eAnyIdentity (\e -> eIdentity `eAp` e)
My solution is to add an extra preprocessing step which forces the user expression to have the form f a
, by wrapping the user expression in an Identity
if the f
is missing, and otherwise leaving the user expression alone. I need to use explicit lambdas because id
doesn't handle forall
s very well.
The next step towards supporting a type table as complex as Hawk's is to introduce illegal cells. For example:
--list | --tree | ||
---|---|---|---|
Int | [Int] | Tree Int | |
--reverse | xx | [Int] | Tree Int |
I don't know how to handle this case yet, and in fact, I'm not sure I want to. Sometimes, it's better to have an untyped solution than to have a precisely-typed solution with incomprehensible types!
Let's figure out the precisely-typed-solution-with-incomprehensible-types anyway. For Science!
First, here is an alternate solution to the previous problem. Instead of forcing all the solutions into the same f a
shape, we now use an explicit datatype detailing the different possible shapes (f a
and a
) which isAnyAnywhere
can return, and which eCheckEmptiness
must therefore support.
data EmptinessChecker f
= ContainerChecker (forall a. HaskellExpr ((a -> Bool) -> f a -> Bool))
| ValueChecker (forall a. HaskellExpr ((a -> Bool) -> a -> Bool))
checkExprEmptiness5 :: [String] -> (forall e. HaskellExpr e) -> IO Bool
checkExprEmptiness5 flags eExpr = interpretExpr eCheckEmptiness
where
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = eIsEmpty $ \eIE ->
eAnyAnywhere $ \case
ContainerChecker eAA -> eAA `eAp` eIE `eAp` eExpr
ValueChecker eAA -> eAA `eAp` eIE `eAp` eExpr
eAnyAnywhere :: (forall f. EmptinessChecker f
-> r)
-> r
eAnyAnywhere cc | "--list" `elem` flags = cc (ContainerChecker eAny)
| "--tree" `elem` flags = cc (ContainerChecker eAnyLeaf)
| otherwise = cc (ValueChecker eId)
eIsEmpty :: (forall a. HaskellExpr (a -> Bool)
-> r)
-> r
eIsEmpty cc | "--string" `elem` flags = cc eNull
| "--int" `elem` flags = cc (eEq `eAp` eZero)
| otherwise = error "need to use --string or --int"
The cases for ContainerChecker
and ValueChecker
look identical, but the types of eAA
are different and thus eExpr
is instantiated at different types.
With such explicit datatypes describing the different possibilities, it should be no problem to detect incompatible combinations.
Suppose that both []
and Tree
are instances of a typeclass named Reversible
. We might want to add this constraint to the type signature of eAnyAnywhere
, so that eCheckEmptiness
can assume it.
eAnyAnywhere :: (forall f. Reversible f => EmptinessChecker f -> r)
-> r
eAnyAnywhere cc | "--list" `elem` flags = cc (ContainerChecker eAny)
| "--tree" `elem` flags = cc (ContainerChecker eAnyLeaf)
| otherwise = cc (ValueChecker eId)
Unfortunately, this triggers an error on the last line, because f
is ambiguous.
Interestingly, f
was also ambiguous in the previous example, which type-checked just fine. I think ghc usually doesn't care about ambiguous type variables, but that when a type constraint is present, now it does have to care because it has to produce a concrete dictionary instance to implement the constraint.
Existential types to the rescue! For some reason, they do work this time.
data ContainerChecker = forall f. Reversible f =>
ContainerChecker (forall a. HaskellExpr ((a -> Bool) -> f a -> Bool))
data ValueChecker =
ValueChecker (forall a. HaskellExpr ((a -> Bool) -> a -> Bool))
checkExprEmptiness7 :: [String] -> (forall e. HaskellExpr e) -> IO Bool
checkExprEmptiness7 flags eExpr = interpretExpr eCheckEmptiness
where
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = eIsEmpty $ \eIE ->
eAnyAnywhere $ \case
Left (ContainerChecker eAA) -> eAA `eAp` eIE `eAp` eExpr
Right (ValueChecker eAA) -> eAA `eAp` eIE `eAp` eExpr
eAnyAnywhere :: (Either ContainerChecker ValueChecker -> r)
-> r
eAnyAnywhere cc | "--list" `elem` flags = cc (Left (ContainerChecker eAny))
| "--tree" `elem` flags = cc (Left (ContainerChecker eAnyLeaf))
| otherwise = cc (Right (ValueChecker eId))
eIsEmpty :: (forall a. HaskellExpr (a -> Bool)
-> r)
-> r
eIsEmpty cc | "--string" `elem` flags = cc eNull
| "--int" `elem` flags = cc (eEq `eAp` eZero)
| otherwise = error "need to use --string or --int"
Instead of attaching f
to the EmptinessChecker
datatype and letting it be ambiguous in the ValueChecker
case, we split EmptinessChecker
into two subtypes. Again, there are two distinct uses of the forall
keyword: the forall f. Reversible f
says that there exists a reversible type f
which the ContainerChecker
can check, while the forall a
's say that the ContainerChecker
and the ValueChecker
are polymorphic in a
.
Now that the forall
's have been moved out of eAnyAnywhere
's type signature, there is no longer any need for continuation-passing-style. We can simplify the types a little bit!
data ContainerChecker = forall f. Reversible f =>
ContainerChecker (forall a. HaskellExpr ((a -> Bool) -> f a -> Bool))
data ValueChecker =
ValueChecker (forall a. HaskellExpr ((a -> Bool) -> a -> Bool))
data IsEmpty = forall a.
IsEmpty (HaskellExpr (a -> Bool))
checkExprEmptiness8 :: [String] -> (forall e. HaskellExpr e) -> IO Bool
checkExprEmptiness8 flags eExpr = interpretExpr eCheckEmptiness
where
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = case eIsEmpty of
IsEmpty eIE -> case eAnyAnywhere of
Left (ContainerChecker eAA) -> eAA `eAp` eIE `eAp` eExpr
Right (ValueChecker eAA) -> eAA `eAp` eIE `eAp` eExpr
eAnyAnywhere :: Either ContainerChecker ValueChecker
eAnyAnywhere | "--list" `elem` flags = Left (ContainerChecker eAny)
| "--tree" `elem` flags = Left (ContainerChecker eAnyLeaf)
| otherwise = Right (ValueChecker eId)
eIsEmpty :: IsEmpty
eIsEmpty | "--string" `elem` flags = IsEmpty eNull
| "--int" `elem` flags = IsEmpty (eEq `eAp` eZero)
| otherwise = error "need to use --string or --int"
Now all the complexity is at the top, inside the datatype declarations. And it's not even that complicated: it's some polymorphic type, wrapped in a HaskellExpr
type constructor. Some of the type variables are determined by the code which constructs the value, and some of the type variables are determined by the code which uses the value. The type variables for those two cases are bound by two forall
's with different semantics.
Or maybe the types are still complicated, but I have spent so much time on this problem that I finally have an intuition for them :)
We are finally ready to tackle the case of --reverse
:
data UserExpression =
UserExpression (forall e. HaskellExpr e)
data ContainerChecker = forall f. Reversible f =>
ContainerChecker (forall a. HaskellExpr ((a -> Bool) -> f a -> Bool))
data ValueChecker =
ValueChecker (forall a. HaskellExpr ((a -> Bool) -> a -> Bool))
data IsEmpty = forall a.
IsEmpty (HaskellExpr (a -> Bool))
data ContainerTransformer =
ContainerTransformer (forall f a. Reversible f =>
HaskellExpr (f a -> f a))
data ValueTransformer =
ValueTransformer (forall a. HaskellExpr (a -> a))
checkExprEmptiness9 :: [String] -> UserExpression -> IO Bool
checkExprEmptiness9 flags eExpr = interpretExpr eCheckEmptiness
where
eCheckEmptiness :: HaskellExpr Bool
eCheckEmptiness = case eExpr of
UserExpression eUE -> case eIsEmpty of
IsEmpty eIE -> case eAnyAnywhere of
Left (ContainerChecker eAA) -> case eTransform of
Left (ContainerTransformer eT) -> eAA `eAp` eIE `eAp` (eT `eAp` eUE)
Right (ValueTransformer eT) -> eAA `eAp` eIE `eAp` (eT `eAp` eUE)
Right (ValueChecker eAA) -> case eTransform of
Left (ContainerTransformer eT) -> error "--reverse needs a container"
Right (ValueTransformer eT) -> eAA `eAp` eIE `eAp` (eT `eAp` eUE)
eAnyAnywhere :: Either ContainerChecker ValueChecker
eAnyAnywhere | "--list" `elem` flags = Left (ContainerChecker eAny)
| "--tree" `elem` flags = Left (ContainerChecker eAnyLeaf)
| otherwise = Right (ValueChecker eId)
eTransform :: Either ContainerTransformer ValueTransformer
eTransform | "--reverse" `elem` flags = Left (ContainerTransformer eGenericReverse)
| otherwise = Right (ValueTransformer eId)
eIsEmpty :: IsEmpty
eIsEmpty | "--string" `elem` flags = IsEmpty eNull
| "--int" `elem` flags = IsEmpty (eEq `eAp` eZero)
| otherwise = error "need to use --string or --int"
Since we pattern-match on the four different cases, we can explicitly throw an error when invalid flags are combined. More importantly, if we do try to compute a value instead of throwing an error, we get a type error at compile time. After all, that's the whole point of using HaskellExpr
instead of String!
I think this is a duplicate of issue #40. Since this issue contains a lot more information than #40, should we close #40 instead?
Yes, let's continue here. I'm still busy with the thesis, I will read this thread when I have time. In the meanwhile, good luck :).
Ok, so back to the problem at hand. To recap, we have an untyped expression which we want to manipulate in a type safe way. I'm having a lot of trouble doing so, but why? What makes this problem so difficult?
I thought it was because the type of the user expression depends on the flags, but that's not it. The untyped expression can simply be given the type forall e. HaskellExpr e
, as above, meaning it can have any type we need it to have. Depending on the flags, we then specialize e
to be the type we need, for example [[ByteString]] -> Int
.
I then thought that the difficulty was caused by the interaction between the type restrictions imposed by the different flags. But that's not it either! If all the type restrictions are given in one place, then everything typechecks just fine:
-- using ordinary types instead of phantom types, for simplicity
process :: Action -> (forall e. e) -> InputFormat -> IO ()
process Eval e _ = print $ (e :: ())
process Apply e BareString = print $ (e :: String -> ()) bareString
process Apply e ListString = print $ (e :: [String] -> ()) listString
process Apply e ListListString = print $ (e :: [[String]] -> ()) listListString
process Map _ BareString = error "Map is incompatible with BareString."
process Map e ListString = print $ map (e :: String -> ()) listString
process Map e ListListString = print $ map (e :: [String] -> ()) listListString
The problem is that Hawk's codebase isn't as simple as the above anymore. The code which deals with the action flags is in System.Console.Hawk
, far from the code which deals with the input format flags, in System.Console.Hawk.Runtime
. This makes things more complicated, because the type restrictions now affect more than just one branch, they now need to be carried from one function to another, and for this they need to be represented in some form. This representation is what all the machinery in my previous comments have been about.
Maybe it would be simpler to refactor Hawk so that all the type restrictions would occur in the same function once again?
Hmm, turns out the forall e. HaskellExpr e
trick isn't as good as I thought.
I still wanted to try separating the action-flags part from the input-flags part, so I came up with the following intermediate representation for a user expression on which the action flags have already been applied.
data CanonicalExpression
= ProcessList (forall a. [a] -> HawkOutput)
| ProcessValue (forall a. a -> HawkOutput)
data HawkOutput = forall a. Rows a => HawkOutput a
With a few more tricks, I managed to implement a version of System.Console.Hawk.wrapExpr
which produces a CanonicalExpression
, and a version of System.Console.Hawk.Runtime.processTable
which accepts them. But then I still got type errors at runtime :(
First, the user expression produces a value of some type satisfying the Rows
constraint. In order to convert this user expression into a CanonicalExpression
, I need to wrap its output inside a HawkOutput
constructor. This is easy to do; but it's also easy to forget to do so.
go1 :: HaskellExpr (a -> HawkOutput)
go1 = eConst `eAp` (eHawkOutput `eAp` eExpr)
-- oops! forgot to apply eHawkOutput.
go2 :: HaskellExpr (a -> HawkOutput)
go2 = eConst `eAp` eExpr
Unfortunately, if eExpr
has type forall e. HaskellExpr e
, then go2
typechecks and go1
doesn't! It doesn't typecheck because the type at which e
should be instantiated is ambiguous. This is easily fixed with a type annotation, but the fact that go2
typechecks when it shouldn't is harder to fix. It typechecks because e
can be instantiated at e ~ HawkOutput
.
The other problem with the forall e. HaskellExpr e
representation is that the user expression doesn't really have all possible types, it has exactly one type, we just don't know what it is yet. If we just instantiate e
once at one particular type, then things are fine. With the above CanonicalExpression
representation, I tried to delay this decision until the input-flags information was in, but the forall a. a -> HawkOutput
forced the expression to support every possible input types. I only discovered my mistake much later, with a runtime error saying that [[ByteString]]
doesn't match a
.
Oh, I know what my mistake was! I added a new primitive to convert between (forall a. HaskellExpr (f a))
and HaskellExpr (forall a. f a)
, but perhaps this isn't a sound conversion after all.
History rewrite warning: I had to rebase this branch on top of master because it wasn't compiling with GHC 7.8.
I'm trying to remember the status of this issue. My comments seem to indicate that there is still a runtime error, but all the tests pass. Maybe I've fixed that bug already?
The current behaviour (on master) does not match the table at the top of this issue. An empty record separator implies that we don't split the fields either. So the first table should read:
-d, | -D; | -D'' |
---|---|---|
-a | [[s]] -> a | s -> a |
-m | [s] -> a | xx |
-e | a | a |
Note that the upper-right cell is now s -> a
instead of [s] -> a
.
Hmm, my comments mention a type called CanonicalExpression
, but I don't see that in issue-106
's history. Maybe I did all of the above work in some separate test file, which I never checked in?
Yup, looking at the commits for this branch, it looks like I've implemented the "simple system of phantom types" I mention at the beginning, but that the type of the user expression is always forall a. a
. That is, its type is not further restricted as each flag is processed.
Let's recap. This forall a. a
representation for the user expression is insufficient because we might think that we're instantiating it at a particular concrete type, such as [[B.ByteString]] -> ()
, but in fact we forgot some step or another and the forall
swallows the difference, leading to an error at runtime instead of compile time. The easiest way to resolve this would be to always use a type annotation to say at which type we intend to use the user expression, but there are a few complications.
First, the code which determines which type the user expression should have is distributed among many files, so there is no single point in the code at which enough information is known for the type annotation to be put. Second, the different decisions regarding that type interact with each other in ways which do not combine well at the type level. Third, some flag combinations are illegal, so there cannot be a single universally-quantified type which summarizes the results of earlier decisions, there must at least be one constructor for the failure case and one for the success case. Finally, the types I came up with to work around some of those problems are way too complicated.
So here's a tentative solution with very simple types! Let's represent the user expression by a record of Maybe
fields, one for each concrete type the user expression could have. At first, all the fields would be filled in, and as each flag is processed, some of the possibilities would be set to Nothing
. This way, each decision can be made independently, fmapping an implementation segment over those user expression types that they support, and setting the others to Nothing
. Once all the flags are processed, if there is exactly one Just
field left, then that's the one we should use, if there is more than one, then the flag combination was somehow ambiguous, and if there are none, then the flags were conflicting.
One limitation of this approach is that it cannot check at compile time whether there exist some flag combinations which are ambiguous (there shouldn't) or conflicting (there should).
It worked!!
There were more subtleties than I expected, but the transformation of the user transformation is now totally type-safe, and all the tests pass. I'm still a bit concerned about the safety of the part (now called canonicalizeExpression
) where the input is packed into a [[ByteString]]
and unpacked on the runtime side, but that is more an issue about totality than type-safety.
I have a few more ideas for improving upon this, so I'm not merging into master yet, but I've worked (on and off) on this for so long now, I'm just glad I finally found a working solution :)
My improvement ideas didn't work out, so I'll merge the code as is.
The improvement I was thinking of was to extend HaskellExpr
to support manipulating multiple alternate expressions (of different types) at once, that is, to create a generic framework instead of UserExpr
's one-time workaround.
I think it would be possible to build such a system, but using such a system would make our code harder to read. Instead of being explicit about what to do with all four cases, we'd have generic methods for dropping some of the cases and duplicating them, and those methods would be called in a point-free style which might obscure the intent. Furthermore, since the types of each of the cases would be distinct, users of the system would still have to specify four type-signatures, even if they're using libraries to hide the four implementations. And fmap
could not be used to lift a polymorphic function into the system, each such function would need a special version just for the system.
So let's keep things the way they are!
Hmm, this is embarrassing. Didn't cabal install --enable-tests
used to also silently run the tests?
Anyway, it turns out that cabal test
doesn't actually do anything on my machine (it says my version of cabal is wrong), and when I run them with cabal-1.18, I see that I actually broke the tests. Oops!
Phew! I only broke tests by removing an "unused import" which was actually used by a doctest. The tests do pass now, which means that my UserExpr
work was indeed correct.
wrapExpr
andapplyExpr
both construct alternate user expressions based on the original user expression. I remember we had discussed that there must be a safer way to do this, using phantom types or template haskell or whatever.