Open noughtmare opened 3 years ago
Proof of concept passing information from one module to another:
-- QQ.hs
module QQ where
import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.TH.Syntax
qqtest = QuasiQuoter
{ quoteExp = compile
, quotePat = notHandled "patterns"
, quoteType = notHandled "types"
, quoteDec = notHandled "declarations"
}
where notHandled things = error $ things ++ " are not handled by the qqtest quasiquoter."
compile :: String -> Q Exp
compile str = do
Just x <- lookupValueName "test"
x' <- reify x
case x' of
VarI n (AppT _ (LitT (NumTyLit x))) _ -> return (LitE (IntegerL x))
-- Source.hs
{-# LANGUAGE DataKinds #-}
module Source where
import GHC.TypeLits
import Data.Proxy
-- This type can contain arbitrary data and it could have been generated by a quasi quoter.
test :: Proxy 1337
test = Proxy
-- Target.hs
{-# LANGUAGE QuasiQuotes #-}
import QQ
import Source
main :: IO ()
main = print [qqtest|abc|]
Running it will print the expected: 1337.
I have been able to make a prototype: https://github.com/noughtmare/uuagc/tree/template-haskell
You can try it out by cloning and compiling this test project: https://github.com/noughtmare/test-uuagc-th
It contains the code:
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
module Main where
import UU.UUAGC (uuagcTH)
import Text.RawString.QQ
uuagcTH ["haskellsyntax","data"] [r|
data Tree
| Node left :: Tree
right :: Tree
| Tip value :: Int
deriving Tree : Show
|]
main :: IO ()
main = print (Node (Tip 1) (Node (Tip 2) (Tip 3)))
If everything goes well, that will compile and produce the string:
Node (Tip 1) (Node (Tip 2) (Tip 3))
It is currently limited to only generate data types.
I have noticed that I was working on the wrong file. ExecutionPlan2Hs
is only used when loag or kennedy warren is enabled. Instead it is much easier to start by rewriting the PrintCode
module. I have started doing that and now I already have a version that can compile a small attribute grammar including data, catas, semfuns and wrappers (https://github.com/noughtmare/uuagc/commit/0937ffb084e4e8cf51894a36614dc4eeb7b9785d). I have also updated the test-uuagc-th
repo so that it can now compile and run this more complete example:
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Main where
import UU.UUAGC (uuagcTH)
import Text.RawString.QQ
uuagcTH ["haskellsyntax","data","signatures","catas","wrappers","semfuns"] [r|
data Tree
| Node left :: Tree
right :: Tree
| Tip value :: Int
attr Tree
syn max :: {Int}
sem Tree
| Node lhs.max = max @left.max @right.max
| Tip lhs.max = @value
deriving Tree : Show
|]
main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree
I have now made a bigger proof of concept for sharing information across modules with template Haskell: https://github.com/noughtmare/splitfuns.
I have now been able to implement a similar modularity system with my uuagc TH fork, here is an example of it in action: https://github.com/noughtmare/test-uuagc-th
There is one file Data.hs
which contains the data definitions:
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Data where
import UU.UUAGC (uuagcTH)
import Text.RawString.QQ
uuagcTH "dat" ["haskellsyntax","data"] [] [r|
data Tree
| Node left :: Tree
right :: Tree
| Tip value :: Int
deriving Tree : Show
|]
And the Main.hs
file imports that and defines semantics for it:
{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Main where
import UU.UUAGC (uuagcTH)
import Text.RawString.QQ
import Data
uuagcTH "sem" ["haskellsyntax","signatures","catas","wrappers","semfuns"] [dat] [r|
attr Tree
syn max :: {Int}
sem Tree
| Node lhs.max = max @left.max @right.max
| Tip lhs.max = @value
|]
main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree
Awesome! I'll have to find some time some day to dig into this, but this looks like a good start. I think the whole Template Haskell approach has a few challenges:
uuagc
so that we have 1 code generation path (or probably 2, as we already had 2 different forms of code generation). And I imagine using -dth-dec-file
to solve the bootstrapping problem of uuagc
itself, we should be able to include the generated Haskell source in the distribution in that way..ag
files. Thus we'll have to do some parsing to reconstruct a TH tree from the expression, which I believe may be much harder than it may sound, and sometime lead to confusing error messages. I am not sure how you solved that in the above example, maybe this assumes simple function applications only.Sharing code generation
Concretely, I think this would mostly mean replacing the use of the data structures in Code.ag with TH data types, or do you mean something else?
Edit: I see now that also things like HsToken could be replaced by Template Haskell throughout the whole compilation process.
I don't think it is possible to fully merge the compilation paths, because things like includes, import blocks and sepsemmods are just so different or unnecessary when using template Haskell.
I think the Clean and OCaml code generation would also be a problem.
Quoting / parsing of expressions
For this I have used the haskell-src-meta
package. It uses haskell-src-exts
to parse the text and then converts the results to TH types. But it doesn't support everything yet and I think haskell-src-exts
is also not always exactly the same as GHC. If we want full support then we need to use ghc-lib-parser
and/or convince/help GHC developers to add something like this nested bracket merge request. Because I think antiquotation is not currently supported.
Update: I have now also made a package that uses GHC's own parser to parse Haskell code into Template Haskell types, see: https://github.com/UU-ComputerScience/uuagc/issues/10#issuecomment-822821613. And yet another way is to use the more granular embedding, see: https://github.com/UU-ComputerScience/uuagc/issues/10#issuecomment-823334968.
Composability and sharing the definitions between modules
I must say that it is a bit fragile with respect to the capturing of variable names. Right now it doesn't capture anything, so you'd have to import all the required imports in the module where you generate the final code instead of the module where you use the variables. We can piggyback on GHC by using a more granular syntactic embedding, this is my artist's rendition of what that could look like:
{-# LANGUAGE TemplateHaskell, QuasiQuotes, QualifiedDo, BlockArguments, DuplicateRecordFields, NoFieldSelectors #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Main where
import UU.UUAGC
data Tree
= Node { left :: Tree
, right :: Tree
}
| Tip { value :: Int }
deriving Show
-- register the Tree type with the system
data' ''Tree
-- register the max attribute
-- the namespace of these attributes should actually be completely separate from everything else.
attr ''Tree Attr.do
syn "max" [t| Int |]
-- define the semantic rules that calculate the max attribute
sem ''Tree Sem.do
-- I'm not very happy yet about how these splices look.
case' 'Node (lhs % "max" .= [| max $('left % "max") $('right % "max") |])
case' 'Tip (lhs % "max" .= val 'value)
main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree
But even then there are bugs like this: https://gitlab.haskell.org/ghc/ghc/-/issues/19662. And I think it really adds too much clutter to the syntax.
maybe naming the module or using fqns would be better
I am now thinking that I could just always use the name "ag" and then users can import the whole module qualified, so you would get something like Data.ag
.
I must say that it is much easier to use a Haskell variable to store the module in, because while you can reify
strings to variables, you cannot get the value of that variable. You can still get the type, so a workaround is to serialize the data into a type like I do in the auto
branch of the splitfuns
prototype: here. There I store the data in a type variable name (which is pretty crazy), a slightly better way would perhaps be to use a type level integer, but that requires the DataKinds
extension and is not that much better.
I just did some testing with using the ghc
(or ghc-lib-parser
) package for parsing strings into Template Haskell types, here's the result: ghc-meta
. It is doable, but doing everything correctly and keeping up with GHC releases might become a problem.
I actually think a more granular syntax might be acceptable, I've come up with this:
{-# LANGUAGE TemplateHaskell, QuasiQuotes, DuplicateRecordFields #-}
module Main where
import UU.UUAGC
data Tree
= Node { left :: Tree
, right :: Tree
}
| Tip { value :: Int }
deriving Show
attr ''Tree
[ syn "max" [t|Int|]
]
sem ''Tree
[ alt 'Node [d|[a|lhs.max|] = max [a|left.max|] [a|right.max|]|]
, alt 'Tip [d|[a|lhs.max|] = [a|value|]|]
]
main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree
The lists can perhaps be made slightly more nice with QualifiedDo
or RebindableSyntax
as in my previous artist's rendition, but I think this is already relatively nice.
The huge advantage of this is that we don't need haskell-src-exts
or ghc-lib-parser
, because we can just use the built-in Haskell quoters.
How do you feel about that?
Edit: If we use '
instead of .
to separate field from attribute name and if we do some name resolution magic, then we can probably get rid of the a
quasiquoter:
sem ''Tree
[ alt 'Node [d| lhs'max = max left'max right'max |]
, alt 'Tip [d| lhs'max = value |]
]
Then we can do some post processing to rename the variables appropriately. Of course the disadvantage is that you can no longer use normal Haskell variables/functions with an apostrophe, but we could just use a double apostrophe as escape sequence.
Update: the biggest problem I see now is that there isn't a nice syntax for extending data types with additional cases using this more granular approach.
Sorry for the slow replies, I wanted to spend some time on this but it does not work out.
Regarding the nice syntax: I think that is a different project. My main motivation for having some TH support is keeping uuagc
alive. The cabal-plugin is harder and harder to deal with, but there are some projects that depend on it, so it would be great if there is a small manual refactor so that TH can be used with uuagc, which seems doable.
a small manual refactor so that TH can be used with uuagc, which seems doable
It is doable, but some features like OCaml and Clean code generation (obviously TH cannot be used in OCaml or Clean) and sepsemmods (TH cannot create modules or even imports and exports) cannot be used with template haskell. So, I don't know if it will ever be a full replacement for the cabal plugin. However, I do have hopes that TH will allow us to publish attribute grammars to Hackage which seems like a big win (although it will not really be modular like AspectAG), because all the grammar pieces will still need to be collected and compiled together.
Another alternative is to follow the approach of https://github.com/ElderEphemera/instance-impl which is a GHC plugin that desugers its nonsensical syntax to produce a valid program. The only requirement then is that the program must be accepted by GHC's parser. This requirement is not so strict, e.g. this parses:
data Tree
= Nil
| Node (l :: Tree) (x :: Int) (r :: Tree)
attr Tree
syn use (+) (const 0) (sum :: Int)
sem Tree
| Nil lhs.sum = 0
| Node +sum = (+ x)
The data
declaration is parsed as a normal data declaration with type variables and kind signatures. The attr
declaration is parsed as a top-level TH splice where the attr
function is applied to many arguments. The sem
declaration is parsed as a function definition with guards. It might be possible to desugar all of these to our desired output.
I have tought a bit about using template haskell with uuagc.
Initial working version
First we could use a file quasi quoter (made with the quoteFile function) to load and compile an ag file.
Options can be passed to this quasi quoter by defining a specific variable (say
agOptions
) as, for example, a certain string constant. This variable can be looked up by the quasi quoter and the string constant can be read.Edit: an annoyance is that due to the stage restriction you cannot lookup variables that are declared in the module itself, so you need to put this configuration variable in a separate module.
Edit2: Maybe we can avoid the stage restriction by using
addModFinalizer
in combination withaddTopDecls
. I wonder if those work together.Edit3: No,
lookupValueName
works inaddModFinalizer
, butreify
doesn't. Instead maybe we could just read a file with an IO action, maybe even the module itself (we can get the file name withloc_filename <$> location
) and store the options in a comment?Edit4: We can work around the whole problem by using plain TH functions instead of quasi quoters:
What is also required is a way to convert an execution plan to template haskell. It should be pretty straightforward, copy over ExecutionPlan2Hs.ag to ExecutionPlan2TH.ag and rewrite all the functions to generate template haskell declarations.
Edit: I just noticed that there is
ExecutionPlan2Hs.ag
andPrintCode.ag
which both function as the final step to Haskell code. The execution plan is used when loag or kennedy warren is enabled, and the code is used otherwise. I started with rewriting the execution plan printer, but now I realize I should probably start with the code printer.A difficult part will probably be the conversion of embedded haskell snippets to template haskell data types. The haskell-src-meta package is probably our best bet.
The suggested plan up to this point seems to me like it should be enough to build single file grammars, but I don't think it will be able to handle grammars that include other grammars.
A more modular solution
To go even further we could investigate using the Haskell module system instead of the ad-hoc #include system to make the attribute grammars more modular (and possibly allow importing attribute grammars across Haskell packages.
The idea is to save generated attribute grammar information in special deterministically named variables that are unique for each module and uuagc version perhaps. Any quasi quoter can lookup this variable which could simply store all necessary information in one big string literal, but perhaps we can do better.
To get such unique names we could use a hashing function of the module name and uuagc name and version. That can then be base64 encoded with the characters A-Z, a-z, 0-9, and _ and ' which are legal characters in variable names. Using a good hash function should ensure that no person can accidentally use the same variable name while still ensuring that this name can be generated by quasi quoters in other modules.
Question: how to make sure that that unique variable is always exported?
Answer: We could make a special template haskell splice that takes a list of imports as input and adds the special variable to that list.It is not possible to generate an export list (or even an import list) with Template Haskell.This is actually a pretty important point. We can just write these imports in the Haskell module that contains the template haskell expression to generate the attribute grammar code. But if we want to import semantic functions from other modules then we need to make sure that all the Haskell code in those semantic modules has the right imports. A solution might be to bind each Haskell code block to a Haskell variable which uses imports from the module that it is defined in. Those variables can then be exported and imported in other modules and the names can be transferred via the type literal trick as described in the comment below. I think we can require users to use implicit exports so that everything is automatically exported.
Extras
If this system if successfully implemented then it might also make sense to implement template haskell functions that can automatically derive attribute grammar nonterminals from existing data types. That would make it much easier to integrate attribute grammars into existing projects.
Update: I just found out that this would be similar to the Haskell2AG tool that is in the
tools
directory.Edit: The whole passing of information through constant variables cannot be done, see the note here:
That means we cannot access the constant value even if we know the name.
Maybe an easy workaround is to use type-level string literals and store the information in the types instead of in the RHS.