UU-ComputerScience / uuagc

3 stars 9 forks source link

Template Haskell #10

Open noughtmare opened 3 years ago

noughtmare commented 3 years ago

I have tought a bit about using template haskell with uuagc.

Initial working version

First we could use a file quasi quoter (made with the quoteFile function) to load and compile an ag file.

Options can be passed to this quasi quoter by defining a specific variable (say agOptions) as, for example, a certain string constant. This variable can be looked up by the quasi quoter and the string constant can be read.

Edit: an annoyance is that due to the stage restriction you cannot lookup variables that are declared in the module itself, so you need to put this configuration variable in a separate module.

Edit2: Maybe we can avoid the stage restriction by using addModFinalizer in combination with addTopDecls. I wonder if those work together.

Edit3: No, lookupValueName works in addModFinalizer, but reify doesn't. Instead maybe we could just read a file with an IO action, maybe even the module itself (we can get the file name with loc_filename <$> location) and store the options in a comment?

Edit4: We can work around the whole problem by using plain TH functions instead of quasi quoters:

-- TH.hs
module TH where

import Language.Haskell.TH

useOption :: String -> Q [Dec]
useOption x = return [ValD (VarP (mkName "result")) (NormalB (LitE (StringL x))) []]
-- Main.hs
{-# LANGUAGE TemplateHaskell #-}

import TH

useOption "semfuns"

main :: IO ()
main = print result

What is also required is a way to convert an execution plan to template haskell. It should be pretty straightforward, copy over ExecutionPlan2Hs.ag to ExecutionPlan2TH.ag and rewrite all the functions to generate template haskell declarations.

Edit: I just noticed that there is ExecutionPlan2Hs.ag and PrintCode.ag which both function as the final step to Haskell code. The execution plan is used when loag or kennedy warren is enabled, and the code is used otherwise. I started with rewriting the execution plan printer, but now I realize I should probably start with the code printer.

A difficult part will probably be the conversion of embedded haskell snippets to template haskell data types. The haskell-src-meta package is probably our best bet.

The suggested plan up to this point seems to me like it should be enough to build single file grammars, but I don't think it will be able to handle grammars that include other grammars.

A more modular solution

To go even further we could investigate using the Haskell module system instead of the ad-hoc #include system to make the attribute grammars more modular (and possibly allow importing attribute grammars across Haskell packages.

The idea is to save generated attribute grammar information in special deterministically named variables that are unique for each module and uuagc version perhaps. Any quasi quoter can lookup this variable which could simply store all necessary information in one big string literal, but perhaps we can do better.

To get such unique names we could use a hashing function of the module name and uuagc name and version. That can then be base64 encoded with the characters A-Z, a-z, 0-9, and _ and ' which are legal characters in variable names. Using a good hash function should ensure that no person can accidentally use the same variable name while still ensuring that this name can be generated by quasi quoters in other modules.

Question: how to make sure that that unique variable is always exported? Answer: We could make a special template haskell splice that takes a list of imports as input and adds the special variable to that list. It is not possible to generate an export list (or even an import list) with Template Haskell.

This is actually a pretty important point. We can just write these imports in the Haskell module that contains the template haskell expression to generate the attribute grammar code. But if we want to import semantic functions from other modules then we need to make sure that all the Haskell code in those semantic modules has the right imports. A solution might be to bind each Haskell code block to a Haskell variable which uses imports from the module that it is defined in. Those variables can then be exported and imported in other modules and the names can be transferred via the type literal trick as described in the comment below. I think we can require users to use implicit exports so that everything is automatically exported.

Extras

If this system if successfully implemented then it might also make sense to implement template haskell functions that can automatically derive attribute grammar nonterminals from existing data types. That would make it much easier to integrate attribute grammars into existing projects.

Update: I just found out that this would be similar to the Haskell2AG tool that is in the tools directory.


Edit: The whole passing of information through constant variables cannot be done, see the note here:

The Maybe Dec field contains Just the declaration which defined the variable - including the RHS of the declaration - or else Nothing, in the case where the RHS is unavailable to the compiler. At present, this value is always Nothing: returning the RHS has not yet been implemented because of lack of interest.

That means we cannot access the constant value even if we know the name.

Maybe an easy workaround is to use type-level string literals and store the information in the types instead of in the RHS.

noughtmare commented 3 years ago

Proof of concept passing information from one module to another:

-- QQ.hs
module QQ where

import Language.Haskell.TH
import Language.Haskell.TH.Quote
import Language.Haskell.TH.Syntax

qqtest = QuasiQuoter
  { quoteExp = compile
  , quotePat = notHandled "patterns"
  , quoteType = notHandled "types"
  , quoteDec = notHandled "declarations"
  }
  where notHandled things = error $ things ++ " are not handled by the qqtest quasiquoter."

compile :: String -> Q Exp
compile str = do
  Just x <- lookupValueName "test"
  x' <- reify x
  case x' of
    VarI n (AppT _ (LitT (NumTyLit x))) _ -> return (LitE (IntegerL x))
-- Source.hs
{-# LANGUAGE DataKinds #-}
module Source where

import GHC.TypeLits
import Data.Proxy

-- This type can contain arbitrary data and it could have been generated by a quasi quoter.
test :: Proxy 1337
test = Proxy
-- Target.hs
{-# LANGUAGE QuasiQuotes #-}

import QQ
import Source

main :: IO ()
main = print [qqtest|abc|]

Running it will print the expected: 1337.

noughtmare commented 3 years ago

I have been able to make a prototype: https://github.com/noughtmare/uuagc/tree/template-haskell

You can try it out by cloning and compiling this test project: https://github.com/noughtmare/test-uuagc-th

It contains the code:

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
module Main where

import UU.UUAGC (uuagcTH)
import Text.RawString.QQ

uuagcTH ["haskellsyntax","data"] [r|
data Tree
  | Node left  :: Tree
         right :: Tree
  | Tip  value :: Int

deriving Tree : Show
|]

main :: IO ()
main = print (Node (Tip 1) (Node (Tip 2) (Tip 3)))

If everything goes well, that will compile and produce the string:

Node (Tip 1) (Node (Tip 2) (Tip 3))

It is currently limited to only generate data types.

noughtmare commented 3 years ago

I have noticed that I was working on the wrong file. ExecutionPlan2Hs is only used when loag or kennedy warren is enabled. Instead it is much easier to start by rewriting the PrintCode module. I have started doing that and now I already have a version that can compile a small attribute grammar including data, catas, semfuns and wrappers (https://github.com/noughtmare/uuagc/commit/0937ffb084e4e8cf51894a36614dc4eeb7b9785d). I have also updated the test-uuagc-th repo so that it can now compile and run this more complete example:

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Main where

import UU.UUAGC (uuagcTH)
import Text.RawString.QQ

uuagcTH ["haskellsyntax","data","signatures","catas","wrappers","semfuns"] [r|
data Tree
  | Node left  :: Tree
         right :: Tree
  | Tip  value :: Int

attr Tree
  syn max :: {Int}

sem Tree
  | Node lhs.max = max @left.max @right.max
  | Tip  lhs.max = @value

deriving Tree : Show
|]

main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree
noughtmare commented 3 years ago

I have now made a bigger proof of concept for sharing information across modules with template Haskell: https://github.com/noughtmare/splitfuns.

noughtmare commented 3 years ago

I have now been able to implement a similar modularity system with my uuagc TH fork, here is an example of it in action: https://github.com/noughtmare/test-uuagc-th

There is one file Data.hs which contains the data definitions:

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Data where

import UU.UUAGC (uuagcTH)
import Text.RawString.QQ

uuagcTH "dat" ["haskellsyntax","data"] [] [r|
data Tree
  | Node left  :: Tree
         right :: Tree
  | Tip  value :: Int

deriving Tree : Show
|]

And the Main.hs file imports that and defines semantics for it:

{-# LANGUAGE TemplateHaskell, QuasiQuotes #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Main where

import UU.UUAGC (uuagcTH)
import Text.RawString.QQ
import Data

uuagcTH "sem" ["haskellsyntax","signatures","catas","wrappers","semfuns"] [dat] [r|
attr Tree
  syn max :: {Int}

sem Tree
  | Node lhs.max = max @left.max @right.max
  | Tip  lhs.max = @value
|]

main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree
jbransen commented 3 years ago

Awesome! I'll have to find some time some day to dig into this, but this looks like a good start. I think the whole Template Haskell approach has a few challenges:

noughtmare commented 3 years ago

Sharing code generation

Concretely, I think this would mostly mean replacing the use of the data structures in Code.ag with TH data types, or do you mean something else?

Edit: I see now that also things like HsToken could be replaced by Template Haskell throughout the whole compilation process.

I don't think it is possible to fully merge the compilation paths, because things like includes, import blocks and sepsemmods are just so different or unnecessary when using template Haskell.

I think the Clean and OCaml code generation would also be a problem.

Quoting / parsing of expressions

For this I have used the haskell-src-meta package. It uses haskell-src-exts to parse the text and then converts the results to TH types. But it doesn't support everything yet and I think haskell-src-exts is also not always exactly the same as GHC. If we want full support then we need to use ghc-lib-parser and/or convince/help GHC developers to add something like this nested bracket merge request. Because I think antiquotation is not currently supported.

Update: I have now also made a package that uses GHC's own parser to parse Haskell code into Template Haskell types, see: https://github.com/UU-ComputerScience/uuagc/issues/10#issuecomment-822821613. And yet another way is to use the more granular embedding, see: https://github.com/UU-ComputerScience/uuagc/issues/10#issuecomment-823334968.

Composability and sharing the definitions between modules

I must say that it is a bit fragile with respect to the capturing of variable names. Right now it doesn't capture anything, so you'd have to import all the required imports in the module where you generate the final code instead of the module where you use the variables. We can piggyback on GHC by using a more granular syntactic embedding, this is my artist's rendition of what that could look like:

{-# LANGUAGE TemplateHaskell, QuasiQuotes, QualifiedDo, BlockArguments, DuplicateRecordFields, NoFieldSelectors #-}
{-# OPTIONS_GHC -ddump-splices #-}
module Main where

import UU.UUAGC

data Tree
  = Node { left  :: Tree
         , right :: Tree
         }
  | Tip  { value :: Int }
  deriving Show

-- register the Tree type with the system
data' ''Tree

-- register the max attribute
-- the namespace of these attributes should actually be completely separate from everything else.
attr ''Tree Attr.do
  syn "max" [t| Int |]

-- define the semantic rules that calculate the max attribute
sem ''Tree Sem.do 
  -- I'm not very happy yet about how these splices look.
  case' 'Node (lhs % "max" .= [| max $('left % "max") $('right % "max") |])
  case' 'Tip  (lhs % "max" .= val 'value)

main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree

But even then there are bugs like this: https://gitlab.haskell.org/ghc/ghc/-/issues/19662. And I think it really adds too much clutter to the syntax.

maybe naming the module or using fqns would be better

I am now thinking that I could just always use the name "ag" and then users can import the whole module qualified, so you would get something like Data.ag.

I must say that it is much easier to use a Haskell variable to store the module in, because while you can reify strings to variables, you cannot get the value of that variable. You can still get the type, so a workaround is to serialize the data into a type like I do in the auto branch of the splitfuns prototype: here. There I store the data in a type variable name (which is pretty crazy), a slightly better way would perhaps be to use a type level integer, but that requires the DataKinds extension and is not that much better.

noughtmare commented 3 years ago

I just did some testing with using the ghc (or ghc-lib-parser) package for parsing strings into Template Haskell types, here's the result: ghc-meta. It is doable, but doing everything correctly and keeping up with GHC releases might become a problem.

noughtmare commented 3 years ago

I actually think a more granular syntax might be acceptable, I've come up with this:

{-# LANGUAGE TemplateHaskell, QuasiQuotes, DuplicateRecordFields #-}
module Main where

import UU.UUAGC

data Tree
  = Node { left  :: Tree
         , right :: Tree
         }
  | Tip  { value :: Int }
  deriving Show

attr ''Tree
  [ syn "max" [t|Int|]
  ]

sem ''Tree
  [ alt 'Node [d|[a|lhs.max|] = max [a|left.max|] [a|right.max|]|]
  , alt 'Tip  [d|[a|lhs.max|] = [a|value|]|]
  ]

main :: IO ()
main = print $ max_Syn_Tree $ wrap_Tree (sem_Tree (Node (Tip 1) (Node (Tip 2) (Tip 3)))) Inh_Tree

The lists can perhaps be made slightly more nice with QualifiedDo or RebindableSyntax as in my previous artist's rendition, but I think this is already relatively nice.

The huge advantage of this is that we don't need haskell-src-exts or ghc-lib-parser, because we can just use the built-in Haskell quoters.

How do you feel about that?

Edit: If we use ' instead of . to separate field from attribute name and if we do some name resolution magic, then we can probably get rid of the a quasiquoter:

sem ''Tree
  [ alt 'Node [d| lhs'max = max left'max right'max |]
  , alt 'Tip  [d| lhs'max = value |]
  ]

Then we can do some post processing to rename the variables appropriately. Of course the disadvantage is that you can no longer use normal Haskell variables/functions with an apostrophe, but we could just use a double apostrophe as escape sequence.

Update: the biggest problem I see now is that there isn't a nice syntax for extending data types with additional cases using this more granular approach.

jbransen commented 3 years ago

Sorry for the slow replies, I wanted to spend some time on this but it does not work out.

Regarding the nice syntax: I think that is a different project. My main motivation for having some TH support is keeping uuagc alive. The cabal-plugin is harder and harder to deal with, but there are some projects that depend on it, so it would be great if there is a small manual refactor so that TH can be used with uuagc, which seems doable.

noughtmare commented 3 years ago

a small manual refactor so that TH can be used with uuagc, which seems doable

It is doable, but some features like OCaml and Clean code generation (obviously TH cannot be used in OCaml or Clean) and sepsemmods (TH cannot create modules or even imports and exports) cannot be used with template haskell. So, I don't know if it will ever be a full replacement for the cabal plugin. However, I do have hopes that TH will allow us to publish attribute grammars to Hackage which seems like a big win (although it will not really be modular like AspectAG), because all the grammar pieces will still need to be collected and compiled together.

noughtmare commented 3 years ago

Another alternative is to follow the approach of https://github.com/ElderEphemera/instance-impl which is a GHC plugin that desugers its nonsensical syntax to produce a valid program. The only requirement then is that the program must be accepted by GHC's parser. This requirement is not so strict, e.g. this parses:

data Tree
  = Nil
  | Node (l :: Tree) (x :: Int) (r :: Tree)

attr Tree
  syn use (+) (const 0) (sum :: Int)

sem Tree
  | Nil lhs.sum = 0
  | Node +sum = (+ x)

The data declaration is parsed as a normal data declaration with type variables and kind signatures. The attr declaration is parsed as a top-level TH splice where the attr function is applied to many arguments. The sem declaration is parsed as a function definition with guards. It might be possible to desugar all of these to our desired output.