Lack of a module system results in significantly duplicated code between PureCake programs. A lightweight module system will help this considerably. This issue gives a high level overview of the key features we might want, which are essentially derived from GHC.
Basic functionality
The basic usage should be familiar to GHC users. A module Foo can be defined by adding a first line as follows:
module Foo where
<definitions of Foo>
Unlike GHC, we probably don't care about file name and location (i.e. where GHC expects module A.B.C to be in A/B/C.hs). However, we should still permit modules names with this nested structure ("hierarchical module names"), to permit users to organise modules (e.g. Data.List, Data.Set, ...).
This can then be used in another module Bar as follows:
module Bar where
import Foo
<code using Foo's definitions>
All of Foos definitions should be considered available without qualification, i.e. all of its bindings are added to the current namespace. For example, if Foo defines mydef, then mydef is directly usable after import Foo.
Implementation
There are many ways we might consider implementing this.
Elaborate names just after parsing without significant verification.
Enrich PureLang with "long" names, verifiably flattening them before translation into ThunkLang.
Enrich all languages with "long" names, mapping them to CakeML's long names.
...
At first, we can focus on the first/simplest. During parsing, we can elaborate any mydef introduced by import Foo to Foo.mydef.
If there is any ambiguity (i.e. there are multiple possible modules which define mydef) then we should present the user with an error. We need to be careful about the renaming: if Foo.mydef is a variable name already in existence, we could change semantics here. The simplest approach here is to forbid full stops (".") in user-supplied names. However, we want to permit Haskell's infix composition operator (f . g) - so we can permit full stops in infixes, which should be composed of special characters (and thus are disjoint from the alphanumeric names of modules).
There is an edge case here with how to elaborate infixes. We could depart slightly from GHC here, elaborating an infix <*> defined in Foo as "Foo.(<*>)" (users write "(Foo.<*>)" in GHC).
When compiling modules written over multiple files, the simplest thing to do is concatenate them together before passing to the PureCake compiler. Parsing and elaboration must therefore be able to handle several modules in the same file:
module Foo where
...
module Bar where
...
A side-effect of this approach is that PureCake will allow users to write multiple modules in a single file, unlike GHC which expects one module per file.
We should be careful about what we expect the entrypoint of a program to look like. We could require a special module with name Main, also defining main. Alternatively, we could require the entrypoint not to be a module at all, i.e. just code with no top-level module declaration. To do this we need a way of differentiating entrypoint code from module code once both are concatenated together as above.
Additional functionality
Some extensions are listed below, in decreasing order of importance.
The Haskell Wiki provides further inspiration.
We will use the following running example:
module MyList where
data List a = Nil | Cons a (List a)
map = ...
reverseHelper = ...
reverse = ...
foldlHelper = ...
foldl = ...
Hiding
Modules should be able to hide definitions from code that imports them. For example, Foo can hide helper functions by changing its module declaration, explicitly stating the definitions it wants to expose:
module Foo (List(Nil, Cons), map, reverse, foldl) where
...
The compiled code will of course include all definitions, but elaboration will first check that all referenced names are in scope.
Note that to expose types, GHC uses a syntax of the form TypeName(Constructor1, ..., ConstructorN), as shown for List above. We should adopt a similar syntax, though type names are not currently used by PureCake. Elaboration will still check that constructor names are in scope when used. Infix names should also be handled.
Selective imports
When importing a module, we may only want certain definitions for simplicity or to avoid name clashes. For example we can import only reverse and foldl:
import Foo (reverse, foldl)
...
Or we can hide map and reverse, importing everything else:
import Foo hiding (map, reverse)
...
As above, types and infixes must be considered.
Qualified names
If modules Foo1 and Foo2 both define map, we need a way of disambiguating them. Qualified names allow this:
Without the Foo1. and Foo2. prefixes, we would be forced to return an error as we don't know which map is being referred to. With qualified names, user-defined names can once again contain full stops. Elaboration can assume any such name is in fact a qualified name, and check for an appropriate module containing the correct definition. As above, types and infixes must be considered.
Renaming imports
With qualified names, we can also choose to rename modules on import. This ensures their namespace does not pollute our own, but allows access to definitions via qualified names.
import qualified Foo as MyList
mymap = MyList.map
Without the prefix MyList., this code would cause an elaboration error as we have kept Foos definitions behind the MyList name.
Renaming and selective imports
Renaming and selective importing can interact. We can rename a module then selectively bring some of its definitions into scope. For example:
import qualified Foo as MyList
import Foo (MyList, map, reverse)
All of Foos definitions are available to us behind qualified names (MyList.<definition>), but map and reverse are available without any qualified names. As above, types and infixes must be considered.
Lack of a module system results in significantly duplicated code between PureCake programs. A lightweight module system will help this considerably. This issue gives a high level overview of the key features we might want, which are essentially derived from GHC.
Basic functionality
The basic usage should be familiar to GHC users. A module
Foo
can be defined by adding a first line as follows:Unlike GHC, we probably don't care about file name and location (i.e. where GHC expects module
A.B.C
to be inA/B/C.hs
). However, we should still permit modules names with this nested structure ("hierarchical module names"), to permit users to organise modules (e.g.Data.List
,Data.Set
, ...).This can then be used in another module
Bar
as follows:All of
Foo
s definitions should be considered available without qualification, i.e. all of its bindings are added to the current namespace. For example, ifFoo
definesmydef
, thenmydef
is directly usable afterimport Foo
.Implementation
There are many ways we might consider implementing this.
At first, we can focus on the first/simplest. During parsing, we can elaborate any
mydef
introduced byimport Foo
toFoo.mydef
. If there is any ambiguity (i.e. there are multiple possible modules which definemydef
) then we should present the user with an error. We need to be careful about the renaming: ifFoo.mydef
is a variable name already in existence, we could change semantics here. The simplest approach here is to forbid full stops (".") in user-supplied names. However, we want to permit Haskell's infix composition operator (f . g
) - so we can permit full stops in infixes, which should be composed of special characters (and thus are disjoint from the alphanumeric names of modules). There is an edge case here with how to elaborate infixes. We could depart slightly from GHC here, elaborating an infix<*>
defined inFoo
as "Foo.(<*>)
" (users write "(Foo.<*>)
" in GHC).When compiling modules written over multiple files, the simplest thing to do is concatenate them together before passing to the PureCake compiler. Parsing and elaboration must therefore be able to handle several modules in the same file:
A side-effect of this approach is that PureCake will allow users to write multiple modules in a single file, unlike GHC which expects one module per file.
We should be careful about what we expect the entrypoint of a program to look like. We could require a special module with name
Main
, also definingmain
. Alternatively, we could require the entrypoint not to be a module at all, i.e. just code with no top-levelmodule
declaration. To do this we need a way of differentiating entrypoint code from module code once both are concatenated together as above.Additional functionality
Some extensions are listed below, in decreasing order of importance. The Haskell Wiki provides further inspiration.
We will use the following running example:
Hiding
Modules should be able to hide definitions from code that imports them. For example,
Foo
can hide helper functions by changing itsmodule
declaration, explicitly stating the definitions it wants to expose:The compiled code will of course include all definitions, but elaboration will first check that all referenced names are in scope.
Note that to expose types, GHC uses a syntax of the form
TypeName(Constructor1, ..., ConstructorN)
, as shown forList
above. We should adopt a similar syntax, though type names are not currently used by PureCake. Elaboration will still check that constructor names are in scope when used. Infix names should also be handled.Selective imports
When importing a module, we may only want certain definitions for simplicity or to avoid name clashes. For example we can import only
reverse
andfoldl
:Or we can hide
map
andreverse
, importing everything else:As above, types and infixes must be considered.
Qualified names
If modules
Foo1
andFoo2
both definemap
, we need a way of disambiguating them. Qualified names allow this:Without the
Foo1.
andFoo2.
prefixes, we would be forced to return an error as we don't know whichmap
is being referred to. With qualified names, user-defined names can once again contain full stops. Elaboration can assume any such name is in fact a qualified name, and check for an appropriate module containing the correct definition. As above, types and infixes must be considered.Renaming imports
With qualified names, we can also choose to rename modules on import. This ensures their namespace does not pollute our own, but allows access to definitions via qualified names.
Without the prefix
MyList.
, this code would cause an elaboration error as we have keptFoo
s definitions behind theMyList
name.Renaming and selective imports
Renaming and selective importing can interact. We can rename a module then selectively bring some of its definitions into scope. For example:
All of
Foo
s definitions are available to us behind qualified names (MyList.<definition>
), butmap
andreverse
are available without any qualified names. As above, types and infixes must be considered.