CakeML / pure

A verified compiler for a lazy functional language
Other
32 stars 4 forks source link

Lightweight modules (namespaces) #30

Open hrutvik opened 1 year ago

hrutvik commented 1 year ago

Lack of a module system results in significantly duplicated code between PureCake programs. A lightweight module system will help this considerably. This issue gives a high level overview of the key features we might want, which are essentially derived from GHC.

Basic functionality

The basic usage should be familiar to GHC users. A module Foo can be defined by adding a first line as follows:

module Foo where

<definitions of Foo>

Unlike GHC, we probably don't care about file name and location (i.e. where GHC expects module A.B.C to be in A/B/C.hs). However, we should still permit modules names with this nested structure ("hierarchical module names"), to permit users to organise modules (e.g. Data.List, Data.Set, ...).

This can then be used in another module Bar as follows:

module Bar where

import Foo

<code using Foo's definitions>

All of Foos definitions should be considered available without qualification, i.e. all of its bindings are added to the current namespace. For example, if Foo defines mydef, then mydef is directly usable after import Foo.

Implementation

There are many ways we might consider implementing this.

  1. Elaborate names just after parsing without significant verification.
  2. Enrich PureLang with "long" names, verifiably flattening them before translation into ThunkLang.
  3. Enrich all languages with "long" names, mapping them to CakeML's long names.
  4. ...

At first, we can focus on the first/simplest. During parsing, we can elaborate any mydef introduced by import Foo to Foo.mydef. If there is any ambiguity (i.e. there are multiple possible modules which define mydef) then we should present the user with an error. We need to be careful about the renaming: if Foo.mydef is a variable name already in existence, we could change semantics here. The simplest approach here is to forbid full stops (".") in user-supplied names. However, we want to permit Haskell's infix composition operator (f . g) - so we can permit full stops in infixes, which should be composed of special characters (and thus are disjoint from the alphanumeric names of modules). There is an edge case here with how to elaborate infixes. We could depart slightly from GHC here, elaborating an infix <*> defined in Foo as "Foo.(<*>)" (users write "(Foo.<*>)" in GHC).

When compiling modules written over multiple files, the simplest thing to do is concatenate them together before passing to the PureCake compiler. Parsing and elaboration must therefore be able to handle several modules in the same file:

module Foo where
...

module Bar where
...

A side-effect of this approach is that PureCake will allow users to write multiple modules in a single file, unlike GHC which expects one module per file.

We should be careful about what we expect the entrypoint of a program to look like. We could require a special module with name Main, also defining main. Alternatively, we could require the entrypoint not to be a module at all, i.e. just code with no top-level module declaration. To do this we need a way of differentiating entrypoint code from module code once both are concatenated together as above.

Additional functionality

Some extensions are listed below, in decreasing order of importance. The Haskell Wiki provides further inspiration.

We will use the following running example:

module MyList where

data List a = Nil | Cons a (List a)

map = ...
reverseHelper = ...
reverse = ...
foldlHelper = ...
foldl = ...

Hiding

Modules should be able to hide definitions from code that imports them. For example, Foo can hide helper functions by changing its module declaration, explicitly stating the definitions it wants to expose:

module Foo (List(Nil, Cons), map, reverse, foldl) where
...

The compiled code will of course include all definitions, but elaboration will first check that all referenced names are in scope.

Note that to expose types, GHC uses a syntax of the form TypeName(Constructor1, ..., ConstructorN), as shown for List above. We should adopt a similar syntax, though type names are not currently used by PureCake. Elaboration will still check that constructor names are in scope when used. Infix names should also be handled.

Selective imports

When importing a module, we may only want certain definitions for simplicity or to avoid name clashes. For example we can import only reverse and foldl:

import Foo (reverse, foldl)
...

Or we can hide map and reverse, importing everything else:

import Foo hiding (map, reverse)
...

As above, types and infixes must be considered.

Qualified names

If modules Foo1 and Foo2 both define map, we need a way of disambiguating them. Qualified names allow this:

import Foo1
import Foo2

map1 = Foo1.map
map2 = Foo2.map

Without the Foo1. and Foo2. prefixes, we would be forced to return an error as we don't know which map is being referred to. With qualified names, user-defined names can once again contain full stops. Elaboration can assume any such name is in fact a qualified name, and check for an appropriate module containing the correct definition. As above, types and infixes must be considered.

Renaming imports

With qualified names, we can also choose to rename modules on import. This ensures their namespace does not pollute our own, but allows access to definitions via qualified names.

import qualified Foo as MyList

mymap = MyList.map

Without the prefix MyList., this code would cause an elaboration error as we have kept Foos definitions behind the MyList name.

Renaming and selective imports

Renaming and selective importing can interact. We can rename a module then selectively bring some of its definitions into scope. For example:

import qualified Foo as MyList
import Foo (MyList, map, reverse)

All of Foos definitions are available to us behind qualified names (MyList.<definition>), but map and reverse are available without any qualified names. As above, types and infixes must be considered.