gelisam / klister

an implementation of stuck macros
BSD 3-Clause "New" or "Revised" License
128 stars 11 forks source link

testsuite: test all examples, use tasty-golden #210

Closed doyougnu closed 1 year ago

doyougnu commented 1 year ago

Closes out #42. This MR also:

doyougnu commented 1 year ago

Ah! of course the paths would be different on CI.

@gelisam is there a concept of $KLISTERROOT? so we could change the paths to $KLISTERROOT/examples/.... We could also add post-hooks to the testsuite and mark these tests as skip, or fragile or expect to fail, or something like that.

gelisam commented 1 year ago

is there a concept of $KLISTERROOT?

Kind of, there is a $KLISTERPATH which is usually set to the examples folder so that files outside of the examples folder can access the libraries from the examples folder:

$ pwd
/home/gelisam/working/haskell/klister

$ cabal run klister -- run toy.kl 
Module not found: define-syntax-rule.kl
Searched in directory at /home/gelisam/working/haskell/klister
And on $KLISTERPATH:

ExitFailure 1

$ export KLISTERPATH=`pwd`/examples
$ cabal run klister -- run toy.kl 
Example at toy.kl:9.1-335.1:
  (λx. let x = #[toy.kl:6.20-6.25]<other> in x 5) : Integer ↦
  5

But in principle KLISTERPATH could just as well be set to any other path if the user has their own folder of libraries, or to a list of colon-separated paths.

gelisam commented 1 year ago
Circular imports while importing
  "/home/doyougnu/programming/klister/examples/non-examples/circular-1.kl"

Ah! of course the paths would be different on CI.

We had this problem before, and we solved it by changing the implementation so it only prints the basename of the file instead of the full path.

doyougnu commented 1 year ago

We had this problem before, and we solved it by changing the implementation so it only prints the basename of the file instead of the full path.

hmm this does not seem like a clearer approach:

    circular-1:                                        FAIL
      Test output was different from 'examples/non-examples/circular-1.golden'. Output of ["diff","-u","examples/non-examples/circular-1.golden","/tmp/circular-11576145-55.actual"]:
      --- examples/non-examples/circular-1.golden   2023-03-02 21:36:24.865466787 -0500
      +++ /tmp/circular-11576145-55.actual  2023-03-07 21:16:27.803397769 -0500
      @@ -1,5 +1,2 @@
      -Circular imports while importing
      -  "/home/doyougnu/programming/klister/examples/non-examples/circular-1.kl"
      -  Context:
      -    "/home/doyougnu/programming/klister/examples/non-examples/circular-2.kl"
      -    "/home/doyougnu/programming/klister/examples/non-examples/circular-1.kl"
      +Circular imports while importing "circular-1.kl"
      +  Context: "circular-2.kl" "circular-1.kl"

The entire pathname is pretty useful. Imagine if we had examples/A/foo.kl and examples/B/foo.kl, then just the filename will produce the error Circular imports ... Context: "foo.kl" "foo.kl". I think the real problem is that modules are tied to the filesystem at all. In contrast common lisp does not tie these two concepts together. Instead it treats a module as a first class concept and allows one to create a module out of a single file or across several files, or have a single file which defines several modules (although i'm not sure that's a good idea). In any case, I like the idea of not tying modules to files, this would allow klister to avoid the module re-exports that Haskell has. BTW what is the module story for klister? I assumed they are tied to files because of:

data ModuleName = ModuleName FilePath | KernelName KernelName
  deriving (Data, Eq, Ord, Show)
doyougnu commented 1 year ago

We had this problem before, and we solved it by changing the implementation so it only prints the basename of the file instead of the full path.

hmm this does not seem like a clearer approach:

    circular-1:                                        FAIL
      Test output was different from 'examples/non-examples/circular-1.golden'. Output of ["diff","-u","examples/non-examples/circular-1.golden","/tmp/circular-11576145-55.actual"]:
      --- examples/non-examples/circular-1.golden 2023-03-02 21:36:24.865466787 -0500
      +++ /tmp/circular-11576145-55.actual    2023-03-07 21:16:27.803397769 -0500
      @@ -1,5 +1,2 @@
      -Circular imports while importing
      -  "/home/doyougnu/programming/klister/examples/non-examples/circular-1.kl"
      -  Context:
      -    "/home/doyougnu/programming/klister/examples/non-examples/circular-2.kl"
      -    "/home/doyougnu/programming/klister/examples/non-examples/circular-1.kl"
      +Circular imports while importing "circular-1.kl"
      +  Context: "circular-2.kl" "circular-1.kl"

The entire pathname is pretty useful. Imagine if we had examples/A/foo.kl and examples/B/foo.kl, then just the filename will produce the error Circular imports ... Context: "foo.kl" "foo.kl". I think the real problem is that modules are tied to the filesystem at all. In contrast common lisp does not tie these two concepts together. Instead it treats a module as a first class concept and allows one to create a module out of a single file or across several files, or have a single file which defines several modules (although i'm not sure that's a good idea). In any case, I like the idea of not tying modules to files, this would allow klister to avoid the module re-exports that Haskell has. BTW what is the module story for klister? I assumed they are tied to files because of:

data ModuleName = ModuleName FilePath | KernelName KernelName
  deriving (Data, Eq, Ord, Show)

Ah we probably inherited this from racket: https://docs.racket-lang.org/guide/module-basics.html Chez also doesn't associate modules to files: https://cisco.github.io/ChezScheme/csug9.5/syntax.html#./syntax:h0

gelisam commented 1 year ago

Imagine if we had examples/A/foo.kl and examples/B/foo.kl

Don't worry, that can't happen! #126 " 🙃

Seriously though, printing the basename instead of the full path in order to avoid CI issues is obviously a hack. I think hacks are still fine at this stage of the project, but if you want a proper solution, I can think of a few:

  1. Keep the output as-is, and tweak the golden test implementation to accept outputs which only differ in the path to the examples folder. For example, in typelevel-rewrite-rules I want to use golden tests to check that the error messages printed by GHC are readable, but I have no control over whether GHC prints full paths or not, so I tweak the output before comparing it to the golden file.
  2. Instead of printing absolute paths, print relative paths, either relative to $PWD or to the file given to klister run.
  3. Instead of printing the full path, only print the part which differs. So circular-1.kl and circular-2.kl, without the shared prefix /.../non-examples, and A/foo.kl and B/K.kl, without the shared prefix /.../examples but with the important A and B path fragments.
gelisam commented 1 year ago

Ah we probably inherited this from racket: https://docs.racket-lang.org/guide/module-basics.html

Racket recommends using one file per module, but it supports defining multiple modules in a single file. I don't know if Racket supports spreading a single module over multiple files.

gelisam commented 1 year ago

BTW what is the module story for klister? I assumed they are tied to files because of: [...]

Yes, modules are currently tied to files. I don't think it would be too hard to change that if we wanted, but looking at the advantages vs the disadvantages, I am not yet convinced it's worth it.

advantages:

  1. It would fix our issue with absolute paths and CI, because the error messages would print the names of the modules, not their paths. But note that Haskell, which does tie modules to files, also prints the module name (separated with dots), not the path (separated with slashes), so there are other ways to solve the issue (including the 3 other solutions I have suggested above).
  2. It would help with the Klister Guide, as it would be possible to put both the module which defines a #lang and the module which uses this #lang in the same file. Currently, I have to either put the module which uses the #lang in a comment, or to tell the reader to open a separate file.
  3. You claim that it helps with avoiding modules which merely re-export other modules, but I don't see how. If I define N related modules in a single file, won't I still want to define an N+1st module which re-exports all N modules, so that users only have to import a single module? Users would import a module name, not a path, right? Or is this for the case in which we want to define a very large module with a lot of contents, so we want to divide that contents into a number of files, so in Haskell we would be forced to define several smaller .Internal.* modules and to use re-exports to expose everything as a single module, whereas in Chez several files can contribute to the same module?

disadvantages:

  1. If a module named foo can be defined in any file, not just in a file named foo.kl, then only printing the module name and not the path might make it difficult for the user to find the source code for that module.
  2. If users write (import 'foo), then the compiler will similarly have a hard time finding the source code for foo. I guess the compiler already needs to have previously compiled the file which provides the foo module, so it can look up foo in some database of modules? That would require implementing #118 first, I guess.
  3. As a consequence of (2), it is not possible for the compiler to lazily load the modules as they are encountered, instead it is now the responsibility of the user to first compile all the files which could possibly affect the main file, and then load the main file. This can lead to confusing situations in which the user makes a change to the file foo.kl which defines module foo, then reloads the file bar.kl which imports module foo, and then gets confused because their changes are ignored.
  4. How would scoping work? In Klister, the definitions are ordered from top to bottom inside the file. If different files are part of the same module, are they supposed to see each other's definitions? To avoid cycles, if foo1.kl and foo2.kl both contribute to the foo module, foo2.kl should be able to see the foo1.kl definitions but not the other way around (or vice-versa). But how is that specified? foo2.kl would have to import foo1.kl, no?
gelisam commented 1 year ago

One alternate design for modules I would be more excited about would be sub-modules, in the style of Agda. That is, a file foo.kl can define top-level definitions add and multiply, and also some top-level submodules moduleA and moduleB providing addA and addB respectively. Then, the user can import foo.kl and then selectively import the sub-modules they need:

(import "fool.kl")
(import moduleB)

(example (add 43 (addB 2 2)))
doyougnu commented 1 year ago

One alternate design for modules I would be more excited about would be sub-modules, in the style of Agda. That is, a file foo.kl can define top-level definitions add and multiply, and also some top-level submodules moduleA and moduleB providing addA and addB respectively. Then, the user can import foo.kl and then selectively import the sub-modules they need:

(import "fool.kl")
(import moduleB)

(example (add 43 (addB 2 2)))

Yes I had something similar in mind, but in agda don't you have to explicitly open the module in order to access addA from module A? For now I think the hacks are entirely fine. Discussing the module system should probably belong to #211 anyhow.

gelisam commented 1 year ago

in agda don't you have to explicitly open the module in order to access addA from module A?

Yes, in Agda you can choose whether or not to open modules in order to decide whether to write names qualified or unqualified. So you could write any of the following:

import Foo
import Foo.moduleB

x : _
x = Foo.add 43 (Foo.moduleA.addB 2 2)
open import Foo
import moduleB

x : _
x = add 43 (moduleA.addB 2 2)
open import Foo
open import moduleB

x : _
x = add 43 (addB 2 2)

But Klister does not support qualified names (maybe we should?), so it does not currently make sense to distinguish between opened and unopened modules, any imported module is automatically opened.

Discussing the module system should probably belong to #211 anyhow.

Better yet, a dedicated issue for modules, so we can use #211 to discuss whether fancier modules should be on the roadmap rather than what our fancier modules should look like.

david-christiansen commented 1 year ago

If working on Klister's modules, make sure to read "Composable and Compilable Macros: You Want it When?" by Matthew Flatt. Whatever we do should address the concerns in that paper, and importing a design from Agda won't.

In particular, a system with serious macros should have the following in its module system:

  1. Phase-segregated instantiations of modules - if module A is used at compile time and at run time, the values for each phase should be distinct. This allows you do to things like have a file handle as a value. Keeping in mind that code may be run on a different computer than it is built on, file handles on the build host are almost certainly not valid on the runtime machine.
  2. Separate compilation should be possible - if module B uses module A at run time and at compile time, module A should be able to be compiled ahead of time and then have the same object code used for both phases without rebuilding.
  3. It should be semantically reasonable to strip all compile-time dependencies from object code. Let's say we have a fancy Z3-assisted code generation step in a macro - it would be great to run the object code on a machine where the solver isn't present, and to not have to ship a heavyweight codegen dependency with the object code.
  4. It should be possible for macros to associate compile-time information with names. Klister doens't have this right now, but I think that some kind of system inspired by LVars and MetaPRL's tables could do the trick. This allows one to do things like associate type class contexts with method names.

Right now, we have most of the basics of this kind of module system, but it's missing important parts. We have phase-segregated instantiation, our semantics support separate compilation, and we can strip compile-time dependencies if we want. We presently tie modules to files (with #lang), but it would be good to allow submodules like Racket does. In Klister today, #lang is hard-coded in the parser. In Racket, a file foo.rkt with:

#lang bar
decl ...

is mostly short for a file

(module "foo.rkt" bar
  decl
  ...)

The "mostly" is because the #lang can also interpose a new parser for non-s-expression languages.

It'd be nice to do this in Klister too. I'd do it be replacing:

data ModuleName = ModuleName FilePath | KernelName KernelName
  deriving (Data, Eq, Ord, Show)

with

data ModuleName = SubmodName ModuleName Text | FileName FilePath | KernelName KernelName
  deriving (Data, Eq, Ord, Show)

allowing imports to ask for submodules.

Things like qualified imports can be attained by prefix-in and the like.

doyougnu commented 1 year ago

after cc687854edf7f2e061e6a132935285d7f4566f9b we get:

circular-2:                                        FAIL
      Test output was different from 'examples/non-examples/circular-2.golden'. Output of ["diff","-u","examples/non-examples/circular-2.golden","/tmp/circular-21773107-53.actual"]:
      --- examples/non-examples/circular-2.golden   2023-03-02 21:36:24.864466802 -0500
      +++ /tmp/circular-21773107-53.actual  2023-03-09 12:44:43.586192090 -0500
      @@ -1,5 +1,5 @@
       Circular imports while importing
         "/home/doyougnu/programming/klister/examples/non-examples/circular-2.kl"
         Context:
      -    "/home/doyougnu/programming/klister/examples/non-examples/circular-1.kl"
      -    "/home/doyougnu/programming/klister/examples/non-examples/circular-2.kl"
      +    "examples/non-examples/circular-1.kl"
      +    "examples/non-examples/circular-2.kl"

Which seems to be what we want.

Although the approach is a bit heavy handed because it adds the pwd of klister run or klister repl to World. Why World? because the location of the world should be part of the World. I think this lays a foundation for doing things like :cd in the repl. If that's too much change then please feel free to request more :)

doyougnu commented 1 year ago

If working on Klister's modules, make sure to read "Composable and Compilable Macros: You Want it When?" by Matthew Flatt. Whatever we do should address the concerns in that paper, and importing a design from Agda won't.

onto the reading stack it goes!

doyougnu commented 1 year ago

alright I think I understand the failure now:

why is it bound_vs_free.kl failing?

Because its the only test in failing-examples that import do.kl and bool.kl, no other examples in failing-examples use the import system.

what is the importance of do.kl and bool.kl?

These are modules that are not in stdlib! Rather they are defined in examples so why does is this not found in $KLISTERPATH? According to the comments in the KlisterPath.hs the module import checks:

These are not found because failing-examples is a sub-directory of examples and when module imports look for these modules they are not found in the same directory, then not found in stdlib (because these modules are not in stdlib), and then are not found on $KLISTERPATH. Now I expected these to be found on $KLISTERPATH, but when I print debugged the environment variable it actually was never set in the testsuite at all. So in ef01aaf467e2d28e7c6d3b135bf1b72951825e4d I set $KLISTERPATH in the github CI but these modules were still not found.

Why were these modules not found in $KLISTERPATH even when it was set correctly?

I think this is because findInDirectory is written using listDirectory:

-- KlisterPath.hs
findInDirectory :: String -> FilePath -> IO [FilePath]
findInDirectory moduleName dir =
  map (dir </>) . filter (== moduleName) <$> listDirectory dir

but I have to verify this behavior still because findInKlisterPath looks correct to me:

findInKlisterPath ::
  KlisterPath {- ^ Path to search -} ->
  String {- ^ Module to find -} ->
  IO [FilePath] {- ^ Candidates found -}
findInKlisterPath (KlisterPath paths) moduleName =
  concat <$> mapM (findInDirectory moduleName) (Set.toList paths)
gelisam commented 1 year ago

Does this test succeed locally on your machine but fail on CI, or does it fail everywhere?

gelisam commented 1 year ago

So in ef01aaf I set $KLISTERPATH in the github CI but these modules were still not found

Looks to me like you set it to ${github.workspace} but you should set it to ${github.workspace}/examples. Remember, $KLISTERPATH is a search path for klister modules, it is not the path to the root of the klister repo!

doyougnu commented 1 year ago

Does this test succeed locally on your machine but fail on CI, or does it fail everywhere?

Everywhere. Looks like $KLISTERPATH isn't set locally when I run cabal test

Looks to me like you set it to ${github.workspace} but you should set it to ${github.workspace}/examples. Remember, $KLISTERPATH is a search path for klister modules, it is not the path to the root of the klister repo!

Indeed b1e675050190e14f478b0fc8ce3db6e0a73d29ab works and this does confirm that findInKlisterPath does not search sub-directories. IMHO I think it probably should. Thoughts @gelisam?

gelisam commented 1 year ago

findInKlisterPath does not search sub-directories. IMHO I think it probably should. Thoughts @gelisam?

I don't think it should, no, why? Is there any language which behaves that way?

Bash's $PATH does not recursively search sub-directories:

$ cat foo 
#!/bin/bash
baz

$ cat bar/baz
#!/bin/bash
echo "baz was executed"

$ PATH=`pwd` ./foo 
./foo: line 2: baz: command not found

$ PATH=`pwd`/bar ./foo 
baz was executed

Python's $PYTHONPATH does not recursively search sub-directories:

$ cat foo.py 
import baz

$ cat bar/baz.py 
print("baz was imported")

$ PYTHONPATH=`pwd` python foo.py
Traceback (most recent call last):
  File "/home/gelisam/working/foo.py", line 1, in <module>
    import baz
ModuleNotFoundError: No module named 'baz'

$ PYTHONPATH=`pwd`/bar python foo.py
baz was imported

Java's $CLASSPATH does not recursively search sub-directories:

$ ls bar/
Baz.java
Baz.class

$ CLASSPATH=`pwd` java Baz
Error: Could not find or load main class Baz
Caused by: java.lang.ClassNotFoundException: Baz

$ CLASSPATH=`pwd`/bar java Baz
Hello World!

Ruby's $RUBYLIB does not recursively search sub-directories:

$ cat foo.rb
require "baz"

$ cat bar/baz.rb
puts 'baz was required'

$ RUBYLIB=`pwd` ruby foo.rb
<.../kernel_require.rb>:85:in `require': cannot load such file -- baz (LoadError)
    from <.../kernel_require.rb>:85:in `require'
    from foo.rb:1:in `<main>'

$ RUBYLIB=`pwd`/bar ruby foo.rb
baz was required