Frege / frege

Frege is a Haskell for the JVM. It brings purely functional programing to the Java platform.
https://github.com/Frege/frege/wiki/_pages
Other
3.64k stars 144 forks source link

Get rid of sum-record types in the compiler #382

Open matil019 opened 4 years ago

matil019 commented 4 years ago

This is a discussion about the internal of the compiler, not the syntax of the language.

Currently, the compiler is implemented by heavy uses of a combination of a sum type and record syntax, or so-called sum-record types, which are in the form of:

data SumRec = Foo { foo :: Int } | Bar { foo :: Int, bar :: String }

the examples are DefinitionS, SymbolT and QName.

Sum-records are unsafe constructs because it encourages writing partial functions. SumRec.bar in the example above is partial because it throws a runtime error if a value of SumRec is not constructed with Bar.

Frege comes with a handy syntax SumRec.{bar?} which queries the existence of the field and returns a Bool. However, I find that it makes difficult to manage the codebase because it introduces an "implicit contract"; the existence of a particular field, which is invisible to its type so not checkable by a compiler, gains some meaning.

I stumbled across this problem when I (quite a while ago) tried adding type families to Frege. I figured that I should add two data constructors SymTF for a type family and SymTI for a type instance to SymbolT but ended up getting a lot of runtime errors by accessing missing fields.

The compiler has a lot of unchecked access to partial fields and redundant error cases (see the links for examples). Currently the compiler doesn't trigger any runtime errors, which are good thing, and I believe the author(s) at the time of writing were very well aware that erroneous field accesses never happened. But that fact is not visible in the source code and becomes a problem if someone (including me) who doesn't know those invisible contracts tries to modify the code, he or she can unknowingly break the compiler by introducing runtime errors.

It's true that one can infer the possible constructors by reading through related parts of the codebase, but it shouldn't be necessary. It's especially hard to track a pair of enter/changeSym and Global.findit. With the help of the type system of Frege, much of (if not all) that effort would be eliminated.

So I intend to refactor the compiler not to use sum-record types. SymbolT for example will be written to:

data SymV g = SymV { ... }
data SymT g = SymT { ... }
...

data SymbolT g
  = protected V (SymV g)
  | protected T (SymT g)
  | ...

Thanks to Frege's protected constructors, the sum type constructors can have concise names.

Please see my branches (no-sum-definitions, no-sum-symbolt) for the current refactoring work. The code is deliberately written in a way so that compilation warnings for unsafe field acceses are emitted as many as possible.

Ingo60 commented 4 years ago

This would probably be a good thing, except for the extra indirections that are introduced. But I guess the refactoring is a mess.

matil019 commented 4 years ago

Thank you.

The indirection degrades the performance a little. Comparing the running time of make clean compiler1 with fregec.jar of master and that of the refactored one, the latter took 0.9% more time on my machine. I hope the additional safety provided by this change enables more aggressive optimization (by programmers) for overall better performance.

The refactoring is indeed huge, but fixing compilation errors, the most trivial and tedious part, is already done. I tried to minimize the diff by changing only where needed. I'm examining remaining pieces of code that requires understanding of semantics.