effekt-lang / effekt

A language with lexical effect handlers and lightweight effect polymorphism
https://effekt-lang.org
MIT License
314 stars 19 forks source link

Top-level constants in the LLVM backend #496

Open jiribenes opened 3 months ago

jiribenes commented 3 months ago

Motivation

The LLVM backend currently does not support top-level let bindings of any kind. As a first step, it would be nice to at least support top-level bindings of literals that don't use any capabilities (what I'll call "top-level constants" here), like for example:

val myString = "Hello, world!"
// ^^^
// [error] Toplevel def and let bindings not yet supported: Let(myString,Literal(Hello, world!,Data(String,List())))

def main() = {
  println(myString)
}

Existing use in the standard library

We even use top-level constants in the standard library in the string module: https://github.com/effekt-lang/effekt/blob/acbec872ae6190bd9662d1eaccc1b00342109851/libraries/common/string.effekt#L206-L227

which are then used by the test module: https://github.com/effekt-lang/effekt/blob/acbec872ae6190bd9662d1eaccc1b00342109851/libraries/common/test.effekt#L79

Unfortunately since the LLVM backend doesn't support these top-level constants, one may use neither ANSI escape codes nor, by transitivity, the test module. :(

Implementation details

Based on my Clang-based testing, I think it should be enough to do the following to declare a top-level Pos constant:

@topLevelPos = constant %struct.Pos { i64 42, ptr null }, align 8

The biggest problem might be that we'd need to add these top-level constants to the Machine representation as that's where the current error is being thrown from: https://github.com/effekt-lang/effekt/blob/acbec872ae6190bd9662d1eaccc1b00342109851/effekt/shared/src/main/scala/effekt/machine/Transformer.scala#L39-L47 (One could start by pattern-matching on something like (rest, lifted.Definition.Let(constantName, lit@Literal(_, _)))...)

Technically, we should also be able to instead inline all of these top-level constants wherever they appear, but that's quite ad-hoc.

phischu commented 2 weeks ago

Yes, we should do it. The hardest part is finding the difference between top-level variables and local variables. Consider the following program:

val myString1 = "Hello, world 1! "

def main() = {
  val myString2 = "Hello, world 2!"
  println(myString1)
  println(myString2)
}

The occurence of myString1 and myString2 look the same to the compiler, but operationally they are vastly different. My strategy would therefore be to transform the program into the following:

def myString1() = "Hello, world 1! "

def main() = {
  val myString2 = "Hello, world 2!"
  println(myString1())
  println(myString2)
}

This should work for constants of all types and even for pure and effectful expressions. There is no code duplication. The use of a top-level constant has computational content (since it has to be moved from the data section onto the heap), so philosophically this makes sense. For small stuff like this, llvm will surely inline.

What do you think? How and where would you make this transformation?

b-studios commented 2 weeks ago

What about (in LLVM, not source or core, obviously):

val myString1;

def main() = {
  myString1 = "Hello, world 1! ";
  val myString2 = "Hello, world 2!"
  println(myString1)
  println(myString2)
}

?

b-studios commented 2 weeks ago

The problem remains in LLVM that local variables are %x and globals are @y and differently accessed, right?

b-studios commented 2 weeks ago

If you REAAALLY want to have functions instead of global variables (which I don't understand really, why), you could cache computation:

val _myString1;
def myString1() = {
  if (_myString1 == null) { _myString1 = "Hello, world 1! " };
  _myString1
}

def main() = {
  val myString2 = "Hello, world 2!"
  println(myString1())
  println(myString2)
}
jiribenes commented 2 weeks ago

The occurence of myString1 and myString2 look the same to the compiler, but operationally they are vastly different. My strategy would therefore be to transform the program into the following:

Forgive me, but I don't quite see the problem here as long as we're only talking about string/integer/... constants and immutable variables. Specifically, why couldn't we do the following for constant strings/integers/... in immutable variables always:

@topLevelPos = constant %struct.Pos { i64 42, ptr null }, align 8

This corresponds to plain old code motion of constants, which should Just Work :tm:, right?

b-studios commented 2 weeks ago

Maybe I want too much, but I am thinking of arbitrary (maybe pure) toplevel computations

jiribenes commented 2 weeks ago

Sure, as an end goal, I'd like to have that too. But if there are any doubts and disagreements on what to do, I'd really prefer to have support for these top-level constant literals soon, I'm running into this small — and probably not-that-difficult-to-implement — subset all the time (off the top of my head: ANSI escape codes in test in stdlib, size/rotation-related constants for the upcoming TreeMap)