evincarofautumn opened 10 years ago
Having spoken with you about this in IRC, I'm left with a couple of remaining questions.
I'm OK with not having strong types during the first phase. That said, I do worry that it'll make error reporting more complex (and unusable, à la C++ templates) when we get to the second phase. This may not actually be the case, but we should strive to make error reporting equally good (or bad) in both stages.
Without knowing more about Kitten's generics, it's tough to envision this code in use. For example, I'm thinking about how to implement the analogue of `std::numeric_limits<T>` using phase-1 constructs. Alas, this requires extensive use of specialization: something not yet supported. So what about a higher-order function? That's also tough for me to say.
If these two systems (generics and CTE) aren't designed together, they'll surely affect each other after the fact. I believe we won't get a good CTE system without knowing how we want to use it for generics.
How about mutability during the first phase?
Mutability is another topic. It will be available at compile time if/when Kitten has it generally.
What about calling functions from a CTE which are not yet linked?
At first, we should require that source be available for definitions used in a CTE, much like C++ templates. We can relax that restriction in the future by compiling the phase-1 program, rather than interpreting it. That could improve compile times, as well, since the interpreter is not fast. However, I’m not sure it should be the default, because it has a lot of moving parts. In particular, unless/until we have a native backend, it introduces an interphase dependency on an external C compiler.
We will need ways of interacting with phase 2 entities. Which of these will be provided out of the box?
Basically, we should provide a Kitten transliteration of the compiler's internal representation of terms (the `TrTerm` data type) and allow people to pattern-match on that however they choose. Then there might be a set of convenience functions, such as:
```
define vectorElements (term -> [term]?):
  match:
    case MakeVector -> terms:
      terms some
    default:
      none
```
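To make the idea concrete, here is a rough transliteration in Python (chosen just for illustration; the real thing would be Kitten code pattern-matching on the `TrTerm` transliteration). The constructor names and the `vector_elements` helper mirror the sketch above and are assumptions, not an existing API:

```python
# Phase-1 code sees terms as a plain algebraic data type and can
# pattern-match on them. These constructors are hypothetical stand-ins
# for the compiler's real term representation.
from dataclasses import dataclass
from typing import Optional, List


@dataclass
class Term:
    pass


@dataclass
class Call(Term):
    name: str          # a call to a named word, e.g. Call("dup")


@dataclass
class MakeVector(Term):
    terms: List[Term]  # the element terms of a vector literal


def vector_elements(term: Term) -> Optional[List[Term]]:
    """Return the element terms of a vector literal, or None for any other term."""
    if isinstance(term, MakeVector):
        return term.terms
    return None
```

The point is only that phase-1 code manipulates ordinary values of an ordinary data type; no special macro sublanguage is needed.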
I will need to think more about the interaction between generics and CTE. C++ templates conflate them, and that turned out to be a mistake which they're only now trying to rectify with `constexpr`. I think I may be able to find an elegant implementation of generics in terms of CTE.
> It will be available at compile time if/when Kitten has it generally.
That's an interesting idea. Imagine a compile-time snippet that downloads some code, loads it from a file, and puts it into the binary. The package manager could be a compile-time step. hah!
I'm certainly more interested in the types of refactoring we could do at compile-time. This opens the door for some interesting reflection-based serialization or similar techniques.
For security reasons, I/O can't be allowed at compile time by default. But it would be nice to be able to say `define version { #("git describe --long --tags --always" system) }`. Automatic serialisation would also be a good use case.
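A sketch of what a `system`-style compile-time word could do, in Python for illustration (the function name and behaviour are assumptions, and a real implementation would need the opt-in security gating mentioned above):

```python
# Hypothetical build step: run a shell command during compilation and
# splice its output into the program as a string literal.
import subprocess


def compile_time_system(command: str) -> str:
    """Run a shell command at build time and return its trimmed output."""
    result = subprocess.run(command, shell=True, capture_output=True, text=True)
    return result.stdout.strip()


# e.g. a build-time version string:
#   version = compile_time_system("git describe --long --tags --always")
```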
One problem with this is that it makes cross-compilation harder: you need to run compile-time code on the target, not on the host.
That's a good point. I'd imagine that emulating the target's environment (pointer size, for example) would be possible, but likely not worth it. My guess would be that most uses of CTE won't be platform-specific.
Another idea, which is crazy: when cross-compiling, don't generate the final binary. Instead, generate a binary which, when executed, runs the appropriate CTEs and then does codegen. Effectively, this curries the compilation process, deferring CTE and codegen to when the binary is first run (likely by the package installer) in the target environment.
That’s not so crazy—we could ship bytecode with a compiler stub. A simpler and more portable (albeit slower) approach would be a threaded code backend.
Isn't this basically the same idea as Forth's compiler mode? Maybe you can take some ideas from that. https://www.forth.com/starting-forth/11-forth-compiler-defining-words/
They’re pretty different—both are metaprogramming systems, but Kitten’s is a macro system, in that compile-time code will operate on a representation of the AST, while Forth defining words operate on the compiler state.
I see. But could it still have a similar feel? Is all the excess quoting syntax really necessary?
```
#('[1, 2, #(1 + 2)])
```
It seems Lispy and clunky IMO (as macro syntaxes often do).
Instead, could whatever is left on the compile-time stack within a given CTE block be spliced into the AST? Then the above could simply be:
```
#([1, 2, 1 + 2])
```
You would still need a way to quote function words, though, as in your first example with `dup`. Perhaps you could put the quote mark on the back, so it would read like a prime symbol:
```
#(dup' 3 replicate)
```
I would still want to maintain a distinction between terms and their representations, to avoid too much magic, but that's a good idea to avoid quoting/splicing hell. In the same vein, we could treat the foregoing terms as a stack, so for instance, `(f call) #dup` would give `(f call) (f call)`, and you would invoke a typed `printf` macro as just `"…" #printf`.
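A typed `printf` macro would presumably compute its type from the format string at compile time. A minimal sketch in Python of that computation; the directive table and type names are made up for illustration:

```python
# Map printf-style directives in a format string to the argument types
# the call must accept. Unknown directives would be a compile error
# (here they raise KeyError). The directive set is a small assumed subset.
from typing import List


def printf_arg_types(fmt: str) -> List[str]:
    """Compute the argument types a printf call with this format string needs."""
    table = {"d": "int", "s": "string", "f": "float"}
    types = []
    i = 0
    while i < len(fmt):
        if fmt[i] == "%" and i + 1 < len(fmt):
            if fmt[i + 1] == "%":   # literal percent sign, consumes no argument
                i += 2
                continue
            types.append(table[fmt[i + 1]])
            i += 2
        else:
            i += 1
    return types
```

So a compile-time macro could check that `"%d" printf` is applied to exactly one integer, rather than deferring the mismatch to runtime.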
I would like to split compilation into two phases:
My current thinking is to have a word or sigil that denotes CTE, and to allow it to be applied to any term, or at least to those terms using words for which source is available. I propose `#(term)` to denote phase 1 and `'(term)` to denote phase 2, ignoring character literals for the moment. For example, the term `#(1 + 2)` in phase 1 is exactly equivalent to the literal `3` in phase 2, and `#('3)` is another way to spell that.

Broadly, there are three salient issues in my mind: the generation of code, the generation of types, and the role of the compiler.
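As an illustration of the intended equivalence between `#(1 + 2)` and the literal `3`, here is a toy phase-1 evaluator in Python; the `#(...)` recognition is ad hoc (no nesting, no quoting) and stands in for real compiler support:

```python
# Toy model of the two phases: phase 1 evaluates each #(...) block and
# replaces it with the resulting literal in the phase-2 program text.
import re


def run_phase1(source: str) -> str:
    """Evaluate each #(expr) at 'compile time' and splice in its result."""
    return re.sub(
        r"#\(([^()]*)\)",
        lambda m: str(eval(m.group(1))),  # eval is fine for a toy model only
        source,
    )
```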
It should be possible to write a phase-1 definition `times` which accepts the symbol `dup` and the value `3`, and produces the phase-2 expression `dup dup dup`. This implies that a phase-1 expression should be able to produce a vector of terms which are spliced into the phase-2 program. Maybe `#['3]` denotes splicing only one term, as shorthand for `#(['3])`. When generating code, you want the ability to quote and splice, i.e., to switch phases.
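The `times` idea can be sketched like so, in Python for illustration; terms are modelled as plain strings, and `splice` stands in for the proposed `#[...]` splicing:

```python
# A phase-1 function that takes a quoted term and a count, and yields the
# vector of terms to splice into the phase-2 program. Representing terms
# as strings is a stand-in for the real term type.
from typing import List


def times(term: str, n: int) -> List[str]:
    """Produce n copies of a quoted term for splicing into phase 2."""
    return [term] * n


def splice(terms: List[str]) -> str:
    """Splice a vector of terms into phase-2 source text."""
    return " ".join(terms)
```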
You want the ability to generate types, so that you can compute interesting dependent types and produce type signatures for code that you generate. For instance, a function such as `printf` may take an arbitrary number of arguments, determined by its format string. You should be able to express that `#("%d" printf)` denotes a function that accepts one integer argument. Rather than splice at every call site, it may be desirable to write definitions where some of the arguments are passed at compile time, i.e., `(1 + runtime junk) #"%d" printf`. I don't know exactly what type signatures for such functions should look like; #132 has some thoughts on this.

What's less clear is whether the AST you generate should be typed. Users of Template Haskell tend to lament the fact that you can produce nonsensical ASTs, but Haskell's ASTs are also significantly more complex than Kitten's. I think it should be untyped; that is, every term you manipulate in phase 1 should have type `term` (or `expr`, or `bikeshed`) regardless of its phase-2 type. Phase-1 code would still be statically typed, so `#(1 + "two")` would not compile.

The defining words such as
`define` and `data` could be modelled as phase-1 functions which happen to have the side effect of manipulating the definitions and types used during phase 2. That is, `define` would be a word in the compiler API, and a phase-1 program would be a Kitten program that happens to use the compiler API to produce a Kitten program compiled in phase 2. I'm still pretty fuzzy on the specifics of how to do this, and whether it's worthwhile.

CC @jeaye