
CMSC 430 Design and Implementation of Programming Languages
https://cmsc430.github.io/

Do we need a `#lang`? #130

Open · dvanhorn opened this issue 1 year ago

dvanhorn commented 1 year ago

Here are some loose thoughts on a potential redesign of the 430 materials to deal with some long-standing issues I see.

The bigger issues:

Some smaller issues:

There are other issues (the source language and the meta language being the same causes confusion, etc.), but one way of dealing with the issues above is to create our own #lang language, which gives us control over much of this.

pdarragh commented 1 year ago

I've thought about this too and yeah, I also think it'd be a good move.

Unneeded Racket features

Overall, I think this is great, and it's one of the most compelling factors here. Doing this would prevent a lot of the headaches we regularly run into when students have a hard time understanding Racket's functionality. We can choose to expose a really good, fundamental subset of Racket instead of everything.

The biggest downside is that we would become responsible for the documentation. We could no longer just link to Racket's very thorough documentation for whatever a student is struggling with, and reading that documentation is itself a good skill for students to build: they'll have to read documentation throughout their careers, so practicing it here is useful. Taking custody of the documentation could be a blessing in disguise, though. Racket's documentation is often too thorough and technical, so students frequently don't understand it. We could simplify things and explain the exposed functionality in a way that is more approachable for our students.

I wonder whether it might be worth having a "passthrough" mode, like a submodule of our #lang where you can deliberately opt in to Racket's functionality. It might even just be a module with (require racket) and (provide (prefix-out racket: (all-from-out racket))) or something. That way, advanced students could use advanced functionality if they really want to, while other students stick to the reduced subset.
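A minimal sketch of what the reduced subset plus an opt-in passthrough could look like. Everything is written as submodules so it fits in one file; in a real #lang each piece would be its own file. The module names `tiny`, `passthrough`, and `student-program`, and the particular subset exported, are hypothetical choices for illustration only.

```racket
#lang racket/base
;; Hypothetical reduced teaching language, as a submodule for a one-file sketch.
(module tiny racket/base
  ;; The only bindings a program written in `tiny` can see.
  (provide #%module-begin #%app #%datum #%top
           define lambda if cond else let
           + - * = < > cons car cdr null? list)

  ;; Opt-in passthrough: all of `racket`, every name prefixed with racket:
  (module passthrough racket/base
    (require racket)
    (provide (prefix-out racket: (all-from-out racket)))))

;; A student program restricted to the subset; using anything outside
;; the subset (e.g. string-join) would be an unbound-identifier error.
(module student-program (submod ".." tiny)
  (define (sum xs)
    (if (null? xs) 0 (+ (car xs) (sum (cdr xs)))))
  (sum (list 1 2 3)))

;; An "advanced" use: reach for full Racket explicitly, via the prefix.
(require (submod "." tiny passthrough))
(racket:string-join (list "a" "b" "c") ", ") ; => "a, b, c"
```

The prefix keeps the opt-in visible in the source: a grader can spot every use of full Racket by searching for `racket:`.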

Style enforcement

This would be a huge advantage to us. Although it seems silly and superficial, one of my least favorite things is when a student comes to office hours for help with their code and it's just... absolutely abysmal. It's syntactically correct but they've got unintuitive indentation and weird parentheses and whatnot.

I also like the idea of requiring some notion of intent, but I'm curious what you think that should look like. Like, a type signature is easy and straightforward, but I wonder about other forms of purpose. Maybe expectations of pre-/post-conditions?
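One possible shape for "intent", sketched in Racket: a signature and purpose statement as comments, with the pre- and post-conditions written as an `->i` contract so they are checked on every call. The function `isqrt` and its conditions are a hypothetical example, not something from the course materials.

```racket
#lang racket/base
(require racket/contract)

;; Signature: isqrt : Integer -> Natural
;; Purpose:   the largest r such that r*r <= n (integer square root).
(define/contract (isqrt n)
  (->i ([n exact-integer?])
       #:pre (n) (>= n 0)                         ; pre-condition on the input
       [r exact-nonnegative-integer?]
       #:post (n r) (and (<= (* r r) n)           ; post-condition relating
                         (< n (* (+ r 1) (+ r 1))))) ; input and result
  ;; Simple linear search; fine for a teaching example.
  (let loop ([r 0])
    (if (> (* (+ r 1) (+ r 1)) n)
        r
        (loop (+ r 1)))))
```

Calling `(isqrt -1)` fails the `#:pre` check and blames the caller; a buggy body would instead fail `#:post` and blame the function.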

Type system

I'm 100% on board with a type system, but I wonder how advanced a system we would want. Typed Racket is cool and useful (and already exists), but it's also so expressive that it might be too easy for students to write weird-but-correct types. I wonder if we should restrict ourselves to something like Hindley-Milner, with required top-level type declarations and such.

Truthiness

Although it's a bit "surprising" to students, I actually think we should retain Scheme-style truthiness. I like the idea of forcing students to adapt to a new (to them) way of thinking. It also works well with the first compiler assignment, where students have to implement cond and truthiness adds an extra layer of complication they have to think about.
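To make the cond point concrete, here is a minimal sketch of an interpreter for a hypothetical toy language (integers, booleans, `zero?`, `if`, and `cond` with a mandatory `else`). The helper `truthy?` spells out Scheme-style truthiness, so that every value except `#f` counts as true, instead of silently inheriting it from Racket's own `if`. The language and all names are illustrative assumptions, not the course's assignment code.

```racket
#lang racket/base
(require racket/match)

;; Scheme-style truthiness, made explicit: every value except #f is true.
(define (truthy? v)
  (not (eq? v #f)))

;; Interpreter for a hypothetical toy language:
;;   e ::= integer | boolean | (zero? e) | (if e e e)
;;       | (cond [e e] ... [else e])
(define (interp e)
  (match e
    [(? exact-integer?) e]
    [(? boolean?) e]
    [(list 'zero? e1) (zero? (interp e1))]
    [(list 'if c t f)
     (if (truthy? (interp c)) (interp t) (interp f))]
    [(list 'cond (list cs es) ... (list 'else ee))
     ;; Try each clause in order; the first truthy test wins.
     (let loop ([cs cs] [es es])
       (cond [(null? cs) (interp ee)]
             [(truthy? (interp (car cs))) (interp (car es))]
             [else (loop (cdr cs) (cdr es))]))]))

(interp '(cond [(zero? 1) 10] [0 20] [else 30])) ; 0 is truthy, so this is 20
```

Keeping `truthy?` as a single function also gives one obvious place to change if we wanted to try a different semantics, such as treating 0 as false.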

I do wonder about the type system implementation with regard to truthiness though. Do all types subtype truthy? Or do we have some notion of interfaces/typeclasses with a default truthy implementation?

Other thoughts

dvanhorn commented 1 year ago

On the issue of type systems, a potential solution is the use of contracts. The language forces you to use contracts, and the contracts are checked at run time. When grading solutions, we can check that the contracts are the intended ones. In my experience, random testing and contracts get you 98% of the early error detection that a type system gets you. And we can still have complex types like "integer in (0, 256]".
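A small sketch of the contracts-plus-random-testing idea, using Racket's existing contract library. The contract name `byte-size/c` and the function `double-capped` are hypothetical; the point is that "integer in (0, 256]" is directly expressible and checked on both arguments and results.

```racket
#lang racket/base
(require racket/contract)

;; "An integer in (0, 256]" as a contract (hypothetical name).
(define byte-size/c
  (and/c exact-integer? (>/c 0) (<=/c 256)))

;; A hypothetical student function; the contract states its intent and
;; is checked on every call, for the argument and for the result.
(define/contract (double-capped n)
  (-> byte-size/c byte-size/c)
  (min 256 (* 2 n)))

;; Cheap random testing: random in-range inputs exercise the function,
;; and any contract violation surfaces as an exception.
(for ([i (in-range 1000)])
  (double-capped (+ 1 (random 256))))
```

If the body forgot the `min`, random inputs above 128 would produce results over 256, and the range contract would blame `double-capped` itself.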

For self-hosting, I think it will actually be the same story: the last language of the class will just be the #lang language. Actually, it makes self-hosting easier, because you've been working within the subset of what the last compiler can handle all along.

I agree on the truthiness bit. I like that students have to implement things that are different from what they might expect. What I mean is more like this: suppose we want to explore a semantics different from Racket's. We can't really do that except through interpretation, and I think there might be opportunities to make this easier with a #lang.

One thought I had was to implement the #lang not in the usual "elaborate into Racket code" but rather, elaborate into a call to an interpreter written in Racket. (Of course that's still an instance of the former, but you get what I mean.)

So e.g.:

```
#lang villain
(def foo (x y) (+ x y))
```

Becomes:

```racket
#lang racket
(require villain/interp) ; provides parse and interp
(interp (parse '(def foo (x y) (+ x y))))
```

The source code for villain/interp.rkt is something we could open up and modify if we wanted to start experimenting with the design of the language.
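A minimal sketch of the "elaborate into a call to an interpreter" idea. The module `villain-sketch` is a hypothetical stand-in for villain/interp, and its toy language supports only `def`, `+`, integers, variables, and calls. Its `#%module-begin` quotes the program body and hands it to `interp`; there is no separate parse step, and the exported `result` binding is an assumption added so the value is easy to inspect.

```racket
#lang racket/base
;; Hypothetical stand-in for villain/interp, written as a submodule.
(module villain-sketch racket/base
  (require racket/match (for-syntax racket/base))
  (provide (rename-out [villain-module-begin #%module-begin]))

  ;; A program is zero or more (def name (param ...) body) forms
  ;; followed by one main expression.
  (define (interp forms)
    (match forms
      [(list (list 'def names paramss bodies) ... main)
       (eval-expr main '() (map list names paramss bodies))]))

  ;; Expressions: integers, variables, (+ e e), and calls to defined functions.
  (define (eval-expr e env defs)
    (match e
      [(? exact-integer?) e]
      [(? symbol?) (cdr (assq e env))]
      [(list '+ e1 e2) (+ (eval-expr e1 env defs) (eval-expr e2 env defs))]
      [(list f args ...)
       (match (assq f defs)
         [(list _ params body)
          (define vals (for/list ([a (in-list args)]) (eval-expr a env defs)))
          (eval-expr body (map cons params vals) defs)])]))

  ;; Quote the whole module body and run the interpreter on it.
  (define-syntax (villain-module-begin stx)
    (syntax-case stx ()
      [(_ form ...)
       #'(#%module-begin
          (provide result)
          (define result (interp '(form ...))))])))

;; Stands in for a file that starts with `#lang villain`.
(module example (submod ".." villain-sketch)
  (def foo (x y) (+ x y))
  (foo 1 2))

(require (submod "." example))
result ; => 3
```

Changing the language's semantics then means editing `interp` and `eval-expr` rather than writing macros. Because the body is only ever quoted, though, DrRacket sees no bindings in it, which is the IDE trade-off raised in the follow-up comment.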

dvanhorn commented 1 year ago

Elaborating into quote + interp loses all of the nice IDE features, such as binding arrows, renaming, and syntax checking, but there are ways to fix that.