Do we need a `#lang`?

dvanhorn commented 1 year ago

Here are some loose thoughts on a potential redesign of the 430 materials to deal with some long-standing issues I see.

The bigger issues:

- Racket is a big language, and folks can go down a rabbit hole learning parts of it that aren't really relevant for the class, e.g. `for`, `quasiquote`, fancy `match` patterns, stuff in libraries, etc. That's not necessarily bad, but I'd prefer students work within the small subset that's needed and try to really master it. This isn't a class on Racket.
- The code people write is atrocious. This is largely a product of UMD's "autograder über alles" mindset. The ideal solution would be human code review and feedback, but some language support could also help. Maybe we enforce adherence to a style guide programmatically: require purpose statements, type signatures, consistent formatting, etc.
- The lack of type enforcement leads to students spending way too much time debugging programs that have type errors or, worse, coding around those errors.

Some smaller issues:

- Since we're committed to programming in Racket, we get the warts too. Racket's opaque-by-default structs aren't the right default for us, so we have to explain `#:prefab`, etc.
- We also get tied to the Racket semantics, e.g. truthiness. It would be nice if we could make our own design choices.

There are other issues (the source language and meta language being the same causes confusion, etc.), but one way of dealing with the above issues is to create our own `#lang`. That gives us control over much of this stuff.

pdarragh commented 1 year ago

I've thought about this too and yeah, I also think it'd be a good move.

Unneeded Racket features

Overall, I think this is great, and is one of the most compelling factors here. If we do this, it prevents a lot of headaches that we encounter regularly where students have a hard time understanding Racket's functionality. We can choose to just expose a really good, fundamental subset of Racket instead of everything.

The biggest downside of this is that we will become responsible for the documentation. We can no longer just link to Racket's very thorough documentation on whatever thing a student is struggling with. Reading official documentation is also a good skill for students to build: they'll have to do it throughout their careers, so practicing it here is useful. Taking custody of the documentation could be a blessing in disguise, though. Racket's documentation is often too thorough and technical, which means students frequently don't understand it. We could simplify things and explain exposed functionality in a way that is more approachable to our students.

I wonder whether it might be worth having a "passthrough" mode, like a submodule of our `#lang` where you can deliberately choose to access Racket's functionality. It might even just be a module with `(provide (prefix-out racket: (all-from-out racket)))` or something. That way advanced students could use advanced functionality if they really want to, while other students can stick to the reduced subset.
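A minimal sketch of the passthrough idea, assuming a hypothetical submodule named `passthrough` (the name and the single-file layout are illustrative only, not part of any existing course code). Note that `prefix-out` wraps a provide spec such as `(all-from-out racket)`, not a bare module name.

```racket
#lang racket/base

;; Hypothetical passthrough module: re-exports every binding from
;; `racket` under the prefix `racket:`.
(module passthrough racket/base
  (require racket)
  (provide (prefix-out racket: (all-from-out racket))))

(require 'passthrough)

;; An advanced student opts in explicitly; the prefix makes it obvious
;; (to graders and linters alike) that full Racket is being used here.
(define squares
  (racket:for/list ([i (racket:in-range 4)])
    (racket:* i i)))
```

In a real `#lang`, the restricted language would be the module language and the passthrough would be something students `require` on purpose, so its use is easy to spot.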

Style enforcement

This would be a huge advantage to us. Although it seems silly and superficial, one of my least favorite things is when a student comes to office hours for help with their code and it's just... absolutely abysmal. It's syntactically correct but they've got unintuitive indentation and weird parentheses and whatnot.

I also like the idea of requiring some notion of intent, but I'm curious what you think that should look like. Like, a type signature is easy and straightforward, but I wonder about other forms of purpose. Maybe expectations of pre-/post-conditions?

Type system

I'm 100% on board with a type system, but I wonder how advanced a system we would want. Typed Racket is cool and useful (and already exists), but it's also so expressive that it might be too easy for students to write weird-but-correct types. I wonder if we should restrict to something like Hindley-Milner, where top-level type declarations are required and such.

Truthiness

Although it's a bit "surprising" to students, I actually think we should retain Scheme-style truthiness. I like the idea of forcing students to adapt to a new (to them) way of thinking. This also works well with the first compiler assignment where students have to implement cond, where the truthy aspect is an extra layer of complication that they have to think about.
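For concreteness, here is a small illustration of the Scheme-style truthiness in question, written in plain Racket; the helper name `truthy?` is made up for this example.

```racket
#lang racket/base

;; In Racket, #f is the only false value; everything else is truthy.
(define (truthy? v) (if v #t #f))

;; 0, the empty string, and the empty list all count as true.
(define results (map truthy? (list 0 "" '() #f)))

;; A cond clause whose test is a non-boolean value is still taken,
;; which is the wrinkle students hit when compiling cond.
(define picked
  (cond [0 'first-clause]
        [else 'fallback]))
```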

I do wonder about the type system implementation with regard to truthiness though. Do all types subtype truthy? Or do we have some notion of interfaces/typeclasses with a default truthy implementation?

Other thoughts

- A #lang offers us the ability to replace define with something that does some sort of instrumentation for use with the a86 interpreter, or even just annotation in general.
- Unless we stick very closely to Racket's semantics, I think a custom #lang means self-hosting becomes trickier, right?
- We could replace the default error reporting to give something that's significantly easier for students to work with. If we document common issues, we could even implement Rust-like reporting with links to documentation along the lines of "You did X, but you maybe meant Y or Z. See this doc: ."
- I think an info.rkt-like file could be the entry point for compilers implemented in our #lang. This could maybe be where exports from the compiler are listed with their types. Then we could provide this file, and students' code would not run if the expectations were not met. By abstracting out the compiler definition, we give a single file that we can tell students "don't change this or you'll be sad" and it gives them expectations of what is needed.
- A custom #lang could also abstract away the Makefile from students, preventing odd configuration/build issues.

dvanhorn commented 1 year ago

On the issue of type systems, a potential solution is the use of contracts. The language forces you to use contracts and the contracts are checked at run-time. When grading solutions, we can check that the contracts are the intended ones. IME random testing and contracts get you 98% of the early error detection that a type system gets you. And we can still have complex types like "integer in (0,256]."
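A rough sketch of what a mandatory contract might look like, using Racket's existing `racket/contract` library. The function `scale-byte` is a made-up example; the range "integer in (0,256]" is written as `(integer-in 1 256)`, since `integer-in` takes inclusive bounds.

```racket
#lang racket/base
(require racket/contract)

;; Hypothetical student function. The contract plays the role of the
;; required signature and is checked at run time.
(define/contract (scale-byte n)
  (-> (integer-in 1 256) (integer-in 2 512))
  (* 2 n))

;; (scale-byte 3) returns normally.
;; (scale-byte 0) raises a contract violation blaming the caller.
```

A grader could then compare the contract on each required function against the intended one, separately from testing behavior.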

For self-hosting, I think it will actually be the same story. The last language of the class will just be the #lang language. Actually it makes self-hosting easier, because you've been within the subset of what the last compiler can handle all along.

I agree on the truthiness bit. I like that students have to implement things that are different than they might expect. I think what I mean is more like: suppose we want to explore a semantics different from Racket's. We can't really do that, except through interpretation. I think there might be some opportunities for making this easier to do with a #lang.

One thought I had was to implement the #lang not in the usual "elaborate into Racket code" way, but rather to elaborate into a call to an interpreter written in Racket. (Of course that's still an instance of the former, but you get what I mean.)

So e.g.:

```racket
#lang villain
(def foo (x y) (+ x y))
```

Becomes:

```racket
#lang racket
(require villain/interp)
(interp (parse '(def foo (x y) (+ x y))))
```

The source code for villain/interp.rkt is something that we could open up and modify if we wanted to start playing around with the design of the language.
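To make the idea concrete, here is a toy sketch of what such a `parse` and `interp` could look like. Everything in it is an assumption for illustration: the AST structs, the two-argument shape of `interp` (a list of definitions plus an expression to evaluate), and the fact that only integers, variables, `+`, and calls to defined functions are supported.

```racket
#lang racket

;; Hypothetical AST for the toy language.
(struct Defn (name params body) #:prefab)
(struct App (op args) #:prefab)
(struct Var (name) #:prefab)
(struct Lit (value) #:prefab)

;; parse : S-expression -> Defn or expression AST
(define (parse s)
  (match s
    [(list 'def (? symbol? f) (list (? symbol? xs) ...) body)
     (Defn f xs (parse-expr body))]
    [_ (parse-expr s)]))

(define (parse-expr s)
  (match s
    [(? integer? n) (Lit n)]
    [(? symbol? x) (Var x)]
    [(list (? symbol? op) args ...) (App op (map parse-expr args))]))

;; interp : (Listof Defn) Expr -> Integer
;; The environment is an association list from parameter names to values.
(define (interp defns e)
  (let loop ([e e] [env '()])
    (match e
      [(Lit n) n]
      [(Var x) (cdr (assq x env))]
      [(App '+ args)
       (apply + (map (lambda (a) (loop a env)) args))]
      [(App f args)
       (let* ([d (findf (lambda (d) (eq? (Defn-name d) f)) defns)]
              [vals (map (lambda (a) (loop a env)) args)])
         (loop (Defn-body d) (map cons (Defn-params d) vals)))])))

;; Define foo as in the example above, then call it.
(define answer
  (interp (list (parse '(def foo (x y) (+ x y))))
          (parse '(foo 1 2))))
```

Because the semantics live in ordinary Racket functions, changing a design decision (say, what counts as true) would mean editing one `match` clause rather than fighting the host language.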

dvanhorn commented 1 year ago

Elaborating into quote + interp loses all of the nice IDE features such as arrows, renaming, syntax checking etc., but there are ways to fix that.

cmsc430 / cmsc430.github.io

Do we need a `#lang`? #130
