carp-lang / Carp

A statically typed lisp, without a GC, for real-time applications.
Apache License 2.0
5.52k stars 174 forks source link

Refactor XObj pattern matches #871

Open scolsen opened 4 years ago

scolsen commented 4 years ago

Currently we pattern match against XObj nearly everywhere throughout the codebase. This abundance of pattern matching and the ubiquity of XObj comes with pros and cons:

Pros:

Cons:

There is really only one con, but it's a big one. Here's why, imagine we want to extend the definition of Obj with a new form/constructor New Int. Since we match on XObjs, not Obj directly, a level of indirection allows us to flexibly extend Obj without updating our pattern-matches. That's great, but any existing function that pattern matches on XObj needs to be updated to handle the new form (if necessary). Not to mention, since we match on XObj the compiler won't force our functions to be exhaustive over Obj which has already led to non-exhuastive patterns in x errors at runtime.

Not to mention, aside from occasional use of as patterns, we don't benefit much from the pros of pattern matching. The readability offered by matching begins to disappear once we have nested case statements. The equational reasoning benefits are not really applicable for us. In most cases we're only interested in one component of an Obj in a given function, leading to a ton of extra matches over _ which is somewhat noisy.

Not to mention, pattern matching everywhere also greatly hampers our ability to modify existing Obj constructors, leading to a proliferation of new constructors in some cases. Imagine, for instance, that we discovered some benefit to changing Mod so that it carries, in addition to its Env the SymPath that designates the Mod, Mod Env SymPath. In order to make such a change, we'd have to update every single pattern match on Mod throughout the entire codebase.

Another way

We can recover the benefits of encapsulation by doing two things:

If we use this approach, every pattern match would turn into a guarded statement, as an example, here's what expandList might look like:

-- assume this is still within the context of `expand`
expandList xobj | isList xobj = expandListInternal $ listObjs xobj
                | otherwise = error "can't expand nonlist"
expandListInternal xobjs | isExternal hd = return (ctx, Right xobj) 
                         | isInstantiate hd = return (ctx, Right xobj)
                         | isDeftemplate hd = return (ctx, Right xobj)
                         ...
                         | isDefn hd = do expand eval ctx $ fnBody xobjs.....
                         | otherwise = error ...
                         where hd = head xobjs

This not only eliminates tons of required but unnecessary matches (all those _ in our XObj matches) but it is far more independent to potential changes to Obj and its. constructors, and the otherwise cases handle new additions flawlessly. It's also easier to read in my opinion.

The downside of course, comes in the form of extra cost of having to know what all the predicative functions and accessor style functions mean isDeftemplate, fnBody... etc. --but I personally find the regained encapsulation to be worth it. Currently I'm reluctant to make any major or essential changes to the compiler as it's difficult to anticipate all the possible locations one is impacting. I think encapsulating Obj from the rest of the codebase will help significantly.

Let me know what you think!

Another alternative

There's also no requirement that we be absolute about this. We could still expose Obj constructors, allowing us to pattern match, while additionally providing predicate and accessor functions so that we may mix and match uses of pattern matching and guarded forms--that way we can pattern match when it's convenient, but use guards where it makes more sense to do so (e.g. places where the match would have tons of _).

jacereda commented 4 years ago

I find the current code a bit hard to read. Maybe there's some middle ground using pattern synonyms? https://kseo.github.io/posts/2016-12-22-pattern-synonyms.html

scolsen commented 4 years ago

That definitely would help! https://gitlab.haskell.org/ghc/ghc/-/wikis/view-patterns may also be useful--we'd be able to change many of the deep case statements to direct matches against function arguments.

My only reservation with using extensions is that they introduce more syntax and special structures that we rely on--meanwhile using guards and functions are just core facilities of the language.

hellerve commented 4 years ago

I’ve definitely wished for a better, more expressive way of constructing these switches before, but hadn’t settled on a solution I found elegant and practical. I think the approach provided originally looks promising for sure.

I do have one little caveat here, though: it doesn’t seem to me like using guards and checks would make checks for exhaustiveness any better, right? I feel like an ideal solution would force us to consider all possible cases all the time, but that might be a pipe dream.

jacereda commented 4 years ago

Exhaustiveness is the reason why I suggested pattern synonyms. It would just be a more concise way to express the patterns.

scolsen commented 4 years ago

I do have one little caveat here, though: it doesn’t seem to me like using guards and checks would make checks for exhaustiveness any better, right? I feel like an ideal solution would force us to consider all possible cases all the time, but that might be a pipe dream.

ah, you are right about that—for some reason I thought GHC would complain about guards without a case but apparently it doesn’t.

to eliminate the non-exhuastive match runtime errors apparently we just need to compile with Winncomplete-patterns which is included in -Wall

as for readability and encapsulation, I’m of the opinion that hiding the implementation detail of Obj behind functions gets us the greatest amount of encapsulation and modularity, but I think @jacereda’s named patterns suggestion gives us the greatest benefit for the least amount of work—it would also be very similar to using functions, the only difference being that functions may be composed etc. (which likely wouldn’t be beneficial that often but it would allow us to express matches over nested constructors easily (e.g. to match the first qualifier in a sympathhead . getQuals . getSympath

I’m open to either named patterns or functions as a solution to this

eriksvedang commented 4 years ago

I totally agree with your problem assessment @scolsen. I've been meaning to look into pattern synonyms for this exact reason (and also recursion schemes, which would be a third possible way to do this, but possibly the hardest and most intrusive).

(I do think there's a performance cost to the guard solution too, though that is purely anecdotal. I've just read somewhere that GHC produces the best code when you pattern match "at the top level". We often have inner case matches though, so this is somewhat of a moot point perhaps.)

All in all I think that named patterns would be a great thing to try out, ideally just in one module to get a feel for it before going all-in.

scolsen commented 1 year ago

So, we have pattern synonyms now, but I think there are still some cases we haven't added yet.