JacquesCarette / Drasil

Generate all the things (focusing on research software)
https://jacquescarette.github.io/Drasil
BSD 2-Clause "Simplified" License
141 stars 26 forks source link

Implementing extensible expression language #2743

Closed balacij closed 3 years ago

balacij commented 3 years ago

As a component of #2646

This is originally from an email I sent to Dr. Carette:

I've been trying to implement the "Datatypes à la carte" styled Expr/ModelExpr language for a while now, but I've been struggling to get pattern matching to work well when merging languages, and I've also been struggling to get a typed version of it working.

I found Sylvain Henry's continuation of the "Datatypes à la carte" work. He built a very similar system that relies on encoding the extensible sum types using lists of types (called "Variants"), where expressions of the language can take on any value of type X, where X is an element of the list of types designated. He then creates a variant using recursive sum types, naming these variants "Extensible ADTs" (EADTs).

He has a lot of documentation on EADTs: https://docs.haskus.org/eadt.html

And a very nice introduction/background page: https://docs.haskus.org/eadt/background.html

I think Henry's approach is interesting. One issue that I had with the "Datatypes à la carte" approach I had is in creating values that contain type information for a whole language feature set (e.g., everything in Expr + everything in ModelExpr) when the term contained only one features from one of the languages. I believe this issue would occur if a TheoryModel expects ModelExprs but some instance of a TheoryModel ends up being simple and ends up with something that only relies on things in Expr. Henry's approach appears to have no issues with this at all.

He also builds multiple libraries to support his work: https://hackage.haskell.org/packages/search?terms=haskus

Specifically, the library containing EADTs, and supporting Variants: https://hackage.haskell.org/package/haskus-utils-variant

I've been able to get pattern matching working nicely with EADTs (admittedly, his library and documentation makes a lot of the work straightforward). However, I haven't been able to get far in getting it typed, and I haven't been able to find any EADT-related resources online. The issues lie in missing instances for particular typeclasses in his library that I haven't looked into enough yet.

I have 2 more documents lined up for me to read that might be helpful for adding static typing via GADTs using the pattern functor/"Datatypes à la carte" approach:

https://www.researchgate.net/publication/228841104_Compositional_data_types

http://www.timphilipwilliams.com/posts/2013-01-16-fixing-gadts.html However, I'm a bit concerned that the same pattern matching/deconstruction issues will occur (I wonder if we can merge the efforts?).

Aside: Henry has an interesting example of dynamic type checking: https://hsyl20.fr/home/posts/2018-05-22-extensible-adt.html#type-checking-example-paramorphism-example

Do you have any thoughts/advice on how I should proceed or what I should prioritize?

Specifically, do you think I should continue with building a prototype Expr/ModelExpr language using the "Datatypes a la carte" (or EADT style) despite not yet being able to create a typed variant of it? Though, these 2 papers lined up might resolve this issue.

JacquesCarette commented 3 years ago

I'm not entirely sure that going to "Datatypes a la carte" (original or modified) is actually the right thing for us to do. I did think it was worth experimenting with - maybe it would have worked smoothly.

So we need to step back and re-evaluate:

  1. what problem is this seeking to fix?
  2. does it look like the solution introduces worse problems than the one being supposedly fixed?

Let's really figure that out first. And maybe do some small-scale experiments, rather than do them 'live' on Drasil.

balacij commented 3 years ago

Thanks!

Let's really figure that out first. And maybe do some small-scale experiments, rather than do them 'live' on Drasil.

I think both are great ideas. Let's do that.

Background information:

We started off with the long-term goal of adding typing to the expression language.

After splitting up some terms to get pseudo-"type checking" working (nothing statically checked yet), we realized that we needed to split up the expression language because it contained functionality that was usable "at some level" but completely unusable "at another level". The focal motivating operation was "definition". We wanted to be able to say that "some data is defined by some expression" without abusing the =​ operator (where = :: Eq a => a -> a -> Bool). So, we started off with building DisplayExpr​, containing only a wrapped Expr​entry to DisplayExpr​ typing, and Defines,​as an operation.

Aside: Thinking about this again, it might have been the wrong choice -- a better solution may have been to create a way (typeclass like Display) for things to directly transcribe their mathematical meaning into the printing AST language. Afterwards, "things that define things" (such as QDefinitions) would directly transcribe their mathematical meaning using the existing 'equals' Printing AST operator instead of having a pass-through in DisplayExpr. Alternatively, since we previously had more RelationConcepts and less ModelKinds, it may have been the case that it was more convenient to "print Exprs" in drasil-printers because there was no ModelKinds to use to print out information for (e.g., it might be best to move the printing logic completely to drasil-printers, instead of placing some printing logic [such as "Defines"] in DisplayExpr/ModelExpr via the Display​typeclass on those models). What do you think about this?

If we chose to keep it as is, we would need to add runtime type checking (Drasil runtime, not software artifact runtime) when constructing (and processing) expressions, or else code generation would run the risk of complete failure, generating invalid expressions, etc. Even then, I still think we would need to restrict components of the expression language to only certain areas where it can be interpreted properly (such as differential equations to DEModels).

Continuing, we noticed that we had a few other expression components that were, at the time, solely displayed, never used in code generation. So, we decided to also move these operations into DisplayExpr​. These expressions were generally simple: "x IsIn Space", and "multiple things are equivalent".

Afterwards, we noticed that there were components of the expression language that were too specific to "the code"/GOOL, namely "New", "Message", and "Field". So, we created a one-to-one copy of Expr​ called CodeExpr​and removed the 3 expression components from Expr​. We decided to make it a one-to-one copy because we didn't know if like-terms would need to be typed differently in Expr vs CodeExpr. However, if there ends up being no difference, we could (and should) re-write CodeExpr, removing the components already contained in Expr and ensuring the two languages are encoded such that they are interoperable (e.g., TTF, pattern functor/Datatypes à la carte, etc).

Later, we found more operations (Deriv, integration, and a few others) in CodeExpr that weren't directly translatable into GOOL yet, we also noticed the typing for them would be odd in Expr to start, so we decided to move them into DisplayExpr (at least until code generation caught up to speed). Ultimately, it ended up that we wanted the "generally first-class easily computable operations" (e.g., basic arithmetics) to stay in Expr​, rename DisplayExpr​to ModelExpr​, and then have ModelExpr​be an interoperable extension of Expr​. With this, the vocabulary of Expr​would be guaranteed to be safely generable and usable in CodeExpr​. These other operations (in ModelExpr​) carry non-trivial information in performing, analyzing, and generating code for them. As such, carrying them in their own container with extra surrounding information about them might make things easier for code generation. For example, we should know about continuity, infinities, and undefined forms (well, realistically, we should know about undefined forms like 0/0 too). With ModelExpr​being separate from Expr, we can impose static type checking restrictions on which expressions can go where. We also recognize that these things in ModelExpr can be type checked too (albeit not as much as we'd like to [dependent types?] but at least loosely), so having them in ModelExpr might also be another good motivation (separating strong static typing in Expr vs loose static typing in ModelExpr).

Finally, we can answer the 2 questions:

  1. The greater problem is that we want to form expressions in places that make sense (both to us, logically, and to a computer, "computably"). Since we're using Haskell for Drasil, we want the expression language to be statically typed to restrict the vocabulary as much as possible, and, where it fails (e.g., where we might want dependent types), for expressions to be restricted to areas where they can be sensibly interpreted and checked.
  2. The intended solution may be sub-optimal in some regards, but I can't think of any issues with it that make it worth not doing.
JacquesCarette commented 3 years ago

What I don't get is why the topic is extensible expression language? In all the discussion above, extensibility doesn't seem to really come up. Yes, we kind of know we'll want some kind of extensibility, but it's more like a secondary rather than primary feature.

In other words, I'm still not entirely sure that we agree on what problem needs to be solved. It's impossible to evaluate the comparative benefits of various potential solutions without a clear statement of the problem.

At the same time, we need to also pin down what each data-structure (Expr, ModelExpr, etc) can represent. For example, I see Expr as things that are "values". A "defines" is most definitely not a value, it is a kind of statement, as it modifies the current namespace.

As an approximation:

So if we need to do things that introduce new names, those would be new language(s) that sit 'above' these.

There's a whole lot more to this issue than just the bits above, but these parts need to be decided upon first, before any of the rest of the discussion can happen.

balacij commented 3 years ago

The topic is partially related to extensible expression language because I was attempting to have the 3 languages be interoperable through them all being based on Expr.

I agree, it's definitely not the primary issue (maybe the attempt was jumping ahead), and we should have a clear problem statement. If we were to give a priority to the 'extensible' component of the expression languages, it would be likely be a "conditional secondary", depending on Expr being a common expression language between ModelExpr and CodeExpr.

Regarding the 3 data structures, I believe we have the same definitions. To be clear, I would just want to note that, to my understanding, components of ModelExpr, would also be a frontend to components that isn't aren't adequately represented using 'just' Expr such that it's usable for reasoning and code. These components would require extra information around them to make them usable in code generation in particular (e.g., DEs).

JacquesCarette commented 3 years ago

So I think the simplest way forward will involve duplication between all 3. Then we can refactor to use functors (i.e. ExprF) and fix points of things to do sharing.

I don't understand what you mean by "frontend to components".

Also, we have multiple layers of information capture. Some of it is through the 'expression' DSLs, but also there are the "quantities" hierarchy of structures. So it could be that a DE only shows up inside a model and never 'naked'. So that would mean it doesn't need to appear in Expr or ModelExpr at all.

balacij commented 3 years ago

As an immediate step, does that mean we would want to rewrite DisplayExpr, renaming it as ModelExpr, and making it a 1-1 clone of Expr + the extra functionality (as we did with CodeExpr)? Afterwards, we would refactor them again using the functor representation.

Regarding "frontend to components", I was trying to say that some components of ModelExpr would be just "for show" unless they were paired with other information (likely inside a model) such that we can use it. It could also be the case where it wouldn't appear in ModelExpr at all.

JacquesCarette commented 3 years ago

First para: yes.

Second para: ok.