egraphs-good / egglog

egraphs + datalog!
https://egraphs-good.github.io/egglog/
MIT License
459 stars 54 forks source link

Renaming `function` whose output is an E-class to `constructor` #422

Open yihozhang opened 2 months ago

yihozhang commented 2 months ago

To emphasize the differences between functions whose output is an E-class ("constructor" function), which has unique semantics*, and other functions, we decided to change the syntax for declaring functions whose output is an E-class to use keyword constructor. So what used to be

(datatype E)
(function Add (E E) E)

should now be written as

(datatype E)
(constructor Add (E E) E)

Some non-constructors:

(datatype E)
(function best (E E) E :merge new) ;; because there is a merge expression
(function add1 (i64) i64) ;; because output is a primitive (and merge expression is an implicit `panic`)
(function upper-bound (E) i64 :merge (max old new)) ;; because it has a merge function

We should try to support this new constructor without breaking existing programs. So we should still support existing forms like (function Add (E E) E) but issue a warning to prompt the user.

*: constructor functions can be looked up in actions (cf. #420 for non-constructors) where it will be created on the fly (cf. #421 for non-constructors)

oflatt commented 2 months ago

I was under the impression that we were making a breaking change- existing programs must re-name to constructor. (function Add (E E) E) would not be supported. I believe our rational was that egglog is still quite young, so we want to make these changes while we can.

yihozhang commented 2 months ago

Sure, I think there are three paths we can take

  1. Simplify disallowing (function Add (E E) E) in our parser.
  2. Emitting a nicer error for (function Add (E E) E), rejects the program, and suggests the user use constructor.
  3. Emitting a warning for (function Add (E E) E), still accepts the program, and suggests the user use constructor.

Although my original issue suggested (3), I'm fine with either one. Doing (1) is all good, although (2) and (3) are bonus points that don't take a lot of effort but can improve user experience (and can potentially avoid us some Zulip questions about unexpected errors).

oflatt commented 2 months ago

2 would be awesome, it's what I was imagining! A little blurb explaining the difference between the two in an error message would go a long way. Same for the documentation