cucapra / packet-scheduling

MIT License
3 stars 0 forks source link

DSL: Better Error Messages #39

Closed KabirSamsi closed 4 months ago

KabirSamsi commented 4 months ago

Please see #30 to understand the motivation for this PR.

Our DSL programs presently have three major components – let us define an alphabet { $Decl, Assn, Rtn$ }. These correspond to the types outlined in the AST declare, return and assignment.

Moving away from the grammar and into parser land for a second, our goal is to briefly relax the initial rules on parsing a program, such that we can parse improperly formatted programs at an initial stage, and then catch their errors with a more descriptive error handler later on. Let's formalize this:

Suppose we define the regular expression $seq := (Decl + Assn + Rtn)^*$.

We can also define the regex $progseq := (Decl + Assn)^* Rtn$. Finally, we define the regex $prog := Decl (Assn)^* Rtn$.

We can argue that $L(prog) \subseteq L(progseq) \subseteq L(seq)$.

This should be straightforward enough to see without too much pain (do let me know if a stronger justification is required).

This formalism now helps us outline a new approach for parsing programs:

The parser itself parses a series of tokens to a $seq$. This allows for a number of potential problems – misplaced and multiple returns, and interleaved declarations and assignments.

The function validate_seq handles the first issue. Taking in a sequence $p \in L(seq)$, it throws an informative error message in the case that $p$ contains a misplaced return statement, or more than one return. If no such problem is encountered, we can conclude that $p \in L(progseq)$, and accordingly modify the data to fit the type progseq.

The function validate_program handles the second issue. Now taking in $p \in L(progseq)$, it throws another, different informative error message in the case that $p$ contains a sequence of interleaved declarations and assertions (as was the problem initially defined in #30). If not, we can safely conclude that $p \in L(prog)$, and properly parse this to a piece of data of type prog.

Thus, we keep the prior token $\rightarrow$ AST parsing system, but expand the range of programs parseable by our first round, with descriptive error handling.

Other Modifications

Per this discussion, I have flattened the declare, assignment and return variant types, turning them instead into aliases. There are a handful of modifications to parser.mly, util.ml and ast.ml that follow from the changes described above.

Compiler Flexibility In The Future

Currently, any expression not fitting the $prog$ expression or not parseable as type prog triggers an error. This is good. I believe we can make our compiler 'smarter' in the future though, by adding an extra layer of parsing that can turn faulty programs into correct ones. Anshuman and I discussed this, and came to the conclusion that it's better to 'walk' programmers to the correct syntax, rather than giving them optimizations that serve as crutches.

All the same, this functionality keeps that option open. As an example, if a programmer writes consecutive class declarations, our compiler could re-express this as one merged list of class declarations. If one interleaves the two, our compiler could find a way to prise them apart!

Something like compiler warnings rather than errors might be of interest later – by no need to worry about that too much just now. I think for the time being, the conclusion we came to is definitely best.

KabirSamsi commented 4 months ago

@polybeandip I like your suggestion of moving the parse validation stuff to parser.mly! And you're right, probably don't need the type synonyms. However, I think we should keep util.ml as is – we will probably use it for output/pretty printing etc. as well.

KabirSamsi commented 4 months ago

Update after discussion is that we are creating a separate parse.ml file to store parsing functions, and util.ml (presently empty) can handle things like pretty-printing. Committed now.

KabirSamsi commented 4 months ago

Comments handed & resolved, merging now.