Add DML support for multi-word attributes/names and properties

HPInc / HP-Digital-Microfluidics

HP Digital Microfluidics Software Platform and Libraries

At the moment, DML supports a fixed set of multi-word names and attributes, and it separates expressions from attributes by using . or 's. It would be nice if users could define their own multi-word names, e.g.,

tell user to load = macro(reagent) { ... };
ready to run(int phase) -> bool { ... };

The problem with this is that several expr alternatives follow an expr with a keyword that could also appear in a name:

| quant=expr 'as' 'a'? 'string' 'in' dim_unit # unit_string_expr
| val=expr existence # existence_expr
| start_dir=expr 'turned' turn # turn_expr
| dist=expr 'in' ('dir' | 'direction') d=expr # in_dir_expr
| amount=expr dim_unit # unit_expr
| amount=expr 'per' dim_unit # unit_recip_expr
| amount=expr 'C' # temperature_expr
| vol=expr 'of' which=expr # liquid_expr
| obj=expr possession ('a' | 'an') attr # has_expr
| obj=expr ('is' NOT? | ISNT) pred=expr # is_expr
| lhs=expr 'and' rhs=expr # and_expr
| lhs=expr 'or' rhs=expr # or_expr
| first=expr 'if' cond=expr 'else' second=expr # cond_expr

I'd also like a basic property rule for expr that would let you say things like

e writes CSV files
e doesn't write CSV files
e writes CSV files = false

with some general way of inferring the negative forms. My basic idea is that such properties are necessarily boolean (and don't take parameters, although I may want to rethink that), which means that they can't be strung together. They would also necessarily start with a positive auxiliary (e.g., is, can, has, does, will, must, should, may) or a third-person singular verb. I think I'm willing to start by saying that there is a fixed set of these verbs, e.g., writes, reads, sends, counts, needs, and that they all negate with doesn't.
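A minimal sketch of how the negative forms might be inferred, assuming a lookup table for the auxiliaries and a fixed table of verbs. The function name negate_property and both tables are hypothetical. Only the word lists and the "negate with doesn't" rule come from the description above; the specific negated auxiliaries (won't, mustn't, may not, and so on) are my assumptions.

```python
# Hypothetical table: positive auxiliary -> negated form (assumed, not from DML).
NEGATED_AUXILIARY = {
    "is": "isn't",
    "can": "can't",
    "has": "doesn't have",
    "does": "doesn't",
    "will": "won't",
    "must": "mustn't",
    "should": "shouldn't",
    "may": "may not",
}

# Fixed set of third-person singular verbs mapped to their base forms.
BASE_FORM = {
    "writes": "write",
    "reads": "read",
    "sends": "send",
    "counts": "count",
    "needs": "need",
}


def negate_property(phrase: str) -> str:
    """Return the negative form of a boolean property phrase."""
    first, _, rest = phrase.partition(" ")
    if first in NEGATED_AUXILIARY:
        negated = NEGATED_AUXILIARY[first]
    elif first in BASE_FORM:
        # Every verb in the fixed set negates with "doesn't" plus the base form.
        negated = "doesn't " + BASE_FORM[first]
    else:
        raise ValueError(f"not a recognized property start: {first!r}")
    return f"{negated} {rest}".strip()
```

With these tables, negate_property("writes CSV files") gives "doesn't write CSV files", which is the second form in the example above.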

To do this in general, I'll probably have to have some sort of secondary grammar rules for some or all of the above rules to pull long sequences of words apart, but even that might not work. For example, something like n uL of lookup(k) will wind up parsing as n uL of lookup, (, k, ), which won't have the right structure.

If I'm willing to simply go for multi-word names and boolean properties (as constants, not computed), and I'm also willing to rule out keywords and keyword sequences that would make things ambiguous, I think it may be doable.
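One way to picture "ruling out" ambiguous names is a check at definition time that rejects any name containing a reserved keyword sequence. This is only a sketch: find_reserved and RESERVED_SEQUENCES are hypothetical, and the sequences listed are a sample drawn from the expr alternatives quoted earlier, not a worked-out list.

```python
from typing import Optional, Sequence, Tuple

# Sample of keyword sequences that follow an expr in the rules quoted above.
# Which sequences really need to be reserved is an assumption of this sketch.
RESERVED_SEQUENCES: Tuple[Tuple[str, ...], ...] = (
    ("and",), ("or",), ("if",), ("else",), ("of",), ("per",), ("turned",),
    ("is",), ("in", "dir"), ("in", "direction"),
    ("as", "string"), ("as", "a", "string"),
)


def find_reserved(words: Sequence[str]) -> Optional[Tuple[str, ...]]:
    """Return the first reserved sequence found contiguously in words, else None."""
    for start in range(len(words)):
        for seq in RESERVED_SEQUENCES:
            if tuple(words[start:start + len(seq)]) == seq:
                return seq
    return None
```

Under this sample list, the two names from the original example ("tell user to load" and "ready to run") pass, while something like "volume of reagent" is rejected because of "of".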

Migrated from internal repository. Originally created by @EvanKirshenbaum on Jun 05, 2023 at 2:22 PM PDT.

Let's focus on attributes to start. They should be easier to deal with because

they are always syntactically marked, being separated from a preceding expression by 's, ., has a(n), or doesn't have a(n).
they always end an expression or are followed by exists, doesn't exist, or =, so we don't have to worry about them running up against another expression start.

Ideally, we should be able to simply do something like

attr
  : attr_start (attr_middle* attr_end)?
  ;

where for each class we specify the words that can be used.

attr_start is pretty much any identifier except for maybe (and we could even allow that and check for it in the attr_expr rule. This includes ID and all keywords, including types.
attr_middle rules out any keyword or keyword sequence that could come between two expressions, such as and, or, if, of, else, in dir[ection], has a[n], is, is not.
attr_end further rules out any keyword or sequence that could come at the end of an expression, such as as a string in mV, magnitude in seconds, turned left, per hour, ms, C, exists, does not exist, up.

The right way to do this is probably to start with attr_end as ID and any legal keywords for it and then add in the ones that are allowed for attr_middle and then attr_start.
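The layering could look roughly like the following, with each class defined as the narrower one plus extra keywords. All names here are hypothetical, the keyword sets are reduced to single words for illustration (so this does not yet deal with multi-word sequences), and str.isidentifier() stands in for "ID or keyword".

```python
from typing import Sequence

# Hypothetical single-word stand-ins for the keyword sequences listed above.
# Words that can come between two expressions.
BETWEEN_EXPRS = frozenset({"and", "or", "if", "of", "else", "in", "has", "is"})
# Words that can start a phrase ending an expression.
ENDS_EXPR = frozenset({"as", "magnitude", "turned", "per", "ms", "C",
                       "exists", "does", "up"})


def allowed_at_end(word: str) -> bool:
    """Narrowest class: identifier-like words minus both keyword sets."""
    return (word.isidentifier()
            and word not in BETWEEN_EXPRS
            and word not in ENDS_EXPR)


def allowed_in_middle(word: str) -> bool:
    """Everything allowed at the end, plus the expression-final keywords."""
    return allowed_at_end(word) or word in ENDS_EXPR


def allowed_at_start(word: str) -> bool:
    """Everything allowed in the middle, plus the between-expression keywords."""
    return allowed_in_middle(word) or word in BETWEEN_EXPRS


def is_attr(words: Sequence[str]) -> bool:
    """Mirror attr : attr_start (attr_middle* attr_end)? over a list of words."""
    if not words or not allowed_at_start(words[0]):
        return False
    if len(words) == 1:
        return True
    return (all(allowed_in_middle(w) for w in words[1:-1])
            and allowed_at_end(words[-1]))
```

For example, is_attr(["ready", "to", "run"]) holds, while is_attr(["drop", "and", "volume"]) fails because "and" cannot appear in the middle, and is_attr(["flow", "rate", "per"]) fails because "per" cannot end an attribute.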

What I can't figure out how to do, though, is to handle the multi-word phrases. For example, if I see p's drop does not exist, I want to be sure to parse the attribute as drop, not as drop does not, even though the latter would be a legal attribute name. This would seem to require a negative lookahead, which ANTLR doesn't appear to support.

I think I can do some of the negative lookahead by using LA(n) on the input stream, which should be something like

  | 'does' {self._input.LA(1) != NOT or self._input.LA(2) != EXIST}?
  | 'in' {self._input.LA(1) != DIR and self._input.LA(1) != DIRECTION}?

That should handle at least the ones containing literals. For the others (e.g., magnitude in mV), I'll need to have a function. What I probably want to do is introduce a general negative_lookahead() function that takes optional sequences of tokens and collections of tokens and does the logic based on LA.
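A rough sketch of what such a helper might look like. Each pattern is a sequence of slots, where a slot is either a single token type or a collection of token types, and the helper succeeds only if none of the patterns matches the upcoming tokens. FakeStream is a stand-in so the example runs on its own (a real predicate would pass the parser's token stream instead), and the token constants are made-up numbers, not the ones the generated DML parser uses.

```python
from typing import Collection, Sequence, Union

Slot = Union[int, Collection[int]]

# Hypothetical token types for illustration only.
NOT, EXIST, DIR, DIRECTION, IN, MV, SECONDS, ID = range(1, 9)
UNITS = frozenset({MV, SECONDS})


class FakeStream:
    """Stand-in token stream: LA(k) is the type of the k-th upcoming token."""
    EOF = -1

    def __init__(self, types: Sequence[int]) -> None:
        self._types = list(types)

    def LA(self, k: int) -> int:
        return self._types[k - 1] if 1 <= k <= len(self._types) else self.EOF


def matches_ahead(stream, pattern: Sequence[Slot]) -> bool:
    """True if every slot in pattern accepts the token at the same offset."""
    for offset, slot in enumerate(pattern, start=1):
        allowed = (slot,) if isinstance(slot, int) else slot
        if stream.LA(offset) not in allowed:
            return False
    return True


def negative_lookahead(stream, *patterns: Sequence[Slot]) -> bool:
    """True if none of the patterns matches the upcoming tokens."""
    return not any(matches_ahead(stream, p) for p in patterns)
```

After a 'does', the check would be negative_lookahead(stream, (NOT, EXIST)); after an 'in', negative_lookahead(stream, ({DIR, DIRECTION},)); and after 'magnitude', something like negative_lookahead(stream, (IN, UNITS)) would cover the non-literal case.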

Add DML support for multi-word attributes/names and properties #273

Migrated from internal repository. Originally created by @EvanKirshenbaum on Jun 06, 2023 at 2:15 PM PDT.