Open EvanKirshenbaum opened 5 months ago
Let's focus on attributes to start. They should be easier to deal with because `'s`, `.`, `has a(n)`, or `doesn't have a(n)`. `exists`, `doesn't exist`, or `=`, so we don't have to worry about them running up against another expression start.

Ideally, we should be able to simply do something like
attr
: attr_start (attr_middle* attr_end)?
;
where for each class we specify the words that can be used.

`attr_start` is pretty much any identifier except for maybe (and we could even allow that and check for it in the `attr_expr` rule). This includes `ID` and all keywords, including types.

`attr_middle` rules out any keyword or keyword sequence that could come between two expressions, such as `and`, `or`, `if`, `of`, `else`, `in dir[ection]`, `has a[n]`, `is`, `is not`.

`attr_end` further rules out any keyword or sequence that could come at the end of an expression, such as `as a string in mV`, `magnitude in seconds`, `turned left`, `per hour`, `ms`, `C`, `exists`, `does not exist`, `up`.

The right way to do this is probably to start with `attr_end` as `ID` and any legal keywords for it, then add in the ones that are allowed for `attr_middle`, and then those for `attr_start`.
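As a rough sketch of how the three word classes nest, the following Python models `attr : attr_start (attr_middle* attr_end)?` over plain word lists. The word sets and the helper names (`BETWEEN_EXPRS`, `ENDS_EXPR`, `is_attr`) are hypothetical, filled in only from the examples above. It checks single words, so it does not handle the multi-word sequences such as `does not exist`.

```python
# Hypothetical word lists taken from the examples in this issue; the real
# grammar would work on token types and the full keyword inventory.
BETWEEN_EXPRS = {"and", "or", "if", "of", "else", "in", "has", "is"}
ENDS_EXPR = {"as", "magnitude", "turned", "per", "ms", "C", "exists", "up"}


def attr_start_ok(word: str) -> bool:
    # attr_start: pretty much any identifier or keyword.
    return True


def attr_middle_ok(word: str) -> bool:
    # attr_middle: anything except words that can sit between two expressions.
    return word not in BETWEEN_EXPRS


def attr_end_ok(word: str) -> bool:
    # attr_end: attr_middle minus words that can end an expression.
    return attr_middle_ok(word) and word not in ENDS_EXPR


def is_attr(words: list[str]) -> bool:
    """Model of: attr : attr_start (attr_middle* attr_end)? ;"""
    if not words or not attr_start_ok(words[0]):
        return False
    rest = words[1:]
    if not rest:
        return True
    *middle, last = rest
    return all(attr_middle_ok(w) for w in middle) and attr_end_ok(last)
```

Under these assumptions `is_attr(["drop", "does", "not"])` is true, which is exactly the over-eager match that the multi-word phrases make possible.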
What I can't figure out how to do, though, is to handle the multi-word phrases. For example, if I see `p's drop does not exist`, I want to be sure to parse the attribute as `drop`, not as `drop does not`, even though the latter would be a legal attribute name. This would seem to require a negative lookahead, which ANTLR doesn't appear to support.

I think I can do some of the negative lookahead by using `LA(n)` on the input stream, which should be something like
| 'does' {self._input.LA(1) != self.NOT or self._input.LA(2) != self.EXIST}?
| 'in' {self._input.LA(1) != self.DIR and self._input.LA(1) != self.DIRECTION}?
That should handle at least the ones containing literals. For the others (e.g., `magnitude in mV`), I'll need to have a function. What I probably want to do is introduce a general `negative_lookahead()` function that takes optional sequences of tokens and collections of tokens and does the logic based on `LA`.
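A minimal sketch of what such a `negative_lookahead()` might look like, written against a plain `la(n)` callable rather than a real ANTLR token stream. The token type numbers, `make_la`, and `matches_ahead` are assumptions made for illustration. In a generated parser, `la` would presumably be the token stream's `LA` method and the constants would come from the parser class.

```python
from typing import Callable, Collection, Sequence, Union

# Hypothetical token types; a generated parser would define its own.
DOES, NOT, EXIST, IN, DIR, DIRECTION, ID = range(1, 8)
EOF = -1

# One step of a pattern: a single token type or a collection of alternatives.
Step = Union[int, Collection[int]]


def matches_ahead(la: Callable[[int], int], pattern: Sequence[Step]) -> bool:
    """True if the upcoming tokens match every step of the pattern in order."""
    for offset, step in enumerate(pattern, start=1):
        allowed = (step,) if isinstance(step, int) else step
        if la(offset) not in allowed:
            return False
    return True


def negative_lookahead(la: Callable[[int], int], *patterns: Sequence[Step]) -> bool:
    """True only if none of the patterns matches the upcoming tokens."""
    return not any(matches_ahead(la, p) for p in patterns)


def make_la(upcoming: Sequence[int]) -> Callable[[int], int]:
    """Stand-in for a token stream: la(1) is the next token type, and so on."""
    def la(n: int) -> int:
        return upcoming[n - 1] if 1 <= n <= len(upcoming) else EOF
    return la
```

With this shape, the `'does'` predicate would be `negative_lookahead(la, [NOT, EXIST])` and the `'in'` one `negative_lookahead(la, [{DIR, DIRECTION}])`. A unit-based case like `magnitude in mV` could pass a collection of unit token types as one step. Note that an empty pattern matches trivially, so callers should always pass at least one step.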
At the moment, DML supports a fixed set of multi-word names and attributes, and it separates expressions from attributes by using `.` or `'s`. It would be nice if it was possible for users to define their own multi-word names.

The problem with this is that there are a bunch of `expr` rules that follow an `expr` by a keyword that could be in a `name`:
| quant=expr 'as' 'a'? 'string' 'in' dim_unit # unit_string_expr
| val=expr existence # existence_expr
| start_dir=expr 'turned' turn # turn_expr
| dist=expr 'in' ('dir' | 'direction') d=expr # in_dir_expr
| amount=expr dim_unit # unit_expr
| amount=expr 'per' dim_unit # unit_recip_expr
| amount=expr 'C' # temperature_expr
| vol=expr 'of' which=expr # liquid_expr
| obj=expr possession ('a' | 'an') attr # has_expr
| obj=expr ('is' NOT? | ISNT) pred=expr # is_expr
| lhs=expr 'and' rhs=expr # and_expr
| lhs=expr 'or' rhs=expr # or_expr
| first=expr 'if' cond=expr 'else' second=expr # cond_expr
I'd also like to be able to have a basic `expr property` rule, with general notions of inferring negative forms. My basic idea is that properties like that are necessarily boolean (and don't take parameters, although I may want to rethink that), which means that they can't be strung together. Also, they would necessarily start with a positive auxiliary (e.g., `is`, `can`, `has`, `does`, `will`, `must`, `should`, `may`) or a third-singular verb. I think I'm willing to start by saying that there are a fixed number of these, e.g., `writes`, `reads`, `sends`, `counts`, `needs`, and they all negate with `doesn't`.

To do this in general, I'll probably have to have some sort of secondary grammar rules for some or all of the above rules to pull long sequences of words apart, but even that might not work. For example, something like `n uL of lookup(k)` will wind up parsing as `n uL of lookup`, `(`, `k`, `)`, which won't have the right structure.

If I'm willing to simply go for multi-word names and boolean properties (as constants, not computed), and I'm also willing to rule out keywords and keyword sequences that would make things ambiguous, I think it may be doable.
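To make the "inferring negative forms" idea concrete, here is a small sketch that derives the negative of a boolean property from its first word. The negations chosen for the auxiliaries and the helper name `negate_property` are assumptions. The issue itself only fixes that the listed third-singular verbs negate with `doesn't`.

```python
# Assumed negative forms for the positive auxiliaries listed above.
AUX_NEGATIONS = {
    "is": "is not",
    "can": "cannot",
    "has": "doesn't have",
    "does": "doesn't",
    "will": "will not",
    "must": "must not",
    "should": "should not",
    "may": "may not",
}

# The fixed set of third-singular verbs, mapped to their base forms.
VERB_BASES = {
    "writes": "write",
    "reads": "read",
    "sends": "send",
    "counts": "count",
    "needs": "need",
}


def negate_property(phrase: str) -> str:
    """Return the negative form of a property phrase such as 'is ready'."""
    first, _, rest = phrase.strip().partition(" ")
    if first in AUX_NEGATIONS:
        negated = AUX_NEGATIONS[first]
    elif first in VERB_BASES:
        negated = "doesn't " + VERB_BASES[first]
    else:
        raise ValueError(f"not a recognized property start: {first!r}")
    return f"{negated} {rest}".strip()
```

Because the set of leading words is fixed, both the positive and the inferred negative phrase could be registered as keyword sequences up front, which fits the idea of treating properties as constants rather than computed expressions.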
Migrated from internal repository. Originally created by @EvanKirshenbaum on Jun 05, 2023 at 2:22 PM PDT.