aic-sri-international / aic-expresso

SRI International's AIC Symbolic Manipulation and Evaluation Library (for Java 1.8+)
BSD 3-Clause "New" or "Revised" License

Re-organize Expressions API into something easier to understand #32

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
Re-organize Expressions into something easier to understand, although I would 
like to write an interface/API first before going on with this.

Original issue reported on code.google.com by ctjoreilly@gmail.com on 4 Jan 2014 at 1:02

GoogleCodeExporter commented 9 years ago

Original comment by ctjoreilly@gmail.com on 31 Jan 2014 at 12:57

GoogleCodeExporter commented 9 years ago
Hi Ciaran,

Here are some notes regarding what the API should look like.

There are Expressions and SyntaxTrees.
An Expression has a SyntaxTree but, unlike the current situation, they are not
the same thing. The expression "for all p(X) : p(X) and q(X)" has
sub-expressions "X" and "p(X) and q(X)", but its syntax tree has the following
structure:

"for all . : ."
   +-- p
   |   +-- X
   |
   +-- and
       +-- p
       |   +-- X
       |
       +-- q
           +-- X

A good test of whether something is a sub-expression of an expression is to
ask yourself whether, had you been told the value of that candidate
sub-expression, it would be valid to replace it by that value.
In the example above, if we are told that X = a, we can replace it, obtaining:

for all p(a) : p(a) and q(a)

If we are told that p(X) = b, however, we cannot replace every occurrence of it, as in:

for all b : b and q(X)

and this is why the first "p(X)" is not a sub-expression. Note that the
second "p(X)" is a sub-expression, and it makes sense to replace it by its
value when we know that value.
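The replacement test above can be sketched in Java. This is a minimal illustration, not aic-expresso code: the classes Expr, Sym, App and ForAll and the method replace are hypothetical. The only rule it encodes is the one just stated: the quantifier's index p(X) is not a sub-expression, so it is never replaced as a whole, while its argument X and the body are.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

public class SubExpressionDemo {

    // Hypothetical minimal expression types, for illustration only.
    interface Expr {
        // Returns a copy in which every sub-expression equal to target is replaced by value.
        Expr replace(Expr target, Expr value);
    }

    static final class Sym implements Expr {
        final String name;
        Sym(String name) { this.name = name; }
        public Expr replace(Expr target, Expr value) { return equals(target) ? value : this; }
        @Override public boolean equals(Object o) { return o instanceof Sym && ((Sym) o).name.equals(name); }
        @Override public int hashCode() { return name.hashCode(); }
        @Override public String toString() { return name; }
    }

    static final class App implements Expr {
        final String functor;
        final List<Expr> args;
        App(String functor, Expr... args) { this(functor, Arrays.asList(args)); }
        App(String functor, List<Expr> args) { this.functor = functor; this.args = args; }

        // Replaces inside the arguments only, never the application as a whole.
        App replaceInArgs(Expr target, Expr value) {
            List<Expr> newArgs = new ArrayList<>();
            for (Expr a : args) newArgs.add(a.replace(target, value));
            return new App(functor, newArgs);
        }

        public Expr replace(Expr target, Expr value) {
            return equals(target) ? value : replaceInArgs(target, value);
        }

        @Override public boolean equals(Object o) {
            return o instanceof App && ((App) o).functor.equals(functor) && ((App) o).args.equals(args);
        }
        @Override public int hashCode() { return Objects.hash(functor, args); }
        @Override public String toString() {
            // Binary "and" is printed infix; everything else as functor(arg, ...).
            if (functor.equals("and") && args.size() == 2) return args.get(0) + " and " + args.get(1);
            StringBuilder sb = new StringBuilder(functor).append('(');
            for (int i = 0; i < args.size(); i++) sb.append(i > 0 ? ", " : "").append(args.get(i));
            return sb.append(')').toString();
        }
    }

    // "for all <index> : <body>"
    static final class ForAll implements Expr {
        final App index;
        final Expr body;
        ForAll(App index, Expr body) { this.index = index; this.body = body; }
        public Expr replace(Expr target, Expr value) {
            // The index as a whole is not a sub-expression: only its arguments
            // (e.g. X in p(X)) and the body are eligible for replacement.
            return new ForAll(index.replaceInArgs(target, value), body.replace(target, value));
        }
        @Override public String toString() { return "for all " + index + " : " + body; }
    }

    // Builds "for all p(X) : p(X) and q(X)".
    static Expr example() {
        Sym x = new Sym("X");
        return new ForAll(new App("p", x), new App("and", new App("p", x), new App("q", x)));
    }

    public static void main(String[] args) {
        Expr e = example();
        System.out.println(e.replace(new Sym("X"), new Sym("a")));
        System.out.println(e.replace(new App("p", new Sym("X")), new Sym("b")));
    }
}
```

Running main should print "for all p(a) : p(a) and q(a)" followed by "for all p(X) : b and q(X)": X is replaced everywhere, but only the second p(X) is replaced by b.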

There are several types of expressions: symbols, function applications,
intensional and extensional sets, universal and existential quantifications,
lambda expressions, bracketed expressions, and so on. Right now, the type is
indicated not by subclasses but by the Expression method getSyntacticForm.
Let's leave it that way for now, but at a second stage of cleanup we will want
to turn these types into subclasses.
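To make the contrast concrete, here is a hedged sketch of the two styles. In the current style, a single class carries a tag returned by getSyntacticForm. In the proposed later style, each kind of expression is its own subclass. All class names and tag strings below are hypothetical, not the library's actual values.

```java
import java.util.Arrays;
import java.util.List;

public class SyntacticFormDemo {

    // Style 1 (current): a single class; the kind of expression is a tag
    // returned by getSyntacticForm(). The tag strings are hypothetical.
    static final class TaggedExpression {
        final String syntacticForm;
        final String label;
        final List<TaggedExpression> children;
        TaggedExpression(String syntacticForm, String label, TaggedExpression... children) {
            this.syntacticForm = syntacticForm;
            this.label = label;
            this.children = Arrays.asList(children);
        }
        String getSyntacticForm() { return syntacticForm; }
    }

    // Every operation has to branch on the tag by hand.
    static String describe(TaggedExpression e) {
        switch (e.getSyntacticForm()) {
            case "Symbol":
                return "symbol " + e.label;
            case "Function application":
                return "application of " + e.label + " to " + e.children.size() + " argument(s)";
            default:
                return "unknown form " + e.getSyntacticForm();
        }
    }

    // Style 2 (proposed for a later stage): one subclass per kind of
    // expression, so the right behavior is selected by method dispatch.
    static abstract class Expression {
        abstract String describe();
    }

    static final class Symbol extends Expression {
        final String name;
        Symbol(String name) { this.name = name; }
        String describe() { return "symbol " + name; }
    }

    static final class FunctionApplication extends Expression {
        final String functor;
        final List<Expression> args;
        FunctionApplication(String functor, Expression... args) {
            this.functor = functor;
            this.args = Arrays.asList(args);
        }
        String describe() { return "application of " + functor + " to " + args.size() + " argument(s)"; }
    }

    public static void main(String[] args) {
        TaggedExpression x = new TaggedExpression("Symbol", "X");
        System.out.println(describe(new TaggedExpression("Function application", "p", x)));
        System.out.println(new FunctionApplication("p", new Symbol("X")).describe());
    }
}
```

Both lines print "application of p to 1 argument(s)". With tags, each new operation needs its own switch over the forms. With subclasses, each kind must implement the abstract method, and the compiler checks that it does.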

So the Expression API will remain essentially the same, keeping its replacement
methods and sub-expression access methods but dropping the sub-syntax-tree
access methods, which will move to a separate SyntaxTree interface. For now, it
will also keep the getSyntacticForm method.
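A minimal sketch of the proposed split follows. It uses hypothetical Expression and SyntaxTree types rather than the library's actual interfaces, and omits replacement methods for brevity. With the for-all example from above, the syntax tree keeps the whole index p(X) as a child, whereas getSubExpressions returns only X and the body.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class ExpressionVsSyntaxTree {

    // Hypothetical syntax-side type: a labelled tree with no semantics attached.
    static final class SyntaxTree {
        private final String label;
        private final List<SyntaxTree> subTrees;
        SyntaxTree(String label, List<SyntaxTree> subTrees) {
            this.label = label;
            this.subTrees = subTrees;
        }
        String getLabel() { return label; }
        List<SyntaxTree> getSubTrees() { return subTrees; }
    }

    // Hypothetical expression-side API: sub-expression access and, for now,
    // getSyntacticForm. Sub-tree access lives on SyntaxTree only.
    interface Expression {
        SyntaxTree getSyntaxTree();
        List<Expression> getSubExpressions();
        String getSyntacticForm();
    }

    static final class Symbol implements Expression {
        private final String name;
        Symbol(String name) { this.name = name; }
        public SyntaxTree getSyntaxTree() { return new SyntaxTree(name, Collections.<SyntaxTree>emptyList()); }
        public List<Expression> getSubExpressions() { return Collections.emptyList(); }
        public String getSyntacticForm() { return "Symbol"; }
    }

    static final class FunctionApplication implements Expression {
        private final String functor;
        private final List<Expression> args;
        FunctionApplication(String functor, Expression... args) {
            this.functor = functor;
            this.args = Arrays.asList(args);
        }
        public SyntaxTree getSyntaxTree() {
            List<SyntaxTree> subTrees = new ArrayList<>();
            for (Expression arg : args) subTrees.add(arg.getSyntaxTree());
            return new SyntaxTree(functor, subTrees);
        }
        public List<Expression> getSubExpressions() { return args; }
        public String getSyntacticForm() { return "Function application"; }
    }

    // "for all <index> : <body>" with a function-application index such as p(X).
    static final class UniversalQuantification implements Expression {
        private final FunctionApplication index;
        private final Expression body;
        UniversalQuantification(FunctionApplication index, Expression body) {
            this.index = index;
            this.body = body;
        }
        // The syntax tree keeps the whole index p(X) as a child...
        public SyntaxTree getSyntaxTree() {
            return new SyntaxTree("for all . : .", Arrays.asList(index.getSyntaxTree(), body.getSyntaxTree()));
        }
        // ...but the sub-expressions are the index's arguments (X) and the body.
        public List<Expression> getSubExpressions() {
            List<Expression> result = new ArrayList<>(index.getSubExpressions());
            result.add(body);
            return result;
        }
        public String getSyntacticForm() { return "Universal quantification"; }
    }

    public static void main(String[] unused) {
        Symbol x = new Symbol("X");
        FunctionApplication px = new FunctionApplication("p", x);
        Expression body = new FunctionApplication("and", px, new FunctionApplication("q", x));
        Expression all = new UniversalQuantification(px, body);
        System.out.println(all.getSyntaxTree().getSubTrees().get(0).getLabel()); // p
        System.out.println(all.getSubExpressions().size());                     // 2
    }
}
```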

Please let me know if you have any questions.

Thanks,

Rodrigo

Original comment by rodrigob...@gmail.com on 31 Jan 2014 at 8:07

GoogleCodeExporter commented 9 years ago
Hi Rodrigo,

Attached is an initial proposed version of the new API; see the .png file (I
used the free UMLet tool http://www.umlet.com/ to draw it and have included its
.uxf file as well). Some points:

1. I have focused initially on just the Expression side of the tree and will 
worry about the syntax branch once that is agreed upon.

2. I have moved several of the current Expression API methods into Expressions
as utility routines, as these are really about getting values from unknown
expression specializations.

3. 'Symbol' only exists as an Expression now, and I'm proposing we use 'Token'
to refer to an atomic value on the syntax side. The intent is to keep the
syntactic and semantic sides of the language tree distinct from each other.

4. I am proposing we split Symbol into 4 subclasses, so that going forward a
Symbol more closely corresponds to a Lisp Symbol and the other subclasses
handle their own types of expressions. Note that we can delay doing this until
we are ready to have concrete expression types extend CompoundExpression
(e.g. Set).

5. I added FunctionApplication as an extension of CompoundExpression just to
show where methods that are simple wrappers around lower-level methods belong
(e.g. FunctionApplication.getArg would just call getChild). Such methods more
closely match the intended meaning of a particular expression type, so they
belong on the specialization's API and not on the more general Expression API.
Again, this can be delayed until we are ready to create concrete class
implementations of the different types of expressions.

6. We still need to clean up the current provider mechanism, i.e. the way
semantics are currently registered with expressions. It causes several ugly
hacks in the code base when we try to walk sub-expressions without a rewriting
process (i.e. at parse time). I think we can likely clean this up when we
implement concrete classes for each type of expression and make the list of
semantics that must be supported available at parse time, outside of the
rewriting process. Alternatively, we could define a set of semantic interfaces,
and individual concrete implementations of expressions would then implement
the semantic interfaces they support. This may be cleaner, while still
providing a global view (via the semantic interfaces) of the different types of
semantics that expressions can support.
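A rough sketch of how points 2, 5 and 6 might fit together. CompoundExpression, FunctionApplication, getChild and getArg are names taken from the proposal, but the signatures given here are assumptions. In particular, the convention that child 0 is the functor is assumed. The Expressions helpers, the ScopingExpression semantic interface and the simplified single-variable ForAll are hypothetical. The freeSymbols walk illustrates the semantic-interface alternative: it needs no rewriting process, only the interfaces each concrete class implements.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class ProposedApiSketch {

    interface Expression {
        List<Expression> getSubExpressions();
    }

    // Point 6 (hypothetical semantic interface): implemented only by kinds of
    // expression that bind variables inside their sub-expressions.
    interface ScopingExpression extends Expression {
        Set<String> getScopedVariables();
    }

    static final class Symbol implements Expression {
        final Object value;
        Symbol(Object value) { this.value = value; }
        public List<Expression> getSubExpressions() { return Collections.emptyList(); }
    }

    // General API: positional access to children only.
    static class CompoundExpression implements Expression {
        private final List<Expression> children;
        CompoundExpression(Expression... children) { this.children = Arrays.asList(children); }
        Expression getChild(int i) { return children.get(i); }
        int numberOfChildren() { return children.size(); }
        public List<Expression> getSubExpressions() { return children; }
    }

    // Point 5: names matching the meaning of a function application, as thin
    // wrappers over getChild. Assumes child 0 is the functor, 1..n the arguments.
    static final class FunctionApplication extends CompoundExpression {
        FunctionApplication(Expression functor, Expression... args) { super(prepend(functor, args)); }
        Expression getFunctor() { return getChild(0); }
        Expression getArg(int i) { return getChild(i + 1); }
        int numberOfArgs() { return numberOfChildren() - 1; }
        private static Expression[] prepend(Expression first, Expression[] rest) {
            Expression[] all = new Expression[rest.length + 1];
            all[0] = first;
            System.arraycopy(rest, 0, all, 1, rest.length);
            return all;
        }
    }

    // Simplified quantifier over a single named variable (hypothetical).
    static final class ForAll extends CompoundExpression implements ScopingExpression {
        private final String variable;
        ForAll(String variable, Expression body) { super(body); this.variable = variable; }
        public Set<String> getScopedVariables() { return Collections.singleton(variable); }
    }

    // Point 2: static utility routines that work on any Expression.
    static final class Expressions {
        private Expressions() { }

        static int intValue(Expression e) {
            if (e instanceof Symbol && ((Symbol) e).value instanceof Number) {
                return ((Number) ((Symbol) e).value).intValue();
            }
            throw new IllegalArgumentException("not a numeric symbol");
        }

        // Walks sub-expressions without any rewriting process: scoping is
        // discovered through the ScopingExpression interface alone.
        static Set<String> freeSymbols(Expression e) {
            Set<String> out = new TreeSet<>();
            collect(e, new HashSet<String>(), out);
            return out;
        }

        private static void collect(Expression e, Set<String> bound, Set<String> out) {
            if (e instanceof Symbol) {
                String name = String.valueOf(((Symbol) e).value);
                if (!bound.contains(name)) out.add(name);
                return;
            }
            Set<String> inner = bound;
            if (e instanceof ScopingExpression) {
                inner = new HashSet<>(bound);
                inner.addAll(((ScopingExpression) e).getScopedVariables());
            }
            for (Expression sub : e.getSubExpressions()) collect(sub, inner, out);
        }
    }

    public static void main(String[] unused) {
        Symbol p = new Symbol("p"), x = new Symbol("X"), y = new Symbol("Y");
        FunctionApplication pXY = new FunctionApplication(p, x, y);
        System.out.println(pXY.getArg(1) == y);                             // true
        System.out.println(Expressions.freeSymbols(new ForAll("X", pXY))); // [Y, p]
        System.out.println(Expressions.intValue(new Symbol(3)));           // 3
    }
}
```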

Original comment by ctjoreilly@gmail.com on 31 Jan 2014 at 10:27

Attachments:

GoogleCodeExporter commented 9 years ago
Hi Ciaran,

Great, I like your proposal very much! Here are a few notes:

- please use LanguageTree (or, perhaps even better, simply "Tree", since there
seems to be nothing particularly linguistic about it) instead of LangTree, to
stick with the no-abbreviations convention.

- If I understood correctly, FunctionApplication is for now just an example of
how CompoundExpression would be used. We will go with CompoundExpression for
now, until we add types of expressions as its extensions (not only
FunctionApplication, but also IntensionalSet, LambdaExpression, etc.). If that
is the intention, it makes sense to me.

- At this point, when we read an expression, we don't know whether its symbols
are meant to be symbols such as constants, or string literals. Having both
Symbol and StringExpression is complicated by that, because we won't know at
reading time which class to use. Therefore, we should just use Symbol for
variables, constants and the like, as well as for string literals. But if we do
that, we are merging the notion of a symbol with its string value: they become
the same thing, just a string. This extends even to numbers, since when we read
a numeric symbol we don't know whether it is really meant to be a number or the
string representation of one. The current organization reflects this, so I
would keep it as is.
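Here is a minimal sketch of the single-Symbol approach described in the last point. It assumes that a Symbol simply keeps the text it was read from and leaves interpretation to whoever has the context. The read, looksNumeric and asNumber helpers are hypothetical and are not part of aic-expresso.

```java
import java.math.BigDecimal;

public class SingleSymbolDemo {

    // One class for every atomic token the reader produces: variables,
    // constants, string literals and numerals alike are stored as read.
    static final class Symbol {
        private final String text;
        Symbol(String text) { this.text = text; }
        String getText() { return text; }

        // Interpretation is deferred: callers that know the context decide
        // whether to treat the text as a number.
        boolean looksNumeric() {
            try {
                new BigDecimal(text);
                return true;
            } catch (NumberFormatException e) {
                return false;
            }
        }

        BigDecimal asNumber() { return new BigDecimal(text); }

        @Override public boolean equals(Object o) { return o instanceof Symbol && ((Symbol) o).text.equals(text); }
        @Override public int hashCode() { return text.hashCode(); }
        @Override public String toString() { return text; }
    }

    // Hypothetical reader step: no class has to be guessed at reading time.
    static Symbol read(String token) { return new Symbol(token); }

    public static void main(String[] unused) {
        Symbol ten = read("10");
        Symbol x = read("X");
        System.out.println(ten.looksNumeric() + " " + x.looksNumeric()); // true false
        System.out.println(ten.asNumber().add(BigDecimal.ONE));          // 11
    }
}
```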

Thanks again!

Original comment by rodrigob...@gmail.com on 5 Feb 2014 at 1:21

GoogleCodeExporter commented 9 years ago
Hi Rodrigo,

Switching ownership over to you.

Best

Ciaran

Original comment by ctjoreilly@gmail.com on 9 Apr 2014 at 10:36