codeminders / hamake

Hadoop dataflow-based task manager

Pros and Cons of current syntax #40

Open GoogleCodeExporter opened 9 years ago

GoogleCodeExporter commented 9 years ago
I would like to suggest collecting here the advantages and disadvantages of the 
current Hamake-file syntax. This information could be used for further syntax 
refinement or, maybe, during a complete redesign.

Disadvantages:
 - lack of loops
 - lack of expressions

Original issue reported on code.google.com by v...@codeminders.com on 29 Sep 2010 at 1:50

GoogleCodeExporter commented 9 years ago
We are trying to keep the syntax declarative, shunning imperative constructs like 
loops.  Expressions could be OK, but instead of building a new complex language 
here, we can probably provide some integration points that allow constructs from 
an existing embedded language to be used.

Original comment by kroko...@gmail.com on 20 Jul 2011 at 5:51

GoogleCodeExporter commented 9 years ago

Original comment by kroko...@gmail.com on 20 Jul 2011 at 5:58

GoogleCodeExporter commented 9 years ago
Here are some links to expression languages (and engines) that we might 
consider:
 * http://mvel.codehaus.org/
 * http://www.fourmilab.ch/diesel/
 * http://download.oracle.com/docs/cd/E17802_01/j2ee/j2ee/1.4/docs/tutorial-update2/doc/JSPIntro7.html
 * http://commons.apache.org/jexl/
 * http://commons.apache.org/el/index.html

Also, I would like to note that if all we need is the ability to specify 
something like this
{{{
<jobconf name="timeout.milliseconds" value="3 * 60 * 1000"/> 
}}}
or
{{{
<literal value="${somePath}/${foreach:basename}.${foreach:ext} + .txt"/>
}}}
we could use a template engine, e.g. FreeMarker (http://www.freemarker.org/).

Original comment by v...@codeminders.com on 1 Aug 2011 at 7:00

GoogleCodeExporter commented 9 years ago
I was thinking of embedding Jython such that one could embed expressions and 
even function calls within variable references...

Original comment by petenewc...@gmail.com on 3 Aug 2011 at 1:21

GoogleCodeExporter commented 9 years ago
Let me review it, and then we can discuss it.

Original comment by kroko...@gmail.com on 3 Aug 2011 at 4:36

GoogleCodeExporter commented 9 years ago
Peter,

Using Jython to evaluate expressions within a Hamake file is a great idea! 
It gave me the following thought: why don't we use JSR-223 
(http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/), which 
is available in JDK 1.6, to embed scripts written in Jython or another 
scripting language inside Hamake? In particular, we could use the 'eval()' 
function, which is available in almost all modern scripting languages, to 
evaluate expressions.
Actually, I would prefer that we use ECMAScript instead of Jython, because it 
evaluates expressions faster than Jython (a simple loop in which I evaluated the 
expression 'a+b' showed that, on my machine, Jython is 5 times slower than JS), 
and it does not allow access to system I/O facilities. I've attached to the 
ticket an example of JSR-223 usage (for both Jython and JS).
As for using Jython inside Hamake, we could add one more kind of task that 
launches programs written in Jython.

Regards,
Vladimir

Original comment by v...@codeminders.com on 4 Aug 2011 at 12:09
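A minimal JSR-223 sketch of the 'a+b' evaluation described above. It is not the attached example, which is not reproduced here. The sketch looks up an engine by name, publishes the inputs as bindings, and calls `eval()`. The class name `ExprEval` is hypothetical, and the engine names are assumptions. `"JavaScript"` only resolves on JDKs that bundle an engine: Rhino in JDK 6/7, and Nashorn in JDK 8 through 14. `"jython"` only resolves when Jython's engine is on the classpath. If no engine is registered under the name, the sketch returns null instead of failing.

```java
import java.util.Map;
import javax.script.Bindings;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

// Sketch only: evaluate an expression with a JSR-223 engine chosen by name,
// publishing input values as bindings rather than splicing them into the code.
class ExprEval {

    // Returns the result of `expr`, or null if no engine is registered under
    // `engineName` in this JVM (e.g. JDK 15+ has no built-in "JavaScript").
    static Object eval(String engineName, String expr, Map<String, ?> vars) {
        ScriptEngine engine = new ScriptEngineManager().getEngineByName(engineName);
        if (engine == null) {
            return null;
        }
        Bindings bindings = engine.createBindings();
        bindings.putAll(vars); // publish a, b, ... as script variables
        try {
            return engine.eval(expr, bindings);
        } catch (ScriptException e) {
            throw new IllegalArgumentException("cannot evaluate: " + expr, e);
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> vars = Map.of("a", 1, "b", 2);
        for (String name : new String[] {"JavaScript", "jython"}) {
            Object result = eval(name, "a+b", vars);
            System.out.println(name + ": " + (result == null ? "no engine available" : result));
        }
    }
}
```

Timing comparisons like the one above would wrap `eval` in a loop. The results depend heavily on the engine versions and the JVM.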


GoogleCodeExporter commented 9 years ago
Yup, makes sense to me!  I've used Rhino (the default ECMAScript engine) in the 
past with great success too.  And, as you point out, with JSR-223, people can 
choose the engine they want to use.

So, as I see it, there would be three main tasks:

1. Add syntax that allows you to specify what expression language (scripting 
engine) you're going to use in your hamakefile.  We should probably default it 
to ECMAScript (JavaScript) if for no other reason than that it is the only 
engine included by default with the Oracle JRE/JDK.

2. Add syntax that allows you to define functions to extend the expression 
language.  This is important in order to enable the definition of 
transformations that cannot be specified as a single expression.  IMO, it 
should be possible to define these inline in the hamakefile itself or to refer 
to them via URL, either relative to the hamakefile or absolute.  It would 
probably be wise to have both global and local-to-DTR versions of this syntax, 
where functions defined by the local version are effective only within the 
scope of the containing DTR.

3. Extend the variable substitution code somehow to allow the use of such 
expressions.  I see two main paths here: one would only allow the use of named 
functions, both predefined (effectively like the current ${foreach:*} 
variables) and user-defined per #2 above.  The other would allow expression 
code to be embedded directly.  These options are not mutually exclusive: we 
could extend the ${} syntax to accomplish the former while at the same time 
introducing, for example, a $() syntax to accomplish the latter.  However, the 
latter will require a more robust variable substitution parser than currently 
exists, since regardless of the delimiters chosen, it will probably need to 
handle some kind of recursion in order to accommodate expressions that 
themselves contain delimiters.
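For item 2, here is a rough model of how global and DTR-local function definitions might be looked up. It assumes each definition is either inline source or a URL reference. `FunctionScope` and `Definition` are hypothetical names, not existing Hamake classes. Relative references are resolved with `java.net.URI.resolve`, which returns an absolute reference unchanged and otherwise resolves it against the hamakefile's own location. A DTR scope checks its own definitions before falling back to the global scope.

```java
import java.net.URI;
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of item 2: functions defined globally or local to a DTR,
// either inline or by URL (absolute, or relative to the hamakefile).
class FunctionScope {

    // A definition carries either inline source or a resolved location.
    static final class Definition {
        final String inlineSource;
        final URI location;

        Definition(String inlineSource, URI location) {
            this.inlineSource = inlineSource;
            this.location = location;
        }
    }

    private final FunctionScope parent; // null for the global scope
    private final Map<String, Definition> definitions = new HashMap<>();

    FunctionScope(FunctionScope parent) {
        this.parent = parent;
    }

    void defineInline(String name, String source) {
        definitions.put(name, new Definition(source, null));
    }

    // URI.resolve returns `ref` unchanged if it is absolute; otherwise it is
    // resolved against the hamakefile's URI and normalized ("../" is applied).
    void defineByUrl(String name, URI hamakefile, String ref) {
        definitions.put(name, new Definition(null, hamakefile.resolve(ref)));
    }

    // Local definitions shadow global ones; returns null if nothing matches.
    Definition lookup(String name) {
        Definition d = definitions.get(name);
        if (d != null) {
            return d;
        }
        return parent == null ? null : parent.lookup(name);
    }
}
```

As an illustration with made-up paths, take `URI.create("file:/projects/wordcount/hamakefile.xml")` as the hamakefile. A reference to `lib/functions.js` then resolves to `file:/projects/wordcount/lib/functions.js`, and `../shared/util.js` resolves to `file:/projects/shared/util.js`.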

Hmm... in light of JSR-223, it might be wise to avoid embedding arbitrary 
expressions in variable references since the parsing code could not have 
perfect prior knowledge of the embedded syntax!
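To make this parsing concern concrete, here is a naive scanner for the `$()` syntax suggested in item 3. The class name `ExprScanner` is hypothetical. It handles nested parentheses by counting depth. Because it knows nothing about the embedded language, though, a `)` inside a string literal ends the expression early.

```java
import java.util.ArrayList;
import java.util.List;

// Naive scanner for the "$(...)" idea: finds each "$(" and its matching ")"
// by counting parenthesis depth, so expressions may nest parentheses.
// It does not understand the embedded language (string literals, comments).
class ExprScanner {

    static List<String> extract(String text) {
        List<String> bodies = new ArrayList<>();
        int from = 0;
        int start;
        while ((start = text.indexOf("$(", from)) >= 0) {
            int depth = 0;
            int j = start + 1; // index of the opening '('
            for (; j < text.length(); j++) {
                char c = text.charAt(j);
                if (c == '(') {
                    depth++;
                } else if (c == ')' && --depth == 0) {
                    break; // matching close found
                }
            }
            if (j == text.length()) {
                throw new IllegalArgumentException("unbalanced $( at index " + start);
            }
            bodies.add(text.substring(start + 2, j));
            from = j + 1;
        }
        return bodies;
    }
}
```

For example, `extract("$(a+(b*c))")` yields `a+(b*c)`. `extract("$(s.replace(')', ''))")` yields `s.replace(')', ''`, cut off at the quoted parenthesis. Avoiding that would require knowing each engine's literal syntax.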

In either case, I recommend that, as you have already begun to do with your a+b 
test, we pass values into the functions by publishing the available data into 
the scripting environment, as described at 
<http://java.sun.com/developer/technicalArticles/J2SE/Desktop/scripting/#7>. 
This avoids any quoting or interpretation issues that would arise with 
recursive variable substitution into expression code.

Make sense?

-peter

Original comment by petenewc...@gmail.com on 4 Aug 2011 at 1:13

GoogleCodeExporter commented 9 years ago
Ah, I forgot your other suggestion...  Yes, #4 would be to add a script task 
type.  While I think that this could also be valuable, I would consider it a 
separate feature.  As such, while it could certainly inherit the language and 
functions defined globally for the hamakefile, I think it should be able to 
optionally declare its own language, etc.

Actually, that brings to mind another advantage of going the 
variables-must-only-reference-functions route: the function names could 
potentially be defined at the XML level, and each could even be defined in a 
separate scripting language.  If we do this, however, we'll probably need 
separate syntax elements for defining these "dynamic variables" and for 
initializing scripting environments (declaring imports, defining utility 
functions, etc.).

-peter

Original comment by petenewc...@gmail.com on 4 Aug 2011 at 1:28

GoogleCodeExporter commented 9 years ago

Peter, I completely agree with your #1, 2 and 3. As for #4, I've created a new 
ticket in which I've tried to outline the new feature that will allow defining 
utility functions.

Let me propose here an extension of the current syntax that could be used for 
embedding expressions:

An expression will be delimited by "${=" and "}". By default, an expression 
will be evaluated by the JS engine, but one will be able to use another engine 
by specifying its letter code before the '=' symbol, e.g. ${jy=}. Letter codes 
will be hard-coded. Initially, we will support only two engines: jy (Jython) 
and js (JavaScript), but later on we could extend this list and add more 
engines (http://java.net/projects/scripting/sources/svn/show/trunk/engines?rev=236).
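Under this proposal, a single `${=...}` or `${jy=...}` reference could be split into an engine name and an expression body. The sketch below shows one way to do that. The class `ExprRef` is hypothetical, and so is the table mapping letter codes to engine names. `"JavaScript"` and `"jython"` are assumed to be the names one would pass to `ScriptEngineManager.getEngineByName`.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch of the proposed syntax: "${=expr}" uses the default engine (js),
// "${jy=expr}" selects one by a hard-coded letter code.
class ExprRef {

    // Hypothetical code-to-engine-name table (names as passed to getEngineByName).
    static final Map<String, String> ENGINES = Map.of("js", "JavaScript", "jy", "jython");
    static final String DEFAULT_CODE = "js";

    // Whole reference: "${", optional lowercase code, "=", body, final "}".
    private static final Pattern REF =
            Pattern.compile("\\$\\{([a-z]*)=(.*)\\}", Pattern.DOTALL);

    final String engineName;
    final String expression;

    private ExprRef(String engineName, String expression) {
        this.engineName = engineName;
        this.expression = expression;
    }

    static ExprRef parse(String ref) {
        Matcher m = REF.matcher(ref);
        if (!m.matches()) {
            throw new IllegalArgumentException("not an expression reference: " + ref);
        }
        String code = m.group(1).isEmpty() ? DEFAULT_CODE : m.group(1);
        String engine = ENGINES.get(code);
        if (engine == null) {
            throw new IllegalArgumentException("unknown engine code: " + code);
        }
        return new ExprRef(engine, m.group(2));
    }
}
```

The pattern must match the whole reference, so the body may itself contain `}`. Locating references inside longer strings is a separate problem, namely the delimiter-matching issue raised earlier in the thread.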

Inside an expression, one will be able to use globally defined parameters and 
functions (either language-specific ones or those defined in the Hamake file; 
please see issue #51 for details on the syntax proposed for defining functions 
in a Hamake file). Hamake will run scripts in an isolated environment and will 
pass values into expressions by publishing them.
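One way to read "isolated environment" in JSR-223 terms is sketched below. Each evaluation gets its own `SimpleScriptContext` with four properties:

 - its ENGINE_SCOPE holds a copy of the published parameters;
 - it has no GLOBAL_SCOPE bindings, so nothing registered on the `ScriptEngineManager` is visible;
 - its input is empty;
 - its output is captured.

`IsolatedEval` is a hypothetical name. How strictly top-level assignments stay confined to the new scope ultimately depends on the engine implementation.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.Map;
import javax.script.Bindings;
import javax.script.ScriptContext;
import javax.script.ScriptEngine;
import javax.script.ScriptException;
import javax.script.SimpleScriptContext;

// Sketch: a fresh ScriptContext per evaluation instead of the engine's default one.
class IsolatedEval {

    // Parameter entries are copied into `scope` (so a script cannot change
    // Hamake's own map), no GLOBAL_SCOPE is attached, stdin is empty, and
    // anything the script prints goes to `out` rather than the console.
    static ScriptContext isolatedContext(Bindings scope, Map<String, ?> params, StringWriter out) {
        scope.putAll(params);
        ScriptContext ctx = new SimpleScriptContext();
        ctx.setBindings(scope, ScriptContext.ENGINE_SCOPE);
        ctx.setReader(new StringReader(""));
        ctx.setWriter(out);
        ctx.setErrorWriter(out);
        return ctx;
    }

    static Object eval(ScriptEngine engine, String expr, Map<String, ?> params, StringWriter out) {
        // Let the engine create the bindings object used for the new scope.
        ScriptContext ctx = isolatedContext(engine.createBindings(), params, out);
        try {
            return engine.eval(expr, ctx);
        } catch (ScriptException e) {
            throw new IllegalArgumentException("cannot evaluate: " + expr, e);
        }
    }
}
```

Note that a separate context only separates variables and streams. It does not by itself prevent a script from reaching files or the network through the host language.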

Vladimir

Original comment by v...@codeminders.com on 5 Aug 2011 at 9:46