Jaql is a query language designed for Javascript Object Notation (JSON), a data format that has become popular because of its simplicity and modeling flexibility. Jaql is primarily used to analyze large-scale semi-structured data. Core features include user extensibility and parallelism. In addition to modeling semi-structured data, JSON simplifies extensibility. Hadoop's Map-Reduce is used for parallelism.
Some user defined functions have non-trivial initialization phases
that in the ideal case, would not be repeated for each record of a
mapper's input. Some work-arounds include the use of static variables
(need to worry about multiple threads) and the registry infrastructure in
jaql (as used for random number sampling). The first option may not be
safe and the second requires jaql code to be modified which is not a good
long option.
While there is merit in exposing the registration infrastructure, a better
option is to support initialization for user-defined functions.
Original issue reported on code.google.com by vuk.erce...@gmail.com on 27 Jan 2009 at 1:39
Original issue reported on code.google.com by
vuk.erce...@gmail.com
on 27 Jan 2009 at 1:39