This PR expands on the first iteration of the RTableExecutor for Texera's implementation of R UDF and R Source UDF by also implementing a rudimentary version of a Tuple API in R. This early implementation of the Tuple API relies heavily on the R "coro" library, so make sure the R installation has this library installed.
To use the R Source UDF (Tuple API):
This should be used when the user wishes to write R code that provides source data to any pipeline that uses R UDFs.
The user does not need to provide any input. The output of the R Source UDF must be a generator factory that defines a function yielding zero, one, or more R lists.
To use it, the coro library must be loaded first via library(coro), and users must end their R code with coro::generator(function() { yield(tuple) }) to return to the engine.
function() must be an anonymous R function (similar to a lambda function in Python).
The yield function must be used inside that function.
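A minimal sketch of the Source UDF shape described above, written so it can be tried locally outside Texera. The name source_factory and the tuple fields id and label are hypothetical and exist only for illustration; inside Texera the coro::generator(...) call itself would be the final expression of the UDF rather than being assigned to a name.

```r
library(coro)

# Local sketch: the factory is bound to a hypothetical name so it can be
# inspected here. Inside Texera, the coro::generator(...) call itself would
# be the last expression of the UDF.
source_factory <- coro::generator(function() {
  # Each yielded R list is one output tuple; the fields are made up.
  for (i in 1:3) {
    yield(list(id = i, label = paste0("row-", i)))
  }
})

# Instantiate the generator and drain it into a list of tuples.
rows <- coro::collect(source_factory())
```

Calling the factory creates a generator instance, and coro::collect() pulls values from it until it is exhausted, which is a convenient way to check what a Source UDF would emit before running it in a workflow.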
To use the R UDF (Tuple API):
This should be used when the user wishes to receive some input data, read and modify it, and then return either different data or the same input data.
The user should expect two inputs: a Tuple (an R list) and a port.
The port argument is currently unused, although it may be used in the future.
The output of the R UDF must be a generator factory that defines a function yielding zero, one, or more R lists.
To use it, the coro library must be loaded first via library(coro), and users must end their R code with coro::generator(function(tuple, port) { yield(tuple) }) to return to the engine.
function(tuple, port) must be an anonymous R function (similar to a lambda function in Python), and the arguments (tuple, port) must be included even if they are not used.
The yield function must be used inside that function.
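The R UDF shape can be sketched the same way. This is an illustration under assumptions, not the engine's actual calling code: the name udf_factory, the fields name, score, and score_doubled, and the port value 0 are all hypothetical, and the factory is called by hand here to stand in for the engine.

```r
library(coro)

# Local sketch: bound to a hypothetical name for inspection. Inside Texera,
# the coro::generator(...) call would be the last expression of the UDF.
udf_factory <- coro::generator(function(tuple, port) {
  # port must be declared but is currently unused.
  # Tuples with a missing score are dropped (zero outputs for that input).
  if (!is.na(tuple$score)) {
    tuple$score_doubled <- tuple$score * 2
    yield(tuple)
  }
})

# Drive the UDF by hand with two made-up input tuples.
kept <- coro::collect(udf_factory(list(name = "a", score = 21), 0))
dropped <- coro::collect(udf_factory(list(name = "b", score = NA), 0))
```

The conditional shows how a single UDF can yield one tuple for some inputs and none for others; yielding several lists per input works the same way by calling yield more than once.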
Limitations of Tuple API
We require users to manually load the coro library (via library(coro)) and end the UDF with coro::generator(function() {...}) in order to use the Tuple API. This is not preferred and is not an elegant solution.
Any code written after coro::generator(...) will result in an error, unless that code is another generator object whose function yields an R list. Comments are fine, however.
In addition, the function defined in the coro::generator(function() { ... }) block must be an anonymous function (a lambda function). This means you cannot define a function first and then pass it into coro::generator(...).
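The anonymous-function restriction can be sketched locally as follows. This assumes the coro version listed in this PR rejects a function passed by name when the generator is created; the names emit_one, named_result, and inline_factory are hypothetical, and the error is captured rather than printed because its exact wording is not confirmed here.

```r
library(coro)

# Hypothetical named function defined ahead of time.
emit_one <- function() {
  yield(list(id = 1))
}

# Passing the name instead of a function literal is expected to fail;
# the condition is captured so the script keeps running.
named_result <- tryCatch(
  coro::generator(emit_one),
  error = function(e) e
)

# The supported form: the function literal written inline.
inline_factory <- coro::generator(function() {
  yield(list(id = 1))
})
inline_rows <- coro::collect(inline_factory())
```

If named_result turns out to be an error condition, that matches the limitation described above, and the inline form remains the only shape to rely on.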
Software versions required/supported:
Python - 3.9.18
rpy2 (Python package) - 3.5.11
rpy2-arrow (Python package) - 0.0.8
R - 4.3.3
reticulate (R package) - 1.36.1
arrow (R package) - 14.0.0.1
coro (R package) - 1.0.4
Changes
Use cases/user requirements:
core/amber/src/main/resources/udf.conf
Showcase: