Added a `:partition-by` option to `distinct`. It should be relatively easy to add partitioners to other operators that could take them, but I only needed `distinct` for now.
Unfortunately, Hadoop doesn't allow parameters to be passed to partition functions, so the workaround is to generate many partitioner classes. Right now it generates 32 of them, but this can easily be changed if there is a need. This also introduces state into the script generation process (which partitioner to use), which is now explicitly passed through that command.
The generated partitioners all extend the same base class and determine which config to load based on their name, which includes a numerical index.
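To illustrate the name-based lookup, here is a minimal sketch of the pattern in plain Java. It is not the code in this PR: `IndexedPartitioner`, `Partitioner0`, `Partitioner1`, the `partitioner.<index>` key format, and the prefix-length setting are all hypothetical, and a `Map` stands in for the Hadoop job configuration so the example runs without Hadoop on the classpath.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared base class. A real Hadoop partitioner would extend
// org.apache.hadoop.mapreduce.Partitioner and read from the job Configuration;
// a plain Map is used here as a stand-in.
abstract class IndexedPartitioner {
    private Map<String, String> conf = new HashMap<>();

    // Stand-in for receiving the job configuration after reflective construction.
    void configure(Map<String, String> conf) {
        this.conf = conf;
    }

    // Recover the numerical index from the trailing digits of the class name,
    // e.g. "Partitioner7" -> 7. Throws NumberFormatException if there are none.
    int index() {
        String name = getClass().getSimpleName();
        int i = name.length();
        while (i > 0 && Character.isDigit(name.charAt(i - 1))) {
            i--;
        }
        return Integer.parseInt(name.substring(i));
    }

    // Load the setting registered for this partitioner's index.
    // In this sketch the setting is a key-prefix length (an assumption).
    int prefixLength() {
        return Integer.parseInt(conf.get("partitioner." + index()));
    }

    // Partition on the first prefixLength() characters of the key.
    int getPartition(String key, int numPartitions) {
        int n = Math.min(prefixLength(), key.length());
        return Math.floorMod(key.substring(0, n).hashCode(), numPartitions);
    }
}

// The generated classes carry no logic of their own; only the name differs.
class Partitioner0 extends IndexedPartitioner {}
class Partitioner1 extends IndexedPartitioner {}
```

With `partitioner.0` set to `1` and `partitioner.1` set to `3`, a `Partitioner0` instance groups keys by their first character and a `Partitioner1` instance groups them by their first three, even though both classes have empty bodies. Script generation only has to remember which index it assigned to which operator.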
@daveray @pathaks @johnmidgley