Open jhellerstein opened 12 years ago
This could be done via a more powerful module/import system: you'd need to be able to introduce new imports at runtime, in a data-dependent manner.
For now, would it suffice to just make the argmax over an extra field in the tuple? I realize that breaks the encapsulation, but it essentially permits user-defined grouping (the user can either pass a fixed value for the field to get a single output group, or encode their partitioning scheme as distinct field values).
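A rough sketch of that workaround, written in plain Ruby rather than Bud so it runs standalone. The `[group, key, val]` tuple layout and the helper name `argmax_by_group` are assumptions invented for illustration; the point is only that the aggregate itself has to know about the extra grouping field.

```ruby
# Hypothetical tuple layout: [group, key, val]. The leading group field is the
# "extra field" that the aggregate is made aware of.
def argmax_by_group(tuples)
  tuples.group_by { |group, _key, _val| group }
        .map { |_group, members| members.max_by { |_g, key, _v| key } }
end

rows = [[:thing, 1, "a"], [:thing, 3, "b"], [:thang, 2, "c"]]

# Partitioned use: distinct field values encode the partitioning scheme.
per_group = argmax_by_group(rows)

# Single-group use: pass a fixed value for the extra field.
fixed = rows.map { |_g, key, val| [:all, key, val] }
overall = argmax_by_group(fixed)
```

Note that the grouping logic lives inside the aggregate here, which is exactly the encapsulation break mentioned above.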
The workaround isn't much help in practice, as my module is dozens of lines long and has to pass the external grouping attribute through a whole lot of logic.
It probably shouldn't be part of the import system in a naive way, as that would essentially generate an import instance per group, with the group names being in the import namespace rather than in the data where they belong (for subsequent joining, shipping across the net, etc.).
We want some kind of lambda here where the outer grouping is parameterized by an inner aggregation function.
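One way to picture that lambda, as a plain-Ruby sketch and not Bud syntax: the grouping operator takes the aggregate as a parameter and never looks inside it. The names `agger` and `group_apply`, and the `[key, val]` tuple layout, are assumptions made up for this illustration.

```ruby
# Hypothetical stand-in for an encapsulated aggregate: takes a set of
# [key, val] tuples and returns a single tuple (here, the highest-keyed one).
agger = ->(tuples) { tuples.max_by { |key, _val| key } }

# Outer grouping parameterized by an inner aggregation function.
# The aggregate is treated as a black box and applied once per partition.
def group_apply(tuples, group_of, agg)
  tuples.group_by(&group_of).transform_values { |members| agg.call(members) }
end

s = [[1, :thing], [2, :thang], [3, :thing], [4, :thang]]

# Partition on the val column and run the aggregate once per partition.
winners = group_apply(s, ->(t) { t[1] }, agger)
```

The partition name stays in the data (as the hash key) rather than in any namespace, and the number of partitions is determined by the input, not by the program text.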
I think what you describe at the end is equivalent to what I was suggesting: you basically want many independent instances of an operator (argmax in this case), where the number of instances/partitioning scheme depends on the data. import does precisely the same thing, except that the number of instances/partitioning scheme is fixed in the program text. As far as fetching the partition name, you can imagine adding a builtin function to return the name/ID of the current module.
This sounds like fodder for a wholesale rethink of the module system, in which the namespace is reified in the data.
Assume I have a module Agger that takes a set of tuples as input and produces a single aggregate tuple as an output via some encapsulated logic.
Now, I have a set S that I'd like to partition on its first column, and evaluate Agger once per partition -- essentially use Agger as the aggregate in a group by. There's no way to do that right now.
Here's an example. It currently produces the tuple with the highest key. I'd like to modify it to group by val, and produce the highest-keyed tuple per value (one for :thing and one for :thang). I want to do this without breaking the encapsulation on Agger -- i.e., no fair changing the argmax clause.