Closed nordfjord closed 9 years ago
This looks like it could potentially cover the Fake Groups (non-)pattern in a more graceful way. It would be great to find a home for that code, whether it's in reductio or in some postprocess library.
(And yep, caps really shouldn't be part of dc.js; I think if there were a hook to deal with the Others special case better, there would be no need for that mixin.)
Thanks for the PRs. I think adding the ability to run post-processing functions to Reductio would be great, and having cap and downsampling procedures available makes sense. I just think we need to think through where to put this in the API and the semantics of it.
Right now I have some basic thoughts on a 1.0 API for Reductio here: https://github.com/esjewett/reductio/blob/master/NOTES.md
For post-processing stuff, I think it would make the most sense to have these as methods on the returned group. So groups that go through Reductio would have the usual group.all
method, but they would also have a group.post
object providing access to functions like group.post.cap
and group.post.downsample
.
The reason I'd prefer to see these on the group rather than the reducer is that they are evaluated when called, not when the group is defined. That is, they are query-time, not load/filter-time. The semantics are therefore quite different than the reducers. For example, if you do ...
var group = dimension.group();
var reducer = reductio().sum(function(d) { return d.a; })(group);
reducer.sum(function(d) { return d.b; });
var result = group.all();
... then result contains a sum over a
, not a sum over b
. In other words, once a reducer is applied to a group, changing the reducer does not change the group. Further, a reducer can be applied to more than one group and the groups won't interfere with each other. However, the post-processing semantics are different, so I'd prefer not to have them on the reducer object.
Does that make sense? Would you be willing to rewrite the pull requests to put the post-processing logic onto the group object as suggested?
Thanks so much.
Alright, point taken, I suggest we make it like so then:
var reducer = reductio()
.sum('foo')(group);
group.post.cap(3); // Array of length 3
This allows us too change the variables on the post processing methods at runtime while still allowing us to create multiple groups from a single reducer.
Or we could even make that chainable:
var reducer = reductio().sum('foo')(group);
group.post.sortBy('sum').cap(3)();
Alright I still haven't made it chainable, but man that would be killer.
Anyways as it stands now the API for it is:
var reducer = reductio().sum('foo')(group);
group.post.cap(3);
group.post.cap(3, 'My Others Name');
Right so, I added a method to reductio (global)
reductio.registerPostProcessor('before', function(prior){
return function(before){
return prior().filter(function(d){return +d.key < +before;});
}
});
// reductio post now has a before method
var reducer = reductio().sum('foo')(group);
group.post().before(new Date())(); // returns all elements that are before now
Then there's the cap postprocessor, which is registered through that method and can be used as so:
var reducer = reductio().sum('foo')(group);
group.post().cap(3)(); // returns the top 3 elements of the grouping where the third is an aggregation of the rest
I added a sortBy postprocessor as well.
So with this PR you can do:
var reducer = reductio().sum('foo').count(true).avg(true)(group);
var sort_descending_and_cap = group.post().sortBy('-value.sum').cap(4);
var sort_ascending_and_cap = group.post().sortBy('value.sum').cap(4);
console.log(sort_descending_and_cap());
console.log(sort_ascending_and_cap());
The logs will be different since the postprocessing is applied seperately and simply return functions which work on the crossfilter group.all method.
Starting to look good. I think I might want to remove the functionality in sortBy where it has a lot of intelligence built into the string it is passed and just use an accessor function plus the standard Reductio "if I see a string it is a property name" approach. I can make that change after merging the pull request.
Sorry that this might take a little while. I'm traveling and/or on vacation all week and much of next, but will try to get this merged and a new release cut eventually.
I should also say, I like the registration approach. I'm now thinking if I can do something similar for all the reducers. Would make it quite a bit more encapsulated.
I added the extra intelligence on the sortBy processor since most of the time you're accessing values 'two levels' deep. So in order to allow a string to be passed it required the ability to handle strings like 'value.sum'
.
It's no problem if you want it removed, just wanted to give you my perspective on why it's there.
@nordfjord My thinking is that it should work like group.order. The accessor function is passed the contents of the group value
. If it's a string instead of a function (i.e. group.post().sortBy('sum')
), we interpret that as value.sum
. For things that are deeper in the value object, people would just have to use accessor functions.
If there is another library somewhere that handles converting these types of strings to accessor functions in a minimally confusing sort of way, I'd consider using it.
Right, That's where we differ, my thinking was that you sometimes sort by key value (e.g. alphabetical table) and sometimes by value (e.g. top-N table).
Though I tend to agree with you that in it's current state the pluck_n function is confusing, and that alone is be reason to abandon it.
Hmmm. That's true. Need to allow access to the whole group including the key to sort. Ok, let's back up.
I'll probably pull the pluck function in the sortBy into somewhere I can use it generally in Reductio for this string->accessor conversion? That is, also use it to replace the accessorify
function here: https://github.com/esjewett/reductio/blob/master/src/accessors.js#L11
But can we just dump the part where we use '-' to indicate ascending or descending sort? That's semantically correct for numbers, but not for strings. Probably makes more sense to have sortBy
take a second parameter that would accept an ordering like d3.ascending or d3.descending (or something else entirely), defaulting to ascending. So group.post().sortBy('value.sum', d3.ascending)
is equivalent to group.post().sortBy('value.sum')
and is reversed from group.post().sortBy('value.sum', d3.descending)
. What do you think?
I agree using the string to indicate sorting order is in general bad. Your proposal for an ordering function is cleaner.
In fact I think it's much cleaner from a programming perspective. I'll get on it when I'm off work.
That is sortBy would be of the form (accessor, order)
Where order is of the form (a,b)
and returns -1
, 1
or 0
The comparator function would then simply call this ordering function.
That should simplify things, and make the processor more powerful at the same time.
I'm however worried about adding a dependency on d3 just for the d3.ascending function unless you're planning on using it for more. Here is a simple ordering function for ascending:
function ascending(a, b){
return a < b ? -1 : a > b ? 1 : a >= b ? 0 : NaN;
}
which is probably similar to/the same as the d3 method.
Yes, just using d3 as an example. I'd say let's just use a generic ascending ordering as the default. I'd just like the ordering expected to be compatible with the d3 ordering interface because it's so widely used (and makes sense).
Overall, nicely done and thanks for the pull request :-) I still need to look at the tests, but I think we're probably OK to merge after that change. I might still make some changes after the merge, but let's go for it.
Oh by all means look at the tests, writing tests is one area I'd like to improve in.
I'll raid the d3 codebase for their ascending function and use that as default.
I'll get on it after work.
regards,
Einar
Cool - I just checked and d3's matches what you have above. Pretty standard, I'd imagine.
alright, so the '-' syntax is removed, and a d3.ascending 'like' function used as default comparator.
I don't have time for more at the moment, but let me know if you want me to assist in any regard.
Looks good to me. Now it's just a matter of me finding time to review it and merge. Thanks so much! I think this will be a really good addition to Reductio, and hopefully helpful to people using dc.js as well as other libraries.
@gordonwoodhull Do you know of other common "fake group" patterns that we might want to support here?
Once this is in I'll do a PR for most of the fake groups in the dc.js FAQ.
Ok merged. Sorry for the delay on this and thanks again for the contribution. I'll probably be in here making some minor changes and testing later this coming week and then I'll do a release including the post-processor stuff.
Sorry for bombarding you with PR's I've just been having fun tonight, poking at these problems.
Firstly, I thought reductio was missing a postprocessing feature, something that allows us to edit the data coming out of the group.all function (useful for removing all 'unknown' keys for instance)
So I added postprocess.js which is called when the group already has an all method so that we may override it.
Then you have the cap.js file which contains a function to modify group.all so that it returns an array of length
cap
and whose last element is an aggregation of all elements withindex >= cap
I wrote support forsum
avg
andcount
in the cap function.See the tests for usage.