boostorg / histogram

Fast multi-dimensional generalized histogram with convenient interface for C++14
Boost Software License 1.0
315 stars 73 forks source link

Add pick command for reduce #275

Open HDembinski opened 4 years ago

HDembinski commented 4 years ago

Copied from #267:

scikit-hep/boost-histogram#296 suggests that one should be able to pick individual bin indices from a category axis. The suggested command is pick(index1, index2, ...), as suggested by @henryiii

The idea is good, but the implementation cost is not negligible. pick would only work for unordered axes like category. A new kind of reduce constructor must be defined and a corresponding reduce implementation must be written.

henryiii commented 4 years ago

This would also be (very) useful for boolean, and integer, too, though it can be imitated fairly easily currently. I would expect it to work on all axes, for consistency, and sometimes that could actually be useful as well. For example, if you use a Regular axes to collect a range of dates, and you want to select one bin's worth of dates and look at the remaining remaining axes.

HDembinski commented 4 years ago

It is definitely useful. This issue was stuck for a while, because it required a rewrite of the algorithm that indentifies and copies the corresponding bins from the old histogram to the new histogram. I had to write this new algorithm to make histograms with category axes addable, so this feature is less far away from reality as before.

henryiii commented 3 years ago

There are actually two distinct needs here. Using straw man terms for illustration, not intended to be the recommended names, just pointing out the two distinct uses:

  1. pick_one(axis_number, index) - this would remove an axis by selecting one item, dropping the rest. In UHI terms, this would be used when you write something like h[:, 3, :] -> bhalg::reduce(h, bhalg::pick_one(1, 3)). You could reduce over multiple of these to "pick" multiple axes. This would work on all axes types, and would remove an axis. Due to the changing return type, maybe these would need to be applied to a new bhalg::pick_reduce function, not the normal reduce? In return type, it's actually similar to project.
  2. pick_several(axis_number, indexes) - this would pick several items from an axes. UHI terms: h[:, ["one", "two"], :] -> bhalg::reduce(h, bhalg::pick_several(1, {"one", "two"})). The return histogram axes would be the same as the input axes, just with smaller category axes (and eventually, integer axes could become category, etc).
HDembinski commented 3 years ago

It should be just one pick command, but the equivalent of pick_one can be work with any axis, while pick_several can currently only work for the category axis.

The limitation of the current implementation of reduce is that the axis types cannot change. A future rewrite of reduce should support converting axis types, to support pick on an integer axis, which would then produce a category axis.

It has low priority, but we could also add a discontinuous axis (straw man name), which would allow for gaps between bins. With this one could also pick from the regular or variable axis.