Open stuartlynn opened 9 years ago
I see some problems:
Imagine we have a histogram like this for a tile:
{
bins: [0, 1, 2, 3, 4, 5],
values: [3, 4, 4, 4, 1],
range: [0.0, 10.0]
}
(so 5 buckets, range 0 -> 10)
and this one for another tile:
{
bins: [0, 1, 2, 3, 4, 5],
values: [3, 4, 4, 4, 1],
range: [0.0, 2.0]
}
As you can see, the range is 0.0 -> 2.0
(everything else is the same; it does not matter for this example)
What would be the resulting histogram:
{
bins: [0, 1, 2, 3, 4, 5],
values: [3 + 16, 4, 4, 4, 1],
range: [0.0, 10.0]
}
We can calculate it for the first bucket because the bucket size in the first histogram is 2.0 (10.0/5) and the second histogram's range (0.0 -> 2.0) fits inside one bucket, so all 16 of its counts land in the first bucket. It would also work if the second histogram's range were a multiple of the bin size of the first histogram.
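The case above can be sketched in JavaScript. This is only an illustration under assumptions: `mergeAligned` is a hypothetical helper, the histograms have equal-width bins described by `values` and `range`, and the merge is refused whenever a source bin straddles a boundary of the destination histogram.

```javascript
// Hypothetical helper: fold histogram `src` into histogram `dst`.
// Both are { values: [...], range: [min, max] } with equal-width bins.
function mergeAligned(dst, src) {
  var dstWidth = (dst.range[1] - dst.range[0]) / dst.values.length;
  var srcWidth = (src.range[1] - src.range[0]) / src.values.length;
  var out = dst.values.slice();
  var eps = 1e-9; // tolerance for floating point bin edges
  for (var i = 0; i < src.values.length; i++) {
    var lo = src.range[0] + i * srcWidth;
    var hi = lo + srcWidth;
    // destination bins touched by the lower and upper edge of this source bin
    var first = Math.floor((lo - dst.range[0]) / dstWidth + eps);
    var last = Math.ceil((hi - dst.range[0]) / dstWidth - eps) - 1;
    if (first !== last || first < 0 || first >= out.length) {
      throw new Error("source bin " + i + " does not fit inside one destination bin");
    }
    out[first] += src.values[i];
  }
  return { values: out, range: dst.range.slice() };
}

var a = { values: [3, 4, 4, 4, 1], range: [0.0, 10.0] };
var b = { values: [3, 4, 4, 4, 1], range: [0.0, 2.0] };
console.log(mergeAligned(a, b).values); // [ 19, 4, 4, 4, 1 ]
```

A second histogram with range 0.0 -> 3.0 would be rejected, because one of its bins crosses the 2.0 boundary and its counts cannot be attributed to a single bucket.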
In general everything will work if the bin size for each histogram is:
This is harder on the server side than on the client side; on the client side you just need to do:
I 100% agree. I didn't get round to implementing it in the code but the plan I had was to pass through the max and min of the column to width_bucket(requests,0,min,max). That would keep the bin sizes consistent. I like your idea of having each bin be an integer multiple of a given amount for the aggregation.
The bins / range format looks fine as well. Let's go with that rather than the float boundaries for the bins.
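A minimal JavaScript sketch of why a shared min and max keeps tiles mergeable. `widthBucket` mimics the semantics of PostgreSQL's `width_bucket(operand, low, high, count)` (bucket 0 for values below `low`, bucket `count + 1` for values at or above `high`); `tileHistogram` and `addHistograms` are hypothetical names used only for illustration.

```javascript
// Mimics PostgreSQL width_bucket(operand, low, high, count).
function widthBucket(value, low, high, count) {
  if (value < low) return 0;
  if (value >= high) return count + 1;
  return Math.floor(((value - low) / (high - low)) * count) + 1;
}

// Hypothetical: histogram of one tile, using the column-wide low/high.
function tileHistogram(values, low, high, count) {
  var bins = new Array(count + 2).fill(0); // includes under/overflow buckets
  values.forEach(function (v) {
    bins[widthBucket(v, low, high, count)] += 1;
  });
  return bins;
}

// With identical low/high/count, merging tiles is an element-wise sum.
function addHistograms(a, b) {
  return a.map(function (n, i) { return n + b[i]; });
}

var t0 = tileHistogram([1, 3, 9], 0, 10, 5);
var t1 = tileHistogram([0.5, 1.5], 0, 10, 5);
console.log(addHistograms(t0, t1)); // [ 0, 3, 1, 0, 0, 1, 0 ]
```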
I worked on this; here is my proposal:
{
"bins": { "0": 10, "9": 1, ....},
"bounds": [1.1, 3.2],
"zoom": 10,
"x": 4
}
It sounds weird, but let me explain each field a little:
The histograms are built the same way tiles are built, so we have the concept of zoom: zoom 0 has 64 bins, zoom 1 has 128, zoom 2 has 256, and so on (the number of bins doubles with each level).
zoom is the zoom level of the bins.
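To make the zoom idea concrete, here is a small JavaScript sketch. It assumes the bins at every zoom level divide one fixed global range `[min, max]` of the column; `binsAtZoom`, `binIndex` and `parentBin` are hypothetical names, not part of the proposal.

```javascript
var BASE_BINS = 64; // number of bins at zoom 0, as stated above

// 64 bins at zoom 0, 128 at zoom 1, 256 at zoom 2, ...
function binsAtZoom(zoom) {
  return BASE_BINS << zoom;
}

// Assumption: [min, max] is the global range of the column.
function binIndex(value, min, max, zoom) {
  var n = binsAtZoom(zoom);
  var idx = Math.floor(((value - min) / (max - min)) * n);
  return Math.min(Math.max(idx, 0), n - 1); // clamp value === max into the last bin
}

// A bin at a finer zoom maps to its coarser ancestor with a right shift.
function parentBin(idx, levels) {
  return idx >> levels;
}

console.log(binsAtZoom(2));                       // 256
console.log(binIndex(5, 0, 10, 2));               // 128
console.log(parentBin(binIndex(5, 0, 10, 2), 2)); // 32, same as binIndex(5, 0, 10, 0)
```

Because each level doubles the bin count, zooming out by n levels is just dropping the n low bits of the index, which is what the right shifts in the code below rely on.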
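function transform_histogram_to(hist, zoom) {
  // only zooming out (to a lower, coarser zoom) is possible: bins cannot be split
  var zoom_diff = hist.zoom - zoom;
  var new_hist = {
    bins: {},
    bounds: hist.bounds,
    zoom: zoom,
    x: hist.x >> zoom_diff
  };
  // aggregate: each new bin collects 2^zoom_diff of the original bins
  for (var k in hist.bins) {
    // keys are strings, so convert before adding the offset
    var idx = ((hist.x + Number(k)) >> zoom_diff) - new_hist.x;
    new_hist.bins[idx] = (new_hist.bins[idx] || 0) + hist.bins[k];
  }
  return new_hist;
}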
function merge_histogram(h0, h1) {
  // both histograms must be at the same zoom level
  var new_hist = {
    bins: {},
    bounds: [Math.min(h0.bounds[0], h1.bounds[0]), Math.max(h0.bounds[1], h1.bounds[1])],
    zoom: h0.zoom,
    x: Math.min(h0.x, h1.x)
  };
  var zoom_levels = 0;
  // check the range isn't too big
  var bucket_range = Math.max(h0.x, h1.x) - new_hist.x;
  if (bucket_range > 64) {
    // zoom out
    zoom_levels = Math.ceil(Math.log(bucket_range / 64) / Math.log(2));
  }
  new_hist.zoom = h0.zoom - zoom_levels;
  new_hist.x = new_hist.x >> zoom_levels;
  [h0, h1].forEach(function (h) {
    for (var k in h.bins) {
      // keys are strings, so convert before adding the offset
      var idx = ((h.x + Number(k)) >> zoom_levels) - new_hist.x;
      new_hist.bins[idx] = (new_hist.bins[idx] || 0) + h.bins[k];
    }
  });
  return new_hist;
}
The code probably does not work as written and can be improved; it is only meant to illustrate the idea.
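As a worked example of the "zoom out" step in merge_histogram, the number of levels to drop can be computed on its own. `zoomLevelsFor` is a hypothetical helper name; the 64-bin limit is the one used in the check above.

```javascript
// How many zoom levels to drop so that an offset of `bucketRange` bins
// fits within `maxBins` bins, using ceil(log2(bucketRange / maxBins)).
function zoomLevelsFor(bucketRange, maxBins) {
  if (bucketRange <= maxBins) {
    return 0; // already fits, no need to zoom out
  }
  return Math.ceil(Math.log2(bucketRange / maxBins));
}

console.log(zoomLevelsFor(64, 64));   // 0
console.log(zoomLevelsFor(65, 64));   // 1
console.log(zoomLevelsFor(200, 64));  // 2
console.log(zoomLevelsFor(1000, 64)); // 4
```

Each level dropped halves the number of bins, so an offset of 200 bins needs two levels (200 / 4 = 50, which fits in 64).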
Another possibility: instead of sending x separately,
just add it to the bin keys, like {1020: 3, 1021: 3... }
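With absolute bin keys the merge gets simpler, since no offset has to be reconciled. A rough JavaScript sketch, assuming a hypothetical format `{ bins: { "<absolute index>": count }, zoom: n }` (bounds are left out for brevity, and `zoomOut` / `mergeAbsolute` are made-up names):

```javascript
// Re-key a histogram to a coarser zoom by shifting the absolute indices.
function zoomOut(hist, levels) {
  var out = { bins: {}, zoom: hist.zoom - levels };
  Object.keys(hist.bins).forEach(function (k) {
    var idx = Number(k) >> levels;
    out.bins[idx] = (out.bins[idx] || 0) + hist.bins[k];
  });
  return out;
}

// Bring both histograms to the coarser of the two zooms, then sum by key.
function mergeAbsolute(h0, h1) {
  var zoom = Math.min(h0.zoom, h1.zoom);
  var out = { bins: {}, zoom: zoom };
  [h0, h1].forEach(function (h) {
    var coarse = zoomOut(h, h.zoom - zoom);
    Object.keys(coarse.bins).forEach(function (k) {
      out.bins[k] = (out.bins[k] || 0) + coarse.bins[k];
    });
  });
  return out;
}

var a = { bins: { 1020: 3, 1021: 3 }, zoom: 4 };
var b = { bins: { 1021: 2, 1022: 1 }, zoom: 4 };
console.log(mergeAbsolute(a, b)); // { bins: { '1020': 3, '1021': 5, '1022': 1 }, zoom: 4 }
```

The trade-off is slightly larger keys in the payload in exchange for not having to track x at all.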
This looks good. I think it will work in the front end just fine.
Currently I am using SQL to request histogram tiles. The code can be seen here: It builds up a series of arrays of the histogram data and bins and then concatenates them in the response. The SQL it generates looks like
Where cat_id is for a category request and requests is for a histogram of a numerical variable. This is an example of the response from the query:
https://gist.github.com/stuartlynn/76e41e9c07ab01d92ee2
It returns a row with two arrays for each variable, one for the bins and the other for the values. In a JSON response, though, we probably want these formatted as something like:
Where the mappings relate the category values to their labels.
@javisantana