br1ghtyang / asterixdb

Automatically exported from code.google.com/p/asterixdb
0 stars 0 forks source link

Add temporal-bin function #546

Closed GoogleCodeExporter closed 8 years ago

GoogleCodeExporter commented 8 years ago
Need to add a temporal-bin function to support binning temporal values. The 
behavior of the function should be like:

get-temporal-bin(temporal_val, start_time, bin_size)

Where:

temporal_val: the temporal value to be binned, in the type of time/date/datetime

start_time: the start time for the first bin, with the same type as temporal_val

bin_size: the size of each bin, in the type of duration (specifically 
year-month-duration or day-time-duration as the division can be easily computed 
without ambiguity).

Returns:

A long integer as the bin index.

Original issue reported on code.google.com by jarod...@gmail.com on 28 Jun 2013 at 6:16

GoogleCodeExporter commented 8 years ago
This function will need a "committee level" language/design review before it 
should be considered finished - want to be sure we get its logical model right!

Original comment by dtab...@gmail.com on 1 Jul 2013 at 6:57

GoogleCodeExporter commented 8 years ago
OK... I have committed a fix for this. Probably we can discuss over this 
version. Now the signature is like this:

interval-bin(timeToBin, timeToStart, durationBinSize)

and it returns an interval representing the bin containing the timeToBin. 

The semantic of this function is to return the bin containing the time point 
"timeToBin", if the bins are created from the "timeToStart" with the bin size 
to be "durationBinSize".  Here is an test case:

drop dataverse test if exists;
create dataverse test;
use dataverse test;

create type Schema as closed{
id: int32,
timestamp: datetime
}

create dataset tsdata(Schema)
primary key id;

insert into dataset tsdata({"id": 1, "timestamp": 
datetime("-1987-11-19T23:49:23.938")}) 
insert into dataset tsdata({"id": 2, "timestamp": 
datetime("-1987-11-20T00:27:13.432")}) 
insert into dataset tsdata({"id": 3, "timestamp": 
datetime("-1987-11-18T18:00:00")}) 
insert into dataset tsdata({"id": 4, "timestamp": 
datetime("19871119T234923938")}) 
insert into dataset tsdata({"id": 5, "timestamp": 
datetime("1987-11-19T23:58:17.038")}) 
insert into dataset tsdata({"id": 6, "timestamp": 
datetime("1987-11-19T23:30:00")}) 
insert into dataset tsdata({"id": 7, "timestamp": 
datetime("1987-11-19T23:22:38")}) 
insert into dataset tsdata({"id": 8, "timestamp": 
datetime("1988-01-21T17:28:13.900")}) 
insert into dataset tsdata({"id": 9, "timestamp": 
datetime("-1987-11-19T23:49:23.938")}) 
insert into dataset tsdata({"id": 10, "timestamp": 
datetime("-0987-07-01T09:35:28.039")}) 
insert into dataset tsdata({"id": 11, "timestamp": 
datetime("2012-11-19T23:49:23.938")}) 
insert into dataset tsdata({"id": 12, "timestamp": 
datetime("2013-11-19T23:49:23.938")}) 

for $i in dataset tsdata
group by $d := interval-bin($i.timestamp, datetime("1990-01-01T00:00:00.000Z"), 
year-month-duration("P20Y")) with $i 
order by get-interval-start($d)
return { "tbin": $d, "count": count($i)}

And the result should be:

{ "tbin": interval-datetime("-1990-01-01T00:00:00.000Z, 
-1970-01-01T00:00:00.000Z"), "count": 4i64 }
{ "tbin": interval-datetime("-0990-01-01T00:00:00.000Z, 
-0970-01-01T00:00:00.000Z"), "count": 1i64 }
{ "tbin": interval-datetime("1970-01-01T00:00:00.000Z, 
1990-01-01T00:00:00.000Z"), "count": 5i64 }
{ "tbin": interval-datetime("2010-01-01T00:00:00.000Z, 
2030-01-01T00:00:00.000Z"), "count": 2i64 }

Original comment by jarod...@gmail.com on 1 Jul 2013 at 7:04

GoogleCodeExporter commented 8 years ago
Added "management committee" into the discussion list. Please check out the 
latest comments from https://code.google.com/p/asterixdb/issues/detail?id=546#c2

Original comment by jarod...@gmail.com on 1 Jul 2013 at 7:32

GoogleCodeExporter commented 8 years ago
Adding Sattam since we should be consistent across binning functions, and we 
either have or will want something similar for spatial binning.  (We should get 
the designs for all of these figured out together, perhaps.  Just a quick 
exercise...)

Original comment by dtab...@gmail.com on 1 Jul 2013 at 8:00

GoogleCodeExporter commented 8 years ago
In which places do we use the binning functions?

Original comment by westm...@gmail.com on 1 Jul 2013 at 9:41

GoogleCodeExporter commented 8 years ago
The use case for temporal binning is wanting to clump time-located data records 
together into a set of regular time intervals.
The use case for spatial binning is wanting to clump spatially-located data 
records together into a set of rectangular spatial bins.
Both are "customer driven" requirements (but with obvious general usefulness).

Original comment by dtab...@gmail.com on 1 Jul 2013 at 9:50

GoogleCodeExporter commented 8 years ago
(Both would be used in group-by queries, to form the groups.)

Original comment by dtab...@gmail.com on 1 Jul 2013 at 9:51

GoogleCodeExporter commented 8 years ago
Reviewed by Sattam and merged with master.

Original comment by jarod...@gmail.com on 2 Aug 2013 at 3:02