Closed: jgoizueta closed this issue 6 years ago.
The idea is to allow defining the granularity (level of detail) of columns used as dimensions of the aggregation, i.e. columns which we don't want to summarize (place inside cluster aggregation functions such as clusterAvg()).
For example, given a timestamp column with millisecond resolutions, we may want to obtain it as years, months, weeks, days, hours, etc. to use for animation.
In Torque the granularity is managed by specifying a number of buckets, but when using time values it is much more convenient to employ the common units of time.
We'll need to expand the aggregation API in the tiler to support this and define a way to use this feature in Carto VL.
I propose to implement expressions to define the granularity of a column, e.g. timeInHours($t), in a similar way to clusterAvg and the other cluster aggregations: a column used this way could not appear elsewhere raw ($t) or inside other granularity or cluster aggregation expressions. I'm not considering any alternative, such as variable/property declarations to define the granularity, because similar alternatives were discussed and dismissed for the cluster aggregation functions.
The use of a timeInHours($t) expression introduces a new column t_hours, defined as a SQL expression (which returns an epoch in hours) that would also be included in the GROUP BY of the aggregation.
We could have similar functions for numeric columns e.g. to classify by break points or number of buckets.
If the instantiation reports that data won't be aggregated and any of these expressions is used, it should fail (as for cluster aggr functions).
We should define a consistent terminology for these expressions which makes their intent as clear as possible.
Some options I can think of:
- timeInX (X = Years/Months/Weeks/Days/Hours/...); numberInBuckets
- aggregatedX; aggregatedBuckets
- grainX; grainBuckets
- inX; inBuckets
- asX; asBuckets
The current dimensions parameter of aggregation would be expanded to accept a parameterized form of the discretization options, e.g.
{
aggregation: {
dimensions: {
t_in_hours: {
column: 't',
granularity: 'hours'
}
}
}
}
This would generate this kind of SQL wrapping:
SELECT date_part('epoch', date_trunc('hour', t))/3600 AS t_in_hours, ...
FROM ...
GROUP BY 1, ...
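For illustration, the hour-epoch computed by that SQL can be mimicked client-side; this is just a sketch of the arithmetic (the real values would come from the tiler):

```javascript
// Mirrors date_part('epoch', date_trunc('hour', t))/3600:
// truncate a UNIX epoch (in seconds) to whole hours since 1970-01-01.
function epochHours(epochSeconds) {
    return Math.floor(epochSeconds / 3600);
}

// 1970-01-01T03:59:00Z still falls in hour bucket 3.
epochHours(Date.UTC(1970, 0, 1, 3, 59) / 1000); // → 3
```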
We should decide which format is more convenient for the resulting values:
The first format could be used directly by CARTO VL; the others would require some conversion on the client part.
A couple of ideas from @davidmanzanares:
- animate($t, ...) would detect that $t is a date, then use its minimum, maximum and the duration of the animation to pick some default unit of time, and would apply the corresponding granularity expression.
- We could have some auxiliary expressions like timeUnitsFor($t, desiredNumberOfBuckets=100) to find a good unit of time for a property to approximate the desired number of buckets, and timeDurationFor($t, desiredNumberOfBuckets=100) to find the window duration to group a property into the desired number of buckets.

Yes, I think this is the way to go.
Regarding the API, I like clusterHours($date), clusterYears($date), or maybe the singular (like in Julia: https://docs.julialang.org/en/stable/stdlib/Dates/index.html ).
Regarding the ability to define the number of buckets, maybe we could specify the minimum duration/granularity instead of the number of buckets; for example, clusterSeconds($date, 4) meaning aggregate $date in time intervals of 4 seconds. Floating point values could be used too: clusterSeconds($date, 0.1) aggregates at tenths of a second.
Regarding the type of the returned values, I would stick to floating point values in seconds from the minimum value:
decodedDate = property.minimumDate + decodedValue
The reason for not using strings is space, and the reason for not using integers is the ability to use aggs like clusterSeconds($date, 0.1). Also, using floating point allows us to always use seconds without problems, since in the case we are dealing with huge values like decades, the exponent is going to save us from overflow while keeping good precision (relative to those magnitudes).
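A quick sketch of that decoding, assuming values arrive as float seconds counted from the property's minimum date:

```javascript
// decodedDate = property.minimumDate + decodedValue (float seconds)
function decodeDate(minimumDate, decodedValueSeconds) {
    return new Date(minimumDate.getTime() + decodedValueSeconds * 1000);
}

const minimumDate = new Date(Date.UTC(2018, 0, 1));
decodeDate(minimumDate, 3600); // → 2018-01-01T01:00:00.000Z
```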
I like the simple names (not in singular, because to me it seems that hour(t) extracts the hour of the time rather than giving the time in hours). But I thought that, just like with the clusterX functions, we should use a common prefix to make evident that these are special functions with similar behaviour.
And I really like the hours($t, 4)
idea, and also agree with giving all values in seconds from the min.
After talking with @davidmanzanares: in some cases it could be interesting to obtain the hour of the day, day of the week or month of the year (as opposed to the time in hours, days or months). So we could have expressions for that. A possibility would be to use the singular (hour, day, month) for them, but it would probably be too confusing.
After conversation with @rochoa & @davidmanzanares
The case of aggregations by the hours of the day, hours of the week, days of the week, months of the year and day of the year are very common and we should focus on providing a good solution for them.
In all these cases we have a unit of granularity and another unit which acts as a cycle (we compute the time in the first unit modulo the second unit).
So, we'd like to support three different kinds of cases, and we could do it with expressions with three arguments (two of them optional):
- clusterHours($t) would return the data per hour, as a number of seconds since the minimum $t, in increments of 3600.
- clusterHours($t, N) would be like the previous one, but in increments of N·3600 (e.g. 14400 for N = 4).
- clusterHours($t, 1, 'day') would return the data per hour of day (0-24), again in seconds (so 0 - 86400 in 3600 steps).

The SQL expressions will get quite complex when the second argument N is not 1 and irregular units are involved (minutes, hours, days and weeks are regular in that they're always a fixed number of seconds, but months and years are not).
When the third argument (cycle) is present, I think it can be more meaningful to return the time in the corresponding units (e.g. months 1 to 12 for clusterMonths($t, 1, 'year')), and it can also simplify the expressions for irregular units.
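The cyclic case (granularity modulo a cycle) can be sketched like this for hour of day in seconds; this is a simplified illustration that assumes UTC and ignores time zones:

```javascript
// clusterHours($t, 1, 'day')-style value: time in hours modulo a day,
// expressed in seconds (0 - 86400 in 3600 steps).
function hourOfDaySeconds(epochSeconds) {
    const hours = Math.floor(epochSeconds / 3600);
    return (hours % 24) * 3600;
}

hourOfDaySeconds(Date.UTC(2018, 2, 1, 5, 30) / 1000); // → 18000 (05:00)
```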
After experimenting with a branch of the tiler which has an API for time dimensions and returns the values as ordinal values in the requested units rather than epoch numbers (e.g. for days of the week it returns values 0 to 6; for days of the month, 1 to 31, etc.), I've detected some issues that we should discuss:
The current backend metadata includes only values (min, max, ...) for the un-aggregated query. Including data about aggregated columns is not straightforward because the values depend on the zoom level, but information about the classified dimensions is not a problem.
Now we map date columns (as epoch numbers) to values in the range [0, 1]. This is not valid for classified time dimensions if we use the units described above. I see two options to handle them:
What do you think @davidmanzanares ?
Should we reconsider using epoch numbers for all time classifications? That would simplify mapping the values internally, but we would need to add complexity to provide tools to represent the values in a natural form (epochs are not great for days of the week).
Then, if we go on with the current implementation (ordinals), is the categories approach a valid option? Otherwise I'll add backend metadata options to request stats about aggregation dimensions.
Regarding the last comment, and after talking with @davidmanzanares it seems simpler to use the metadata we have now and derive the dimension limits from it (for cyclics, use the theoretical limits).
That requires us to replicate some time logic in the client, but I guess it's not too bad: https://github.com/CartoDB/carto-vl/blob/99a362941c4e58007b50d97300ea0b95c7e42f78/src/client/WindshaftWorker.js#L27-L75
Currently pending decisions:
After meeting with @rochoa & @davidmanzanares:
- Values could be represented as categories such as 2018-03 for a month. We could introduce new types for this. This would be a special kind of category in the sense that they also have a numeric value, i.e. a specific distance between two values: for categories A and B there's no notion of the separation between them, but the distance from 2018-03 to 2018-05 is double that from 2018-02 to 2018-03.
- Currently metadata has entries for the base columns (before aggregation); then, if a base column is used in aggregations or dimensions, those are listed in an array. I have added some support to Metadata to alleviate this, but it's still quite confusing; maybe we should refactor Metadata (once again).
Dates (as epoch numbers) are mapped to the 0,1 interval in property textures to increase the numeric precision, as often the magnitudes are too large for the range of variation.
That seems unnecessary for all cyclic dimensions, and for non cyclic that's probably only relevant for the smallest units (seconds, minutes).
For seconds, values around 2018 require 31 bits (25 bits for minutes).
To map the values, the max, min stats are needed, which are available currently only for the base date column (and might not be precise for generic MVT sources).
We can compute the dimension stats from them (for cyclic units, only the theoretical range), but taking the TZ into account correctly would require a time library like moment-timezone.js or Luxon. This would increase the size of the package and add complexity and dependencies.
But we might not need accurate stats just for reducing the magnitude of the values; the differences due to an incorrect TZ (+/-12 hours at most) would be small when the magnitude is large, which is the case for which mapping is relevant. So, we could choose to use mapping based on the units, and then use the (approximate) computed min, max from the base stats, ignoring the TZ.
Global aggregation expressions (GlobalMin, GlobalMax, etc.) are computed for properties using the metadata stats. Those expressions are useful as linear defaults. For example, animation by default applies linear to a property, which in turn will take its min and max, and this makes animations for dates work nicely by default.
Dimension expressions, like cluster aggregation expressions, are not accepted by global aggregation functions, but they're really wrapping actual properties of the data, so it would make sense to accept them as properties. Then we'd need dimension metadata stats.
We have two options to support it: compute it in the client from the base columns stats, or add it to the backend.
Client stats:
To avoid dealing with TZs in the client we could use a single TZ (per layer) and have the backend provide base date columns (and their stats) as epochs in that TZ, so dimension limits can be computed accurately. But this subverts the notion of UNIX epoch and introduces discontinuities in the values.
Server stats:
This is a cleaner way of not dealing with TZs in the client. Date values are UTC epochs, but all dim grouped values use the requested TZs.
Two alternatives currently exist that seem actively maintained:
Example: how to return a (non-cyclic) month.
My initial implementation is a month counter, starting with 1 = 1970-01:
date_part('month', $t) + 12*(date_part('year', $t)-date_part('year', to_timestamp(0.0)))
(time zones are taken into account because $t is ultimately substituted by timezone(${tz}, to_timestamp(date_part('epoch', the_time))))
There we could have an optional origin (month 1) definable by the user instead of the fixed one:
1 + date_part('month', $t) - date_part('month', $start) + 12*(date_part('year', $t) - date_part('year', $start))
Easy: to_char($t, 'YYYY-MM')
This can easily be converted to a sequential month number in the client with MM + 12*(YYYY - y0).
And also to a UNIX epoch with
new Date(`${YYYY}-${MM}-01 00:00:00+00`).getTime();
Computing the end of the time range (i.e. the start of the next month) is also feasible:
new Date(`${nextY}-${nextM}-01 00:00:00+00`).getTime();
With
function nextMonth(y, m) {
m += 1;
if (m > 12) {
y += 1;
m = 1;
}
return [y, m];
}
This seems to work: (I've tried some corner cases involving February)
date_part('epoch', date_trunc('month', t)) AS start,
date_part('epoch', date_trunc('month', t + INTERVAL '1 month')) AS end
But, if we want to be able to manipulate it in the client (e.g. to show it as YYYY-MM), the problem is that the values are UTC epochs, so day divisions don't necessarily correspond to the requested TZ.
This is also more complicated to handle, both in the backend, where a single dimension generates two columns, and in the client, where we must handle both as part of a unified property.
Taking into account the previous comment, I now think that even if being able to access the start and end of a time period is very interesting, we can achieve that in the client without the complication of supporting the ranges in the backend. So we could use one of the first two formats (months since X or YYYY-MM strings) and compute the ranges in the client.
I'm learning some JavaScript: here's a better way to compute the month start and end epoch given Y, M:
{ start: Date.UTC(Y, M-1, 1), end: Date.UTC(Y, M, 1) }
Note that it counts months from 0 and that if the month exceeds 11 the year is adjusted.
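That rollover behaviour can be verified with a small sketch (M here is 1-based, as in 'YYYY-MM'):

```javascript
// Date.UTC counts months from 0, and a month index past 11 rolls the
// year over, so no explicit nextMonth adjustment is needed.
function monthRangeMs(Y, M) {
    return { start: Date.UTC(Y, M - 1, 1), end: Date.UTC(Y, M, 1) };
}

monthRangeMs(2018, 3); // → { start: 1519862400000, end: 1522540800000 }
monthRangeMs(2017, 12).end === Date.UTC(2018, 0, 1); // → true (December rolls into 2018)
```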
We could have auxiliary expressions to change the representation of (grouped) time, e.g. if we stick to the current ordinal numbers from the backend:
const N = clusterMonth($t) // => 579 (for $t = 2018-03-01T05:00:00+00)
MonthsSince(N, '2018-01') // => 2
MonthISO(N) // => '2018-03'
MonthStart(N) // => 1519862400
MonthEnd(N) // => 1522540800
MonthRange(N) // [1519862400, 1522540800]
MonthStartMs(N) // => 1519862400000
MonthEndMs(N) // => 1522540800000
... Or we could directly have different cluster expressions for each representation:
clusterMonthsSince($t, '2018-01') // 2
clusterMonthISO($t) // '2018-03'
clusterMonthRange($t) // [1519862400, 1522540800]
But then we'd have to support GlobalMin etc. for each of those, I guess.
For the first case, we could make expressions smarter, so that they get the units from the clusterX expression:
since(clusterMonth($t), '2018-01') // => 2
iso(clusterMonth($t)) // => '2018-03'
range(clusterMonth($t)) // => [1519862400, 1522540800]
iso(clusterDay($t)) // => '2018-03-01'
// etc.
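For reference, decoding the month ordinal N (1 = 1970-01) used in these examples is simple arithmetic; this is only a sketch of what such expressions would compute:

```javascript
// Decode a serial month number N (1 = 1970-01) into its parts and range.
const N = 579; // clusterMonth($t) for $t = 2018-03-01T05:00:00+00
const y = 1970 + Math.floor((N - 1) / 12);        // 2018
const m = ((N - 1) % 12) + 1;                     // 3
const iso = `${y}-${String(m).padStart(2, '0')}`; // '2018-03'
const start = Date.UTC(y, m - 1, 1) / 1000;       // 1519862400
const end = Date.UTC(y, m, 1) / 1000;             // 1522540800
```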
Update on the PR #991: it requires this tiler branch.
Several families of expressions have been implemented:
These expressions take numeric values (but could be considered categories, as they have only a small number of distinct values): DayOfWeek (1:7), DayOfMonth (1:31), DayOfYear (1:366), HourOfDay (0:23), MinuteOfHour (0:59), QuarterOfYear (1:4).
Each one has an argument for the time property and an optional parameter for the timezone:
DayOfWeek($date, 'Europe/Madrid')
Time zone can be a fixed offset or an IANA name (handling daylight saving time changes).
GlobalMax, GlobalMin and linear behave as expected with these expressions. Note that the limits are the actual limits in the dataset/query, not necessarily the theoretical range.
These expressions can give the time in seconds, minutes, hours, days, months, years, weeks, quarters, trimesters, semesters, decades, centuries or millennia (phew!).
There are several forms for these expressions. The most complete gives serial numeric values, counting the specified periods of time since a starting time (by default 0001-01-01T00:00:00, so that year numbers make sense).
For example: this counts periods of 5 months (in Madrid time) starting in January 2017:
clusterMonth($date, 'Europe/Madrid', 5, '2017-01')
In the result, the number 1 corresponds to the period 2017-01 to 2017-05, 2 to 2017-06 to 2017-10, etc.
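That serial numbering can be mirrored client-side; a hypothetical sketch that ignores time zones:

```javascript
// Count `count`-month periods since a start month; period 1 begins at the start.
// Example: periods of 5 months starting at 2017-01.
function monthPeriod(date, count, startYear, startMonth) {
    const months = (date.getUTCFullYear() - startYear) * 12 +
                   (date.getUTCMonth() + 1 - startMonth);
    return Math.floor(months / count) + 1;
}

monthPeriod(new Date(Date.UTC(2017, 0, 15)), 5, 2017, 1); // → 1 (2017-01 to 2017-05)
monthPeriod(new Date(Date.UTC(2017, 5, 15)), 5, 2017, 1); // → 2 (2017-06 to 2017-10)
```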
GlobalMin, GlobalMax and linear can also be used as expected.
Dates can also be handled as ISO format categories with these functions:
ClusterMonthIso($date, 'Europe/Madrid')
The results would be categories such as '2017-01'.
These take only an optional timezone parameter; no starting time or count is possible, unlike the numeric functions.
GlobalMin and GlobalMax are usable here.
Apart from the usual YYYY-MM-DDTHH:MM:SS format, some special syntax has been introduced; see here.
Finally, there are also expressions that give the start and ending times (internally as UNIX epochs, but interpreted in the specified timezone):
clusterMonthStart($date, 'Europe/Madrid');
clusterMonthEnd($date, 'Europe/Madrid');
Both the start and end expressions can be used with the same date property (for the other expressions here, only one expression can be used with any given date property).
Like the ISO functions they support only the optional timezone parameter.
GlobalMin, GlobalMax and linear can be used as expected.
These are useful for combining two animations to control precisely when features appear and disappear, for example:
@monthStart: clusterMonthStart($date, 'Europe/Madrid')
@monthEnd: clusterMonthEnd($date, 'Europe/Madrid')
color: ramp(linear(@monthStart), SUNSET)
@animStart: animation(linear(@monthStart), 5, fade(0.1, 10))
@animEnd: animation(linear(@monthEnd), 5, fade(10, 0.1))
filter: @animStart and @animEnd
As discussed with @rochoa & @davidmanzanares, the complete form of the serial expressions (as numerical values, e.g. clusterMonth) might be too complex to explain and use, so we could leave it out and keep only the others.
To avoid so many different expressions, we could, instead of having clusterSecond($date), clusterMinute($date), etc., have a single expression of each kind with an argument for the time units: clusterTime($date, 'second'), clusterTime($date, 'minute').
clusterTime($date, 'minute') sounds :checkered_flag: to me
Regarding start/end I would try to make it work seamlessly with https://github.com/CartoDB/carto-vl/issues/1012
Regarding ClusterMonthIso, is ISO a common name for this?
Would it be possible to use clusterTime for everything? Does that make sense? I still need to think, but it would be awesome to have consistency between every date-related expression.
Since we're dropping the numeric variants like ClusterMonth, we could use this naming:
- clusterTime('month', $date, timezone) => '2018-01' for text results (formerly the Iso functions)
- clusterTimeStart('month', $date, timezone) and clusterTimeEnd('month', $date, timezone) => Date() for the start/end dates

Then for the cyclic functions, which return only numbers, we could maintain multiple expressions or use a single expression (and we could reuse the same):
clusterTime('dayOfWeek', $date, timezone)
Or we could use clusterTime for everything, and then have additional expressions for the start and end: timeStart(clusterTime('month', $date)), timeEnd(clusterTime('month', $date)).
Now it is possible to use the start, end and ISO expressions simultaneously on the same base column. The first two are useful for animations; the other can be used to get simpler values through interactivity or viewportFeatures, or for styling.
For example:
@mstart: clusterMonthStart($date, 'Europe/Madrid')
@mend: clusterMonthEnd($date, 'Europe/Madrid')
@month: clusterMonthIso($date, 'Europe/Madrid')
@selectedMonths: buckets(@month, ['2018-04', '2018-06', '2018-07', '2018-09'])
color: ramp(linear(@mstart), SUNSET)
width: ramp(@selectedMonths, [50, 40, 30, 10, 20])
filter: animation(linear(@mstart), 5, fade(0.1,10)) and animation(linear(@mend), 5, fade(10,0.1))
Implementation note: a single property is transferred from the backend (the iso category); then it is decoded as three properties into the dataframe. Only the used variants are decoded.
Update: the numeric expressions have been removed and the rest have been unified into:

clusterTime(property, timeUnits, timezone), for example:
- clusterTime($date, 'month') // UTC by default; this returns a category ('2018-01'...)
- clusterTime($date, 'month', 'Europe/Madrid')
- clusterTime($date, 'dayOfWeek') // for cyclic units the result is a number, e.g. 1-7 here

clusterTimeStart(property, timeUnits, timezone) returns a Date and does not accept cyclic units:
- clusterTimeStart($date, 'month') // UTC by default
- clusterTimeStart($date, 'month', 'Europe/Madrid')

clusterTimeEnd(property, timeUnits, timezone) returns a Date and does not accept cyclic units:
- clusterTimeEnd($date, 'month') // UTC by default
- clusterTimeEnd($date, 'month', 'Europe/Madrid')
So we have:
- clusterTime($date, 'dayOfMonth') -> 1 to 31
- clusterTime($date, 'day') -> '2018-01-30'
- clusterTimeStart($date, 'day'), clusterTimeEnd($date, 'day') -> 2018-01-30T00:00:00.000Z, 2018-01-31T00:00:00.000Z

And any single property like $date can be used only with a single unit, like 'dayOfMonth' or 'day'. But if it is used with a non-cyclic unit like 'day', then it can be used with up to all three expressions clusterTime, clusterTimeStart & clusterTimeEnd simultaneously. (I hope there's a clearer way to explain this 😅)
For example:
- clusterTime($date, 'dayOfMonth') and clusterTime($date, 'day') 🚫
- clusterTime($date, 'day') and clusterTime($date, 'month') 🚫
- clusterTime($date, 'day') and clusterTimeStart($date, 'month') 🚫
- clusterTimeStart($date, 'day') and clusterTimeEnd($date, 'month') 🚫
- clusterTime($date, 'day') and clusterTimeStart($date, 'day') ✅
- clusterTimeStart($date, 'day') and clusterTimeEnd($date, 'day') ✅

OK, we now have three variants of this in three different branches to choose from. Each derives from the preceding one, and in general each one allows some further simplification of the API and the implementation (the ugly parts of it).
First option (branch 551-time-dimensions). We have three expressions:
- clusterTime, which takes two different types: category for normal (serial) units of time, and numeric for cyclic units (month of the year, day of the week...)
- clusterTimeStart and clusterTimeEnd, which are of date type (internally epoch numbers).

This is very flexible, since the user chooses the expressions and where to apply them.
Second option (branch timerange-3prop). This replaces the categorical use of clusterTime together with clusterTimeStart and clusterTimeEnd by a new expression, clusterTimeRange (working name), that returns a new type, TimeRange.
This can be a simpler concept than having time categories, start and end dates as separate expressions. But it requires each expression to be adapted in order to take advantage of it.
So we could end up with two different expressions of different types: one numeric for day of the week, etc., and the other of time range kind for year, month, ... Expressions could be adapted to compare time ranges with categories (clusterTR($date) = '2017-03') and date ranges (clusterTR($date) > time(...)).
Third option (branch timerange). This is a modification of the previous one that uses only two properties (start, end) in the dataframe and computes the ISO category when needed for user presentation (interactivity, viewportFeatures). So the ISO category is not available in the Dataframe nor, consequently, in shaders.
Categories are computed from start/end data in the dataframe (stored as 32-bit floats), so we can lose second accuracy, and then category restoration fails (being a second off makes it impossible to recognize a period of time). This has been patched by changing the DF value mapping to just a translation (offset) to eliminate some of the roundoff error when mapping into float32 values and back. It has worked in the tests, but there will probably still be cases where it fails (*). A possible solution is to drop support for second-level resolution and make minutes the smallest usable unit of time (so we could round DF unmapped values to the nearest minute).
(*) Millisecond precision is lost for time spans of about 12 years, and we have a problem at the second level at about 80 years. So it seems the simple hack can be used instead of reducing precision to minutes. In any case the total time span will determine the maximum achievable precision, and we should either document this limit or detect attempts to use higher precision than possible.
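The precision issue can be demonstrated with Math.fround, which rounds a number to float32 the way dataframe storage does; the offset trick keeps values that are small relative to the minimum exact:

```javascript
// A large epoch value doesn't survive a float32 round-trip...
const epoch = Date.UTC(2018, 2, 1, 5, 0, 1) / 1000; // 1519880401 seconds
const direct = Math.fround(epoch);                   // value as stored in float32
direct === epoch; // → false: second-level precision lost

// ...but an offset from the property's minimum value does.
const min = Date.UTC(2018, 0, 1) / 1000;
const offset = Math.fround(epoch - min);             // 5115601 < 2^23: exact in float32
min + offset === epoch; // → true
```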
Con: some ugly code in windshaft source (need MSN to decide what property variants are used) and in decoding/encoding (now in metadata specializations).
Con: API complexity, inconsistent concept of start/end TZ.
Could you explain this a little bit more? Are these problems unique to the first approach?
I have mixed feelings here. On one hand, the timerange approach seemed more user-friendly to me. On the other hand, I'm worried about too much complexity and things like "Filter could be adapted for ranges" that are not simple nor trivial.
Con: some ugly code in windshaft source (need MSN to decide what properties variants are used) and in decoding/encoding (now in metadata specializations)
That refers mostly to this and could be removed in the two TimeRange branches (since we always need the same 2 or 3 properties for each time dimension).
Ah!, and the decoding/encoding hassle is here. That can be simplified a lot, especially in the last branch (timerange with 2 properties), and I have a semi-cooked refactor to put that in a separate codec class with proper naming (decode/encode is used very confusingly here).
Con: API complexity, inconsistent concept of start/end TZ.
Refers to the fact that start/end dates are dates in the TZ requested by the user, but in plain JS you only have the concept of either UTC or the local TZ. In the TimeRange approaches, since we have this specific abstraction, we can somewhat hide this inconsistency (e.g. by allowing access to start/end values as epoch numbers; and our text representation is TZ-neutral).
After meeting with @davidmanzanares & @rochoa we've decided to go for the last PR: timerange with only two properties. We can fail meaningfully when timerange is not supported by an expression and document the limitations.
Then we can gradually add support for all interesting cases like filters or aggregations later.
The use case of using time categories with buckets etc. (i.e. in the GPU) seems not too important, so we can live without it. (I think that if at some moment the need really arises we could reintroduce a new categorical expression like the clusterTimeIso that we originally had).
Update: we now have a single PR, #991 with the final API design.
clusterTime
There's now a single expression, clusterTime(property, units, timezone), where timezone is optional.
There are two different kinds of units: recurring cyclic units, like month of the year or day of the week, and serial calendar periods, like years, months, days, ...
The type of the units determines the type of the expression result: for cyclic units the type is number, e.g. 1-12 for month of the year, 1-7 for day of the week, etc.
For serial units, the type is TimeRange, which encapsulates a period of time (a year, a month, a minute, ...).
A TimeRange has these properties:
- text: an ISO8601-based representation of the period, e.g. '2018-04' for a month, '2018-04-01T03' for an hour.
- startDate, endDate: the start and end of the period; note that the period is defined as startDate <= t < endDate, so the end date is just after the end of the period. For example, the start of April 2018 is 2018-04-01T00:00:00 and the end is 2018-05-01T00:00:00.
- startValue, endValue: the same instants, but as millisecond epochs (since 1970-01-01 in the requested time zone).

Note that startDate/endDate are relative to the requested timezone, like startValue/endValue (UTC by default), so the TZ they portray is meaningless.

The cyclic units are:
- semesterOfYear (1:2)
- trimesterOfYear (1:3)
- quarterOfYear (1:4)
- monthOfYear (1:12)
- weekOfYear (1:53)
- dayOfWeek (1:7)
- dayOfMonth (1:31)
- dayOfYear (1:366)
- hourOfDay (0:23)
- minuteOfHour (0:59)

The format is based on ISO 8601:
- year: 'YYYY'
- month: 'YYYY-MM'
- day: 'YYYY-MM-DD'
- hour: 'YYYY-MM-DDThh'
- minute: 'YYYY-MM-DDThh:mm'
- second: 'YYYY-MM-DDThh:mm:ss'
- week: '2018-W03' (third week of 2018: 2018-01-15 to 2018-01-21)
- quarter (three months): '2018-Q2' (second quarter of 2018, i.e. 2018-04 to 2018-06)
- semester (six months): '2018S2' (second semester of 2018, i.e. 2018-07 to 2018-12)
- trimester (four months): '2018t2' (second trimester of 2018, i.e. 2018-05 to 2018-08)
- decade: 'D201' (decade 201, i.e. 2010 to 2019)
- century: 'C21' (21st century, i.e. 2001 to 2100)
- millennium: 'M3' (3rd millennium: 2001 to 3000)

For the numeric case, the clusterTime expression can be used as any other numeric value inside linear, animation, etc. (filter doesn't support it yet).
For the time ranges, linear and animation have special support for this type. Animation behaves in a special way with a time range: it will hold for the duration of each time range, and fade in/out will be applied before and after the periods. (the behaviour can be disabled by wrapping the time range in a linear expression, which will be based on the start date of the time ranges)
viewportFeatures and interactivity are also compatible with clusterTime.
Like clusterAvg etc., using clusterTime precludes usage of the property directly. What's more, unlike cluster aggregation functions, once you use clusterTime on a property, that property can only appear inside clusterTime with the same arguments.
I'll leave this short introduction of time dimensions here, which could be recycled for some doc, guide or post:
When backend aggregations are in use (for Windshaft sources), data is clustered spatially, and the cluster aggregation expressions give you the values of data properties summarised over the aggregation clusters (for example clusterMax($temperature), clusterAvg($price), clusterSum($population)).
If a property is used out of any cluster aggregation expression, then it becomes a dimension of the data: it's used for clustering along with the spatial grid (just like torque cubes).
But usually that means the aggregation is no longer as effective, because each data cluster will be split by every single distinct value of the property, potentially dividing into a myriad of small clusters, often containing just a single feature.
The clusterTime expression allows you to use a time property as an aggregation dimension (rather than as an aggregated value, as with the cluster aggregation expressions like clusterSum, etc.) in a controlled manner; that is, specifying how to discretise it so that we don't have too many unique values.
So clusterTime is a kind of classification, or grouping into buckets. It lets you choose how a time property is grouped into units of time (say months, days, minutes, ...) and then have each cluster of data be split by each unique value of that unit that occurs. For example clusterTime($date, 'month'), clusterTime($arrival_time, 'dayOfWeek').
To use aggregations effectively, each property should be wrapped in either a cluster aggregation expression (clusterSum, clusterAvg, clusterMin, ...) or in a clusterTime expression. In the first case we obtain an aggregate of the property value over each cluster; in the second, the classified property becomes an aggregation dimension.
Some desirable changes after talking with @davidmanzanares:
- TimeRangeAnimation: derive from BaseExpression with And as a child.
- VariantExpression: try using setPrototypeOf instead of returning a Proxy to this.
- isIdentity: find a name more descriptive of intent, i.e. differentiate the case of a per-property encoding from a universal one, which allows introducing encoded values in GPU inlined expressions.
- ParseIso: compute start and end simultaneously.

There's some documentation now for TimeCluster, TimeRangeExpr & TimeRange in the code, and some internal documentation here:
Addressed by https://github.com/CartoDB/carto-vl/pull/991
This originated in #130. In that ticket, @davidmanzanares said:
We need 3 things here:
dimensions support in the Maps API for time columns
We should also consider a similar mechanism for quantization (in buckets, ranges, ...) of non-time properties to be sent as aggregation dimensions in the backend.