HeikoH opened this issue 9 years ago.
@HeikoH thanks. Can you point me to a Layer that has an exemplar set of temporal line data?
Or an array of values :)
@HeikoH is right. Line charts will be binned by date intervals (days, weeks, years...) depending on the scale and range of the data. They are essentially histograms of temporal data in which we summarize (count, sum, etc.) the features in each bin (date range).
We need to provide a default for the date range bins, but the user needs to be able to change binning to a different scale to see more or less detail in the time series pattern.
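A minimal sketch of that binning in Python, assuming features arrive as dicts holding a `datetime` field. `bin_features`, `BIN_START` and the field names are hypothetical and not part of Cedar or Vega. The `unit` argument plays the role of the user-adjustable scale, and the aggregate is a count, or a sum when `value_field` is given:

```python
from collections import defaultdict
from datetime import date, datetime, timedelta

# Hypothetical helpers: map a date to the start date of the bin it falls in.
BIN_START = {
    "day": lambda d: d,
    "week": lambda d: d - timedelta(days=d.weekday()),  # weeks start on Monday
    "month": lambda d: d.replace(day=1),
    "year": lambda d: d.replace(month=1, day=1),
}


def bin_features(features, time_field, unit="day", value_field=None):
    """Summarize features per date bin.

    Counts features per bin when value_field is None, otherwise sums that field.
    Returns (bin_start, aggregate) pairs sorted by bin start; empty bins are omitted.
    """
    to_bin = BIN_START[unit]
    totals = defaultdict(float if value_field else int)
    for feature in features:
        bin_start = to_bin(feature[time_field].date())
        totals[bin_start] += feature[value_field] if value_field else 1
    return sorted(totals.items())


# Example data (made up): three features over two consecutive weeks.
features = [
    {"when": datetime(2015, 3, 2, 9), "amount": 5},
    {"when": datetime(2015, 3, 4, 17), "amount": 3},
    {"when": datetime(2015, 3, 10, 8), "amount": 4},
]
print(bin_features(features, "when", unit="week"))
print(bin_features(features, "when", unit="week", value_field="amount"))
```

Switching `unit` from `"day"` to `"week"` or `"year"` trades detail for a smoother series, which is the adjustment the user needs to be able to make.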
@ajturner see https://github.com/HeikoH/cedar/commit/e12ca3c94e7804efb4b5ee6145c6778df5ea1287, which will result in this chart:
The challenge is to construct the data... But it will be similar to how histogram data is created.
That's good, @HeikoH. Do you want to PR it?
Regarding the binning, this is again where Vega could do the work. There are built-in "aggregate" methods the developer could optionally use if they had feature-level data.
@ajturner No, I don't really want to create a PR, because I believe a solution should also be provided that allows using an FL (feature layer) URL. Otherwise, what's the point of the sample, right?
@ajturner I was trying to use Vega data transforms to reshape the data, but no luck (yet)...
@phlorah do you have some specific rules in mind as to how to determine the bin size?
Time is a little trickier because we want to split the bins into meaningful time units. I will send you what the spatial stats team uses to bin time for Create Space Time Cube.
@sasbab may have insight into determining temporal bin sizes
From Flora:
Here is the rounding method the spatial stats team uses for Create Space Time Cube:
```python
import arcpy as ARCPY  # arcpy ships with ArcGIS; the excerpt refers to it as ARCPY


def timeExtentRounder(seconds):
    """Rounds given default temporal span in seconds into nearest meaningful
    block."""
    if seconds < 10:
        #### Spans under 10 seconds are reported as an error (message ID 110037) ####
        ARCPY.AddIDMessage("ERROR", 110037)
        raise SystemExit()
    elif seconds < 100:
        #### Less Than 100 Seconds = 1 Second ####
        return 1, "1 Second"
    elif seconds < 300:
        #### Less Than 5 Minutes = 10 Seconds
        return 10, "10 Seconds"
    elif seconds < 900:
        #### Less Than 15 Minutes = 30 Seconds
        return 30, "30 Seconds"
    elif seconds < 3600:
        #### Less Than 1 Hour = 1 Minute
        return 60, "1 Minute"
    elif seconds < 21600:
        #### Less Than 6 Hours = 5 Minutes
        return 300, "5 Minutes"
    elif seconds < 43200:
        #### Less Than 12 Hours = 30 Minutes
        return 1800, "30 Minutes"
    elif seconds < 86400:
        #### Less Than 1 Day = 1 Hour
        return 3600, "1 Hour"
    elif seconds < 259200:
        #### Less Than 3 Days = 2 Hours
        return 7200, "2 Hours"
    elif seconds < 864000:
        #### Less Than 10 Days = 6 Hours
        return 21600, "6 Hours"
    elif seconds < 7776000:
        #### Less Than 90 Days = 1 Day
        return 86400, "1 Day"
    elif seconds < 31536000:
        #### Less Than 1 Year = 1 Week
        return 604800, "1 Week"
    else:
        #### Round to Year or Months
        # decideMonthlyYearly is not included in this excerpt.
        return decideMonthlyYearly(seconds)
```
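For experimenting outside ArcGIS, the same thresholds can be expressed as a lookup table and applied to the time extent of a dataset to pick a default bin. This is a hypothetical port (`round_time_extent` and `default_bin` are illustrative names, not Esri functions): it raises `ValueError` instead of calling arcpy, and for spans of a year or more it returns `None`, because the monthly/yearly rule was not part of the excerpt:

```python
from bisect import bisect_right
from datetime import datetime

# Exclusive upper bounds (seconds) copied from timeExtentRounder, paired
# index-by-index with the bin each range maps to.
_UPPER_BOUNDS = [100, 300, 900, 3600, 21600, 43200, 86400, 259200, 864000, 7776000, 31536000]
_BINS = [
    (1, "1 Second"),
    (10, "10 Seconds"),
    (30, "30 Seconds"),
    (60, "1 Minute"),
    (300, "5 Minutes"),
    (1800, "30 Minutes"),
    (3600, "1 Hour"),
    (7200, "2 Hours"),
    (21600, "6 Hours"),
    (86400, "1 Day"),
    (604800, "1 Week"),
]


def round_time_extent(seconds):
    """Return (bin_seconds, label) for a time span, or None for spans of a year or more."""
    if seconds < 10:
        raise ValueError("time extent too short to bin")
    i = bisect_right(_UPPER_BOUNDS, seconds)  # index of the first bound > seconds
    # i == len(_BINS) means the span is at least a year; the monthly/yearly
    # rounding is left to the caller.
    return _BINS[i] if i < len(_BINS) else None


def default_bin(timestamps):
    """Pick a default bin size from the extent (max - min) of a list of datetimes."""
    extent = (max(timestamps) - min(timestamps)).total_seconds()
    return round_time_extent(extent)


# Data spanning four days gets 6-hour bins by default.
print(default_bin([datetime(2015, 1, 1), datetime(2015, 1, 5)]))  # (21600, '6 Hours')
```

Using `bisect_right` keeps the same boundary behavior as the `elif` chain: a span exactly equal to a threshold falls into the next, coarser bin.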
@phlorah, @ajturner, @HeikoH
I recently connected Mark Janikas to my former development manager, who has more expertise in time series.
Here was his response: "The answer to your question is very complicated. If it is economic data, humans behave based on the hour-of-day, day-of-week, week-of-year, etc. For this data, seasonal dummies tests, seasonal augmented unit root tests, and others are useful. For other data with more complex cycles, it is best to break down the data first. For example, I like to use Singular Spectrum Analysis (SSA). See the attached paper."
I can forward the paper to anyone who wants it. Also, you might want to look at the SAS doc for PROC TIMESERIES, the procedure for aggregating and preparing time series data. The doc should be openly available online.
When creating UI experiences at SAS, we always asked users which frequency they wanted to use, because the problem they were trying to solve or the question they were trying to answer should dictate the level of aggregation.
Note: if you are going to forecast or create a model, be careful about incomplete bins at the beginning and end of the series, so that your aggregated values aren't deceptively low just because those periods are only partially covered.
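One simple guard, sketched here under the assumption that the observation window is known and bins have a fixed length (weekly by default, so calendar months are out of scope), is to drop any edge bin the window does not fully cover. `complete_bins` is a hypothetical helper:

```python
from datetime import date, timedelta


def complete_bins(bins, observed_start, observed_end, bin_length=timedelta(days=7)):
    """Keep only bins lying entirely inside [observed_start, observed_end).

    bins: (bin_start, value) pairs. A bin that begins before observation started,
    or ends after it stopped, covers only part of its period and would
    under-report, so it is dropped.
    """
    return [
        (start, value)
        for start, value in bins
        if start >= observed_start and start + bin_length <= observed_end
    ]


weekly = [(date(2015, 3, 2), 2), (date(2015, 3, 9), 5), (date(2015, 3, 16), 1)]
# Observation ran from Wednesday 4 March up to (not including) Wednesday 18 March,
# so only the week starting Monday 9 March is fully covered.
print(complete_bins(weekly, date(2015, 3, 4), date(2015, 3, 18)))
```

Dropping partial bins shortens the series; an alternative is to keep them and scale by the covered fraction, but that assumes events are spread evenly within the bin.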
The current example doesn't make much sense. You can't connect single observational data points with lines. Doing so suggests you can interpolate between two points, which you can't. A proper timeline chart would need to bin observations per year, something like this:
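A minimal sketch of per-year binning, assuming the observations are Python `datetime` values; `yearly_counts` is a hypothetical helper. Years with no observations get an explicit zero, so each line segment joins two yearly totals rather than two individual observations:

```python
from collections import Counter
from datetime import datetime


def yearly_counts(timestamps):
    """Count observations per calendar year, including empty years in between.

    Assumes at least one timestamp. Returns (year, count) pairs in year order.
    """
    counts = Counter(ts.year for ts in timestamps)
    first, last = min(counts), max(counts)
    return [(year, counts.get(year, 0)) for year in range(first, last + 1)]


observations = [datetime(2011, 5, 1), datetime(2011, 9, 3), datetime(2013, 2, 7)]
print(yearly_counts(observations))  # [(2011, 2), (2012, 0), (2013, 1)]
```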