IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Simple time series analysis in R-Instat #5635

Open rdstern opened 4 years ago

rdstern commented 4 years ago

I would like to teach this in the forthcoming AIMS climatic statistics course as the data are all time series. There is an interesting example of data in the datasets package called nottem (from Nottingham). It reads oddly into R-Instat - I wonder how it reads into RStudio?
In R-Instat it looks like data from mid-1920 to mid-1940, whereas it is clearly from January 1920 to December 1939.
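
For comparison, here is a minimal base R sketch of what R itself records for the series, and one way to give it an explicit Date column. The data frame name nottem_df and the use of the first day of each month as the date are assumptions made for illustration; they are not stored in nottem.

```r
# nottem ships with base R in the datasets package
start(nottem)      # first (year, month) of the series
end(nottem)        # last (year, month) of the series
frequency(nottem)  # observations per year

# Build a data frame with an explicit Date column.
# Assumption: each monthly value is dated to the first day of its month.
nottem_df <- data.frame(
  date = seq(as.Date("1920-01-01"), by = "month", length.out = length(nottem)),
  temp = as.numeric(nottem)
)
range(nottem_df$date)
```

If the dates in the data frame run from January 1920 to December 1939, any shift seen in R-Instat would come from how the ts time index is converted on import, not from the data.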

The example here does a simple time series analysis as follows:

Source: Anderson, O. D. (1976) Time Series Analysis and Forecasting: The Box-Jenkins approach. Butterworths. Series R.

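require(stats); require(graphics)
# Hold back the last three years (1937-1939) and fit to the rest
nott <- window(nottem, end = c(1936, 12))
# Seasonal ARIMA(1,0,0)(2,1,0) with a 12-month period
fit <- arima(nott, order = c(1, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 12))
# Forecast the next 36 months
nott.fore <- predict(fit, n.ahead = 36)
# Plot the series, the forecast, and the forecast +/- 2 standard errors
ts.plot(nott, nott.fore$pred, nott.fore$pred + 2 * nott.fore$se,
        nott.fore$pred - 2 * nott.fore$se, gpars = list(col = c(1, 1, 4, 4)))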

In the AIMS course we can do this analysis with RStudio, but I would like to do it also in R-Instat. We have arima in the Model > Hypothesis tests and predict in Model > Use Model > Prediction.

Is this all, or could we do more? And what about ts.plot?

rdstern commented 4 years ago

As well as having some tests for time series, I think it is time we were able to define a variable as a ts type. This could perhaps have its own simple dialogue, like Prepare > Define > Convert to Circular. But (unlike circular) there will often be multiple variables that we could convert in one go, so there would be a multiple receiver. We would also need to set the frequency. For daily data, is it 365.25? Or perhaps we add the idea of defining a zoo object to this dialogue too. With daily data we may also (in general) want to have multiple periodicities. The generalisation of ts that allows this is msts from the forecast package. We have zoo, and should add fable. (This is the replacement for the forecast package.)
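
As a rough sketch of what such a define step might run behind the scenes: the daily data below are simulated, and the names daily, tmax, tmin and the seasonal periods passed to msts are assumptions for illustration only. The zoo and forecast parts run only if those packages are installed.

```r
# Hypothetical daily data for illustration: two simulated temperature variables
set.seed(1)
dates <- seq(as.Date("2000-01-01"), as.Date("2003-12-31"), by = "day")
doy   <- as.numeric(format(dates, "%j"))
daily <- data.frame(
  date = dates,
  tmax = 28 + 4 * sin(2 * pi * doy / 365.25) + rnorm(length(dates)),
  tmin = 18 + 3 * sin(2 * pi * doy / 365.25) + rnorm(length(dates))
)

# Convert several columns in one go: ts() coerces a data frame to a matrix
# and returns a multivariate ("mts") object
daily_ts <- ts(daily[, c("tmax", "tmin")], start = c(2000, 1), frequency = 365.25)

# zoo indexes by the actual dates, so no frequency has to be chosen
if (requireNamespace("zoo", quietly = TRUE)) {
  daily_zoo <- zoo::zoo(as.matrix(daily[, c("tmax", "tmin")]), order.by = daily$date)
}

# msts() from the forecast package records more than one seasonal period
# (the weekly and annual periods here are purely illustrative)
if (requireNamespace("forecast", quietly = TRUE)) {
  tmax_msts <- forecast::msts(daily$tmax, seasonal.periods = c(7, 365.25))
}
```

The ts version needs a single frequency and a start time, whereas the zoo version keeps the real dates, which may be the more natural fit for daily climatic data.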

rdstern commented 4 years ago

Searching a bit more, I have found the fable package, which replaces the forecast package. In addition there is a new book that looks great: Forecasting: Principles and Practice. It is a free online book, possibly of the sort we should be writing about R-Instat etc., and it is updated as the software is updated. Currently there is no print version and the most recent update was on 23 Jan 2020!

The accompanying R packages are also constructed to become part of tidyverse. It is all highly consistent with what we are doing for education. It doesn't seem to have anything yet on climatic data. More as I read more.

rdstern commented 4 years ago

I would like to make adding time series to R-Instat into a high priority now. This would go onto the new Structured menu in R-Instat. I suggest this could be relatively easy, after the initial step, which is the define dialogue. The basic time series structure in R is ts, while zoo is a bit more general. But I suggest we implement tsibbles as in the tsibble package.

This makes a tibble data frame. I don't know whether this can be the same as the "ordinary" data frame or would be another data frame. I also don't know how this would be included in the R-Instat data sheet information? I know our "standard" data frames have a Date component (which is useful), and they may have multiple stations. This is fine as a tsibble, but too general for the ts and zoo structures.
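
To make the multiple-station case concrete, here is a small sketch using as_tsibble from the tsibble package. The data frame climate and its columns station, date and rain are made up for illustration, and the conversion only runs if tsibble is installed.

```r
# Hypothetical two-station daily data frame, for illustration only
set.seed(42)
climate <- data.frame(
  station = rep(c("A", "B"), each = 30),
  date    = rep(seq(as.Date("2020-01-01"), by = "day", length.out = 30), times = 2),
  rain    = round(runif(60, 0, 20), 1)
)

if (requireNamespace("tsibble", quietly = TRUE)) {
  # key identifies each series (here the station); index is the time variable
  climate_tsbl <- tsibble::as_tsibble(climate, key = station, index = date)
  print(climate_tsbl)
  # A tsibble still inherits from data.frame
  print(inherits(climate_tsbl, "data.frame"))
}
```

The key and index together must identify each row uniquely, which a station plus Date layout already satisfies.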

Eventually this tsibble idea may influence our current climatic data frames idea. It is similar, but perhaps more general.
The use of tsibble with the climatic menu is also discussed in #5639. However, I don't want this to delay the construction of this new time-series menu.

It is possible that the define dialogue might include the possibility of ts and/or zoo as well? That may be for the future. The next comment explains the importance of this proposed new menu.

I am assuming that @lilyclements might be able to become involved on this initial component. She may wish (at this time) to also include the define for survival! I am not assuming she will have to do the full menu - but we have this define hurdle to overcome first!

rdstern commented 4 years ago

This menu has become important for at least 2 reasons. The first is general: climatic data are time series, hence analysts need to understand the concepts of time series analysis. The second is that an important topic in our work is the infilling of missing data in the series, and a number of the functions for this infilling build on the time-series models fitted to the data. This is a sort of hind-casting, rather than the usual use of time series analysis for forecasting.

So it is good to know which models are sensible for different elements, etc., as this will then dictate which functions to use for the infilling.
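
A rough base R sketch of model-based infilling, reusing the seasonal ARIMA from the nottem example above. The gap positions are arbitrary and chosen only for illustration, and the handling of the smoother's starting state is simplified, so treat this as a sketch of the idea and not as the method any particular infilling function uses.

```r
# Knock out a few months from nottem to mimic missing records
# (the positions are arbitrary, chosen for illustration)
nott_gappy <- nottem
missing_idx <- c(50, 51, 120, 200)
nott_gappy[missing_idx] <- NA

# arima() accepts missing values; same model orders as the nottem example
fit <- arima(nott_gappy, order = c(1, 0, 0),
             seasonal = list(order = c(2, 1, 0), period = 12), method = "ML")

# Kalman smoothing estimates the state at every time point using data on
# both sides of each gap; Z maps the states back to the observed series
ks <- KalmanSmooth(nott_gappy, fit$model, nit = -1L)
smoothed <- drop(ks$smooth %*% fit$model$Z)

# Fill only the missing positions, leaving observed values untouched
nott_filled <- nott_gappy
nott_filled[missing_idx] <- smoothed[missing_idx]

# Compare infilled values with the values that were removed
cbind(observed = nottem[missing_idx],
      infilled = round(nott_filled[missing_idx], 1))
```

Because the true values are known here, the comparison at the end gives a quick feel for how well a given model infills a given element.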

So once the define dialogue is in place, I would like to see the describe section, including decomposition plus the ggplots that are in the feasts package. Then modelling might initially be a single dialogue, sort of like the general fitting dialogue (and also the structure of the extreme modelling dialogue). This could be done reasonably soon perhaps. It is simpler than the dialogues needed for the describe section.
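
As an indication of what the decomposition part of describe could produce, here is a minimal sketch using stl from base R on nottem. It is a stand-in only: the feasts package would give the ggplot versions, and the choice of s.window = "periodic" is an assumption made for illustration.

```r
# Seasonal-trend decomposition by loess; s.window = "periodic" assumes the
# seasonal pattern is the same every year
dec <- stl(nottem, s.window = "periodic")

# One panel each for data, seasonal, trend and remainder
plot(dec, main = "STL decomposition of nottem")

# The three components add back up to the original series
components <- dec$time.series
recon <- rowSums(components)
head(cbind(components, original = as.numeric(nottem), recon = recon))
```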