IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Time series data should be defined and should also (if possible) convert #9004

Open rdstern opened 4 weeks ago

rdstern commented 4 weeks ago

@N-thony this is a problem that @MeSophie and I have sort of fallen into when trying to improve the line graphs. Some data from the datasets package are imported as time series (their class is ts) and we don't yet process those data well.

I don't want to get side-tracked into adding time series methods into R-Instat yet. So I have a limited target with this issue. I have 2 example datasets from the standard datasets package and I'm sure there are more. They are LakeHuron and WWWusage.

a) When I look at the column metadata they are of class ts, so they are recognised on import automatically, but (like the polynomials) they are not indicated in the data frame. Could we indicate them as ts, or TS, or S? I think this will become important enough that we might want a special label, but I'll take what is easy for now.

Would it be possible to make this general, rather than every new case being coded separately? So any variable that is a new type would be coded as S automatically in the grid?
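A general fallback of this kind could be sketched in plain R as below. This is only an illustration: `type_label` and the single-letter codes in `known` are hypothetical, not R-Instat's actual grid labels or code. Any class that is not in the known list, such as ts, falls through to "S".

```r
# Hypothetical lookup of short grid labels for known column classes.
# These codes are illustrative only, not R-Instat's real labels.
known <- c(numeric = "N", integer = "I", character = "C",
           factor = "F", logical = "L", Date = "D")

# Hypothetical helper: return the label of the first known class,
# or "S" (special) for anything unrecognised, e.g. a ts object.
type_label <- function(x) {
  hit <- intersect(class(x), names(known))
  if (length(hit) > 0) known[[hit[1]]] else "S"
}

type_label(c(1.5, 2))      # a plain numeric vector
type_label(factor("a"))    # a factor
type_label(LakeHuron)      # a ts object from the datasets package
```

Using `intersect()` on the full class vector means an ordered factor (class "ordered", "factor") is still recognised as a factor, while a genuinely new class needs no extra code.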

b) I am unable to convert the ts object back to numeric. I checked here and it should be easy.

In R-Instat it is in multiple places. I assume most use data_book$convert_column_to_type.

I tried in Duplicate Column, which has an option to change the type, and also in Prepare > Data Frame > Convert Column. Neither works yet, at least with the options I tried. I assume they both use data_book$convert_column_to_type.

I tried Right-click > Convert to Numeric and it converts OK, but keeps the extra tsp information in the metadata.

I succeeded with the calculator, as shown below:

calc is just as.numeric(value) and gets rid of the extra metadata. calc1 is unclass(value) and makes it numeric, but leaves the tsp metadata. Maybe we want both options in the duplicate and convert dialogs.

image
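For reference, the difference between the two calculations can be reproduced in base R outside R-Instat. This is a minimal sketch using LakeHuron directly; the variable names `plain` and `bare` are just for illustration, and it says nothing about how data_book$convert_column_to_type is implemented.

```r
x <- LakeHuron             # a ts object from the datasets package
class(x)                   # "ts"
tsp(x)                     # start, end and frequency of the series

# as.numeric() drops all attributes, including the class and tsp.
plain <- as.numeric(x)
attributes(plain)          # NULL

# unclass() removes only the class; the tsp attribute is kept.
bare <- unclass(x)
class(bare)                # "numeric"
names(attributes(bare))    # "tsp"
```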

N-thony commented 3 weeks ago

@derekagorhom you recently asked for something to work on, can you have a look at this?

Patowhiz commented 2 weeks ago

@rdstern excuse my ignorance. With regard to a single column header, why would we not want it labelled as numeric? Is there a fundamental difference between a TS and a numeric column?

On a separate note, I suggest that, whenever possible, we lean towards limiting R jargon in the front end. I have no issue with the class being shown as ts, since class is already R jargon.

rdstern commented 2 weeks ago

@Patowhiz we fell into this problem, because line plots are particularly useful for time-series data. And the main datasets package has lots of them. They are roughly numeric variables with linked information (visible in our metadata) that gives their period, etc.

For some reason the line geom doesn't like them, unless treated specially. And this looked odd in R-Instat, because they seemed like a numeric variable, but behave differently. I'm keen to make quick progress and not to get sucked into the details of time series analysis at this stage. And the ts class is now usually generalised, because it only copes with equally spaced series.

We will need to venture there soon - partly because all our climatic data ARE time series. For now I just wanted to be able to see what's a time series variable (which doesn't need the accompanying year as the x, for example, because that is in the metadata), and then be able to deal with it.
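As a rough illustration of the point that the x variable is already in the metadata, the time index of a ts object can be recovered with `time()` and put alongside the values in an ordinary data frame, which a line geom then handles like any numeric pair. This is a sketch only; the data frame `lake` and its column names are made up for the example, and the ggplot2 step is skipped if that package is not installed.

```r
# Build an ordinary data frame from a ts object: time() recovers the
# time index that is stored in the tsp attribute.
lake <- data.frame(
  year  = as.numeric(time(LakeHuron)),
  level = as.numeric(LakeHuron)
)
head(lake)

# Optional: draw the line plot if ggplot2 is available.
if (requireNamespace("ggplot2", quietly = TRUE)) {
  p <- ggplot2::ggplot(lake, ggplot2::aes(x = year, y = level)) +
    ggplot2::geom_line()
  # print(p) draws the plot
}
```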

Your second point, why TS and not ts, is good. I suggested TS because our other types are capital letters. But I'd be quite happy to have it changed to ts. Would you like to suggest that to Derrick in the pull request? Or become the second reviewer? That would be fine. Just inform Antoine.

Patowhiz commented 2 weeks ago

@rdstern thanks for enlightening me. I wasn't aware they have linked information behind them. In that case, they have a clear distinction from numeric. I support the change.