IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Completing the circular components in R-Instat #5503

Open rdstern opened 5 years ago

rdstern commented 5 years ago

The addition of circular statistical summaries into R-Instat, largely by @Wycklife with support from others, has been a major enhancement. The components are as follows:
a) Circular keyboard on the calculator, including a key to define a variable as circular.
b) Circular tab on the Prepare > Column: Reshape > Column Summaries dialogue.
Items a) and b) have been merged for the next release.
c) Polar coordinates in the plotting sub-dialogue. Just about ready to merge?
d) New simple dialogue to define circular variables. @Wycklife is working on this now.

There remain some detailed @dannyparsons type tasks. I hope they are small!
a) Show circular variables in the grid. They are defined in the metadata. Perhaps wind_dir (ci) would be appropriate, or (cl), or (cr)?
b) Check and possibly correct the metadata, and hence the display, of calculated and summary variables.

circular with summary.zip

I am finding some oddities as I write the documentation, and will describe them as they appear. I am using 2 datasets, both from the library. The wind data are from the circular package and are in radians. The attached file contains these data with additional variables. The data are from 62 days (in a single year), with 5 observations per day. The additional variables are the direction in degrees, the day, and the time within the day. I then calculate summaries, and the summary data frame is also shown. Note that, because the data were defined as circular, some of the "ordinary" summaries are "captured" by the software and circular summaries are calculated automatically.
Oddities include:
a) Mean (from the ordinary mean): the metadata is all NA, including the name. This seems not to be the case for the other summaries?
b) The summary data are not defined as circular. That may be ok, or not. If they are, then a circular mean is still circular, but a circular sd is not, I think. I am not sure what happens with a calculated variable. I will report when I know!
c) It may be that we simply don't want the ordinary mean (and other ordinary summaries) to produce the circular results, even for data defined to be circular? I notice that the ordinary and circular max and min are different. The mean is (I think) the same, while the sd is different! Is it just the mean that is "captured"? Can we prevent this? Should we? (If it is just the mean, then one solution is to have our own package with a mean that is the same as the mean in base, and load it after the circular package.)
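To make concrete why it matters which mean is "captured", here is a minimal base-R sketch of the difference between an ordinary mean and a circular mean. The two directions are made-up values, not taken from the wind data, and the circular mean is computed by hand from averaged sines and cosines rather than through the circular package.

```r
# Two hypothetical wind directions in degrees, either side of north.
deg <- c(350, 30)
rad <- deg * pi / 180

# Ordinary (arithmetic) mean: treats the angles as points on a line.
ordinary_mean <- mean(deg)

# Circular mean: average the unit vectors, then take the direction of the result.
circular_mean <- (atan2(mean(sin(rad)), mean(cos(rad))) * 180 / pi) %% 360

print(c(ordinary = ordinary_mean, circular = circular_mean))
```

The ordinary mean points roughly south, while the circular mean stays close to north, so a summary column can change meaning entirely depending on which function is actually called.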

rdstern commented 5 years ago

I have looked a bit more at what might be possible so that the ordinary mean and median still give the "ordinary" results, even when a variable has been defined as circular. I don't quite understand what is happening: when a variable is defined as circular, it gives the circular median even when I give the ordinary command explicitly as stats::median(calc1).

(I was using the wind data from the circular package and defining a calculated variable as circular.) I had assumed it was all down to the search order for functions, and so presumed that stats::median should have given the ordinary median. If it is the search order, then one possibility could be to load the circular package with the position argument set high, e.g.:

library("circular", pos = .Machine$integer.max)
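A likely explanation for the stats::median(calc1) result is S3 method dispatch rather than search order: stats::median is a generic, so it selects a method from the class of its argument however the generic itself was found. The sketch below illustrates this in base R only. The class demo_circ and its method are hypothetical stand-ins, used so that the example runs without the circular package.

```r
# Hypothetical stand-in class, so the sketch runs without the circular package.
x <- structure(c(0.1, 0.2, 6.2), class = "demo_circ")

# An S3 method for the median generic; it only flags that it was called.
median.demo_circ <- function(x, na.rm = FALSE, ...) {
  "method for demo_circ was called"
}

# Naming the namespace does not bypass dispatch on class(x).
via_generic <- stats::median(x)

# Removing the class first gives the ordinary median of the numbers.
via_unclass <- stats::median(unclass(x))
via_numeric <- stats::median(as.numeric(x))

print(via_generic)
print(c(unclass = via_unclass, as_numeric = via_numeric))
```

If this is what is happening, changing the position at which the circular package is attached would not be expected to change which median is computed for a circular variable.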

The only way I was able to get the ordinary median from the ordinary median function (in the R-Instat calculator) for the circular variable was to give the command as: median(as.numeric(calc1)). This is messy, but if there is no other way, could the vb code for the mean (and median) check whether the variable is circular and, if so, add the as.numeric() into the function?
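The check described above could also live on the R side rather than in the generated command text. The following is a sketch only: ordinary_summary is a hypothetical helper, not an existing R-Instat or circular function, and calc1 is built by hand here with the class name "circular" so that the example runs without the package (real circular objects carry more attributes).

```r
# Hypothetical helper: apply an ordinary summary, ignoring any circular class.
ordinary_summary <- function(x, fun, ...) {
  if (inherits(x, "circular")) {
    x <- as.numeric(x)  # drop the class so `fun` sees a plain numeric vector
  }
  fun(x, ...)
}

# Hand-built stand-in for a variable that has been defined as circular.
calc1 <- structure(c(0.1, 0.2, 6.2), class = "circular")

ordinary_median <- ordinary_summary(calc1, median)
plain_mean <- ordinary_summary(c(1, 2, 3), mean)  # non-circular input is untouched

print(c(median = ordinary_median, mean = plain_mean))
```

Either way, the dialogue needs to know whether the user wants the ordinary or the circular summary, so the choice probably has to be explicit somewhere.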