IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Additional option on duplicates dialogue #3840

Open rdstern opened 7 years ago

rdstern commented 7 years ago

This is a proposed third option on the duplicates dialogue. It is for time series data and checks successive values. I give 2 examples:

1) For rainfall data, perhaps 5th May 2010 has 7.4mm. It is very unlikely that the next day will have exactly the same value, i.e. 7.4mm again, so I would like to know when this happens. (What is more likely, and has happened quite regularly in data I have been checking, is that 7.4mm was recorded and the data entry operator then realised it had been entered for the wrong day. So they also type it on the correct (next) day, but forget to change the value on the previous day back to what it should be, usually zero.)

Of course, with the rainfall data, zero happens quite often. So two (or more) successive zeros is not a surprise. We would want to ignore that.

2) With (say) temperatures, 5th May 2010 might have Tmax = 23.2 degrees C, and it is not particularly surprising if the next day has the same maximum. But 3 days in a row would be stretching it, and 4 days is almost certainly an error somewhere.

So, we want to have a lower limit (default 2) of successive values before we make a special note.

On the dialogue:

1) This would be a third option button with the label Single Variable. (Or perhaps Successive Values?)
2) The data selector is the same, but there is now a single receiver.
3) There is a label "Lower Limit of successive values". This has an up-down control with 2 as the minimum and no maximum.
4) There is a checkbox (default unchecked) with the label Omit Value(s). Then there is a control (I wonder if it is now a user control) with a combination of a drop-down with ==, <, <=, >, >= and then one or two number fields, depending on the control (it doesn't need anything special). The default when checked is == 0, so it omits dry days in rainfall data.
5) The new column name control can be the same as there is now.

On the R side I will ask Sam if he can write a function for this. It is the sort of thing I think he will like! The resulting column could usefully be zero for all records that are not repeats. If they are repeats then (ideally) each record might hold the number of repeats. So, if a value is repeated 4 times, then each record would have the value 4.
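To make the requested output column concrete, here is a minimal sketch of the run-length logic described above. It is written in Python purely for illustration; the real function would be written in R and does not exist yet. The names `successive_repeats`, `lower_limit`, `omit_op` and `omit_value` are hypothetical, and missing values are not handled.

```python
from itertools import groupby
import operator

# Comparison operators offered by the proposed drop-down (hypothetical mapping).
OPERATORS = {"==": operator.eq, "<": operator.lt, "<=": operator.le,
             ">": operator.gt, ">=": operator.ge}

def successive_repeats(values, lower_limit=2, omit_op=None, omit_value=0):
    """Return, for each record, the length of the run of identical successive
    values it belongs to, or 0 if the run is shorter than lower_limit or the
    value matches the omit condition."""
    flags = []
    for value, run in groupby(values):      # groups consecutive equal values
        n = len(list(run))                  # length of this run
        omitted = omit_op is not None and OPERATORS[omit_op](value, omit_value)
        flags.extend([n if n >= lower_limit and not omitted else 0] * n)
    return flags

# Rainfall example: 7.4 appears on two successive days; zeros are omitted.
rain = [0, 0, 7.4, 7.4, 0, 3.1, 0, 0, 0]
print(successive_repeats(rain, omit_op="=="))   # [0, 0, 2, 2, 0, 0, 0, 0, 0]
```

For the temperature case, `successive_repeats(tmax, lower_limit=3)` would leave two equal days in a row unflagged and mark only runs of three or more.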

AlexSananka commented 7 years ago

image

@rdstern Is this roughly how we want the new additional option to look?

rdstern commented 7 years ago

That looks good. The three radio buttons on the lhs are not needed. So the Options can go right across the dialogue. And I forgot an option. This is another checkbox (default unchecked) with the label Tolerance. If checked there is a control to enter a single numeric value with default 0.01 there.
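The thread does not spell out exactly how the Tolerance value should be applied. One possible reading, sketched here in Python for illustration only, is that a value continues a run if it differs from the first value of that run by no more than the tolerance. That interpretation, and the names `successive_repeats_tol` and `tolerance`, are assumptions rather than anything agreed in this issue.

```python
def successive_repeats_tol(values, lower_limit=2, tolerance=0.01):
    """Like an exact successive-values check, but a value joins the current run
    when it is within `tolerance` of the run's first value (an assumed rule)."""
    flags = []
    run_start = 0
    for i in range(1, len(values) + 1):
        # Close the run at the end of the data or when the value moves too far.
        if i == len(values) or abs(values[i] - values[run_start]) > tolerance:
            n = i - run_start
            flags.extend([n if n >= lower_limit else 0] * n)
            run_start = i
    return flags

tmax = [23.2, 23.205, 23.195, 24.0, 24.5]
print(successive_repeats_tol(tmax, lower_limit=3))   # [3, 3, 3, 0, 0]
```

Comparing against the first value of the run, rather than the previous value, stops a slowly drifting series from being counted as one long repeat; the other choice would also be defensible.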

AlexSananka commented 7 years ago

Okay, I actually forgot that this is going to use a totally different function, which has not even been written yet. I am going to hide the rdos in this case.