IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Adding missing values options to summary system #3306

Closed. rdstern closed this issue 6 years ago

rdstern commented 7 years ago

It is good that the summary system is now being used more comprehensively in the dialogues. Please can the extra facilities for coping with missing values be added in the July release.

This would also be a useful addition in that it extends what R does. I need it for the climatic summaries, but suggest it could be useful as a general facility. I hope it won't make things too slow.

Currently (in R) each command makes a summary missing whenever either:

a) a single value is missing, or
b) all values are missing (or so many that the calculation can't proceed).

Could we add a facility to provide a summary whenever:

c) fewer than a certain number are missing, or
d) less than a certain fraction are missing.

I assume this can proceed by doing the calculation with option b) above and also calculating the number of missing values, then setting the result to missing if the number or fraction is too great.
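As a rough, language-neutral sketch of that idea (written in Python rather than R, with NaN standing in for a missing value): compute the summary on the non-missing values, count the missing ones, then blank the result if the count or fraction reaches a limit. The function name and the `n_limit` / `fraction_limit` parameters are hypothetical, and the strict "fewer than the limit" rule simply follows the wording of c) and d) above.

```python
import math
from statistics import fmean


def summary_with_missing_limit(values, summary=fmean,
                               n_limit=None, fraction_limit=None):
    """Summarise `values`, returning NaN when too many are missing.

    Hypothetical helper: NaN marks a missing value. A result is given
    only if the number missing is below `n_limit` and the fraction
    missing is below `fraction_limit` (each check is skipped if None).
    """
    present = [v for v in values if not math.isnan(v)]
    n_missing = len(values) - len(present)

    # Option b): nothing left to summarise, so the result is missing.
    if not present:
        return math.nan

    # Option c): too many missing values by count.
    if n_limit is not None and n_missing >= n_limit:
        return math.nan

    # Option d): too many missing values by fraction.
    if fraction_limit is not None and n_missing / len(values) >= fraction_limit:
        return math.nan

    return summary(present)
```

For example, with one value missing out of four, `n_limit=2` gives the mean of the remaining three values, while `n_limit=1` reproduces option a) and gives a missing result.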

There are 2 further issues we may wish to consider at the same time:

a) I think totals are the only obvious summary that we may wish to "adjust" for the number of missing values, i.e. we could multiply up. However, I am very nervous of doing anything here, because there is a huge literature (and many methods in R) on imputation of missing values. We could be stepping into a minefield.

b) On the other hand, I am attracted by the potential of using our multiple missing value codes here. It would be very little extra work to add the possibility that, if it is the adjustment (rather than the basic calculation) that makes the result missing, then this is a tagged missing, so it is distinguished from the ordinary missing.
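To illustrate point b) only, here is a minimal Python sketch in which a summary returns its value together with a tag saying why it is missing, if it is. The `MissingTag` enum and `tagged_total` function are hypothetical; they are not how the multiple missing value codes are actually stored, which this thread does not describe.

```python
import math
from enum import Enum


class MissingTag(Enum):
    """Hypothetical reasons attached to a summary value."""
    NONE = "not missing"
    ORDINARY = "ordinary missing"   # the basic calculation could not proceed
    TOO_MANY = "too many missing"   # the limit on missing values was reached


def tagged_total(values, n_limit):
    """Return (total, tag); NaN marks a missing value in `values`."""
    present = [v for v in values if not math.isnan(v)]
    n_missing = len(values) - len(present)

    if not present:
        # Nothing to add up: this is the ordinary missing case.
        return math.nan, MissingTag.ORDINARY
    if n_missing >= n_limit:
        # The total could be computed, but the rule makes it missing.
        return math.nan, MissingTag.TOO_MANY
    return sum(present), MissingTag.NONE
```

Keeping the tag separate from the value means the two kinds of missing result can still be told apart later, for instance when tabulating how many summaries were lost to the limit.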

rdstern commented 6 years ago

We now have the tab for the missing value options. I also do it manually. I think that doing my manual system from the dialogue would be easy to implement (Danny/David to confirm), and I would be very happy if that could be done. It would be great to have it in the climatic summary for the Benin workshop, which would be automatic because that uses the same sub-dialogue.

  1. If you opt for the missing addition then the command automatically ticks the number missing and number of observations - even if the user has not done this. There can never be any harm putting those into the summary data frame. (That's my first step.)
  2. My second step is to do an ifelse calculation using these columns. It is then simply either an ifelse on the number of these observations, or on the fraction. This is done on each of the other columns produced by that summary command.
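
The two steps above might look roughly like this in Python, with the summary data frame represented as a list of dictionaries and NaN as the missing value. All names here (`summarise_groups`, `mask_by_missing`, the `n_obs` and `n_missing` columns) are hypothetical, and `n_obs` is assumed to count every entry, missing or not.

```python
import math

# Columns that are never blanked by the missing-value rule.
COUNT_COLUMNS = {"group", "n_obs", "n_missing"}


def summarise_groups(groups, summaries):
    """Step 1: one row per group, always including the two counts."""
    rows = []
    for name, values in groups.items():
        present = [v for v in values if not math.isnan(v)]
        row = {"group": name,
               "n_obs": len(values),
               "n_missing": len(values) - len(present)}
        for label, fn in summaries.items():
            # With nothing present the basic calculation cannot proceed.
            row[label] = fn(present) if present else math.nan
        rows.append(row)
    return rows


def mask_by_missing(rows, n_limit=None, fraction_limit=None):
    """Step 2: an ifelse on the counts, applied to every other column."""
    masked = []
    for row in rows:
        by_count = n_limit is not None and row["n_missing"] >= n_limit
        by_fraction = (fraction_limit is not None and row["n_obs"] > 0
                       and row["n_missing"] / row["n_obs"] >= fraction_limit)
        too_many = by_count or by_fraction
        masked.append({
            key: (math.nan if too_many and key not in COUNT_COLUMNS else value)
            for key, value in row.items()
        })
    return masked
```

Because the counts stay in the table, the masking step can be rerun with a different limit without recomputing the summaries themselves.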
rdstern commented 6 years ago

I am sure this is also discussed elsewhere, so I am closing this one.