IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Giving the by command? #4386

Open rdstern opened 6 years ago

rdstern commented 6 years ago

Is there any way we can currently give the by command in R-Instat? I am thinking initially of a command we type into the calculator or in the script window. That would give roughly the same power for functions (like the statistical tests) that the summary dialogues give for data manipulation.

The R help page for by says it is to "Apply a Function to a Data Frame Split by Factors".

dannyparsons commented 6 years ago

If your calculation returns a single value you can use dplyr, e.g.

survey %>% dplyr::group_by(Village.) %>% dplyr::summarise(t.test(Field, Size)$p.value)

# # A tibble: 4 x 2
#   Village. `t.test(Field, Size)$p.value`
#     <fctr>                         <dbl>
# 1    KESEN                     0.5246753
# 2    NANDA                     0.0008996
# 3     NIKO                     0.6953557
# 4    SABEY                     0.0022165

This works in the calculator.

In the script window this would be:

survey <- InstatDataObject$get_data_frame(data_name="survey")
survey %>% dplyr::group_by(Village.) %>% dplyr::summarise(t.test(Field, Size)$p.value)
rm(list=c("survey"))
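The summary column above takes its name from the whole expression. As a minimal sketch, assuming dplyr is installed, the same grouped calculation can be given a shorter column name. The `toy` data frame and its values are made up for illustration and are not the survey data from this thread.

```r
library(dplyr)

# Hypothetical stand-in for the survey data; the values are invented.
toy <- data.frame(
  Village = factor(rep(c("A", "B"), each = 5)),
  Field   = c(5, 7, 6, 9, 8, 12, 15, 11, 14, 13),
  Size    = c(4, 5, 3, 6, 4, 5, 4, 6, 3, 5)
)

# One row per village, with the t-test p-value stored in a named column.
p_by_village <- toy %>%
  group_by(Village) %>%
  summarise(p_value = t.test(Field, Size)$p.value)

print(p_by_village)
```

Naming the column this way should make the result easier to refer to in later steps.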

To get the full output for a statistical test you can use by:

by(survey, survey$Village., function(x) t.test(x$Field, x$Size))

# survey$Village.: KESEN
# 
# Welch Two Sample t-test
# 
# data:  x$Field and x$Size
# t = 0.66, df = 12, p-value = 0.5
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -2.162  4.019
# sample estimates:
#   mean of x mean of y 
# 5.143     4.214 
# 
# ------------------------------------------------------------ 
#   survey$Village.: NANDA
# 
# Welch Two Sample t-test
# 
# data:  x$Field and x$Size
# t = 3.9, df = 19, p-value = 9e-04
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   4.86 16.00
# sample estimates:
#   mean of x mean of y 
# 15.071     4.643 
# 
# ------------------------------------------------------------ 
#   survey$Village.: NIKO
# 
# Welch Two Sample t-test
# 
# data:  x$Field and x$Size
# t = -0.41, df = 7.6, p-value = 0.7
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   -4.033  2.833
# sample estimates:
#   mean of x mean of y 
# 4.8       5.4 
# 
# ------------------------------------------------------------ 
#   survey$Village.: SABEY
# 
# Welch Two Sample t-test
# 
# data:  x$Field and x$Size
# t = 4.1, df = 10, p-value = 0.002
# alternative hypothesis: true difference in means is not equal to 0
# 95 percent confidence interval:
#   3.361 11.439
# sample estimates:
#   mean of x mean of y 
# 11.4       4.0 
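The result of by can also be stored and queried afterwards, since each element is the object returned by t.test. The sketch below uses base R only and a made-up `toy` data frame (an assumption for illustration, not the survey data) to pull the p-values out of the stored tests.

```r
# Hypothetical stand-in for the survey data; the values are invented.
toy <- data.frame(
  Village = factor(rep(c("A", "B"), each = 5)),
  Field   = c(5, 7, 6, 9, 8, 12, 15, 11, 14, 13),
  Size    = c(4, 5, 3, 6, 4, 5, 4, 6, 3, 5)
)

# Run the t-test separately for each village and keep the results.
tests <- by(toy, toy$Village, function(x) t.test(x$Field, x$Size))

# Each element is an "htest" object, so single components can be extracted.
p_values <- sapply(tests, function(h) h$p.value)

print(p_values)
```

This gives the full printed output when `tests` is printed, and the single-value summary when only one component is needed.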

rdstern commented 6 years ago

That's great already! If we add that to my newly suggested dialogue in #4385 (so no longer in the calculator), then we have the feature in at least one dialogue, which can be a stepping stone for those who want to use it elsewhere. I think the script window can then provide, at least initially, some of the looping we need.