Add scripts to the R-Instat library

@lloyddewit I would like a few examples to start the R-Instat script library. I suggest we don't need anything new. But we add a script directory to the current R-Instat library. Then like the climatic directory, where we have climatic data from different countries, here we will have scripts for different tasks.

So I suggest we have sub-directories as we add the scripts and most can correspond to the R-Instat menus. I am going to give a few examples below and suggest we have File, Prepare, Describe, Model, Climatic as initial sub-directories. We might later have more, but I suggest that most of our initial scripts will usefully be for specific tasks that enhance what we can do from the menus. So they are likely to relate to menu items and our main menus are a good starting point. We can help more with sensible file titles and always starting with a bit of description.

This collection could become quite important, as we are aiming for users to modify code, rather than writing code from scratch.

There will be 4 routes to most of our examples:

a) Examples from the R packages in R-Instat. The examples in each manual (which are compulsory for CRAN) are all scripts. They are for each function, to illustrate its use, or to show analyses for datasets. We will usually have to include the library statements at the top, so we don't need to add the package names each time, and also add a line or two if we want to include the data in our data book.
b) Examples from R textbooks. They are similarly often provided as blocks of code.
c) Examples to solve specific problems - usually from Stack Overflow. (My recent example is given below.)
d) Examples we find we need. An example that Patrick could provide is to facilitate reading from multiple Excel files. We can read multiple sheets from one Excel file, and multiple files of other types, but not multiple Excel files with just one sheet in each. And it isn't something I would like to encourage - so I'm quite happy if we just have that as an example script for users to tweak if they need!
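
Route d) mentions reading multiple Excel files with one sheet in each. A minimal sketch of what such a script might look like follows. It assumes the readxl and dplyr packages are available; the helper name `read_excel_folder` and the folder path are hypothetical, and the `data_book` line is only meaningful inside R-Instat.

```r
library(readxl)
library(dplyr)

# Hypothetical helper: read the first sheet of every Excel file in a folder
# and stack the results, recording which file each row came from.
read_excel_folder <- function(folder) {
  files <- list.files(folder, pattern = "\\.xlsx?$", full.names = TRUE)
  names(files) <- tools::file_path_sans_ext(basename(files))
  sheets <- lapply(files, read_excel)  # read_excel() reads the first sheet by default
  bind_rows(sheets, .id = "source_file")
}

# Inside R-Instat the combined data could then be added to the data book:
# combined <- read_excel_folder("C:/my_excel_files")  # hypothetical path
# data_book$import_data(data_tables = list(combined = combined))
```

Stacking with `bind_rows()` assumes the files share compatible columns; a user whose files differ would need to tweak this step.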

I now give just a few examples to get us started.

The first example is from Chapter 1 of the book Text Mining with R. It reads all 6 books at once - which we can't do in R-Instat - and then has cunning code to add the chapter numbers within each book. Neat and powerful code!

library(janeaustenr)
library(dplyr)
library(stringr)

original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
    chapter = cumsum(str_detect(text, 
    regex("^chapter [\\divxlc]",
    ignore_case = TRUE)))) %>%
  ungroup()
data_book$import_data(data_tables = list(data = original_books))

This is the code, simply copied from the book into R-Instat, with the last line added so the result is read into a data frame. This would go easily into the Prepare directory, unless we want to consider having the book, Text Mining with R, as the directory? If we get more examples then having the source - book or package - as a directory might be useful?
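
To see how the chapter-numbering step works, here is a small sketch on made-up data (the `toy` data frame is invented for illustration and is not from the book). `str_detect()` flags lines that start with "chapter", and `cumsum()` turns those flags into a running chapter count, restarted for each book by `group_by()`.

```r
library(dplyr)
library(stringr)

# Invented toy data: two "books" with a few lines each
toy <- tibble(
  book = c("A", "A", "A", "A", "B", "B", "B"),
  text = c("Title page", "Chapter 1", "Some text", "CHAPTER II",
           "Preface", "chapter 1", "More text")
)

toy_chapters <- toy %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text,
                                     regex("^chapter [\\divxlc]",
                                           ignore_case = TRUE)))) %>%
  ungroup()

print(toy_chapters)
```

Lines before the first chapter heading in each book get chapter 0.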

My second example comes from a source with lots of examples. It is from the agricolae package, which includes quite long examples. They would come in the Experiments directory.

Here is the example copied from one of the designs. It is the alpha design. I am not sure whether we will need to copy them, but here it is before adaptation, because there is a command that used to give an error in R-Instat! You were looking for those examples!

library(agricolae)
#Example one
trt<-1:30
t <- length(trt)
# size block k
k<-3
# Blocks s
s<-t/k
# replications r
r <- 2
outdesign<- design.alpha(trt,k,r,serie=2)
book<-outdesign$book
plots<- book[,1]
dim(plots)<-c(k,s,r)
for (i in 1:r) print(t(plots[,,i]))
outdesign$sketch
# Example two
trt<-letters[1:12]
t <- length(trt)
k<-3
r<-3
s<-t/k
outdesign<- design.alpha(trt,k,r,serie=2)
book<-outdesign$book
plots<-book[,1]
dim(plots)<-c(k,s,r)
for (i in 1:r) print(t(plots[,,i]))
outdesign$sketch

On Version 0.7.16 I get the following error:

This is on running line 15. It has the same error on tokens on lines 13 and 25. It now works fine on your latest version. This is quite a nice type of example to include. The package has 14 different designs, with similar code for each. We could include 1 of them, adapt it to put the results nicely in a data frame, and then discuss how easily this could be adapted for the other designs. I'll try and report.
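
As a rough sketch of the adaptation described above, the field book returned by `design.alpha()` is already a data frame and could be passed to the data book in the same way as in the first example. The seed value and the name `alpha_design` are illustrative choices, not part of the package example.

```r
library(agricolae)

trt <- 1:30  # treatments
k <- 3       # block size
r <- 2       # replications

# seed is set only so that the randomisation is repeatable (illustrative choice)
outdesign <- design.alpha(trt, k, r, serie = 2, seed = 123)

# The field book is a data frame with one row per plot
alpha_book <- outdesign$book
str(alpha_book)

# Only inside R-Instat: add the field book to the data book
if (exists("data_book")) {
  data_book$import_data(data_tables = list(alpha_design = alpha_book))
}
```

The same last step should carry over to the other designs, since each returns its field book in the same `$book` element.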

@rdstern This sounds great. As soon as you think we have enough examples, then we can start implementing this in a PR.

@lloyddewit here is part of an example I may be adding (and maybe modifying). I may not need to, because it is in the library already. I include it below to highlight a possible limitation so far in the output? It is only in the R viewer, and not in the output window.

It is from the agridat package and is called gomez.splitsplit. The R-help button includes the sample code. Here is part of that code:

library(agridat)

data(gomez.splitsplit)
dat <- gomez.splitsplit
dat$nf <- factor(dat$nitro)

libs(desplot)
desplot(dat, nf ~ col*row,
        # aspect unknown
        out1=rep, col=management, num=gen, cex=1,
        main="gomez.splitsplit")
desplot(dat, yield ~ col*row,
        # aspect unknown
        out1=rep, main="gomez.splitsplit")

You may find the libs command in the package adds the desplot package for you?

It runs fine and the graph is useful. It is useful enough that we may aim for a dialog soon, in the experiments menu. It produces a graph. This appears fine in the R viewer, but that is limited, of course. I assume we should aim for it to be produced in the output window as well?

By the way, it is not a ggplot graph. There is a lattice graphics system and this is what is being produced.

However, in the code above, if you change the desplot function name to ggdesplot then it will produce a ggplot graph. It would be good to check that both types of graph appear in the output window.
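
A short sketch of the ggplot variant of the second plot is given here, assuming the desplot package is installed and loaded directly with `library()` rather than through `libs()`. Storing the result in an object (the name `p` is arbitrary) makes it easy to check which graphics system produced it.

```r
library(agridat)
library(desplot)

data(gomez.splitsplit)
dat <- gomez.splitsplit

# Same call as the second desplot() example, with the function name changed
p <- ggdesplot(dat, yield ~ col*row,
               out1 = rep, main = "gomez.splitsplit")

class(p)  # expected to include "ggplot" rather than the lattice class "trellis"
print(p)
```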

@lloyddewit here is another set of "testing" scripts? It is from the agriTutorial package. You can get to the code for the 5 examples via Help > Package Documentation. Example1 etc.

The full code for example 1 (out of 5) is the most interesting, but also the longest. I give it below in case that helps. But that code should also run if you type:

library(agriTutorial)
example("example1")

When I try, it clearly gives me 2 successive figures in the R graph window and the start of the output in the output window - - just a minute. It is in the output window, and Maximise is working and it is all there I think. Wonderful.

So it raises simply the issue of graphical output going (only) to the R viewer, while numerical results go to the output window and then can be put into the maximised output.

By the way the R graphical viewer does allow multiple graphs. I give more examples in the next comment.

I have now found that the other 4 examples often use = rather than <- for assignment. This is not trivial to edit by hand, because the scripts are quite long and = is correctly used elsewhere, inside the function calls. You were wondering about trying to implement this and I would now welcome it more than I said before.
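
Until = is handled directly, one possible workaround is to convert a script automatically. The sketch below assumes the formatR package is installed (it is not mentioned elsewhere in this thread, and whether it ships with R-Instat is not known here). Its `tidy_source()` function has an `arrow` argument that replaces assignment = with <- while leaving = inside function calls alone. The two lines of `src` are invented for illustration.

```r
library(formatR)

# Invented two-line script using = for assignment and = for a named argument
src <- c("x = c(1, 2, NA)",
         "m = mean(x, na.rm = TRUE)")

# arrow = TRUE rewrites assignment '=' as '<-'; output = FALSE suppresses printing
tidied <- tidy_source(text = src, arrow = TRUE, output = FALSE)
cat(tidied$text.tidy, sep = "\n")
```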

Here is the full code for example 1. The packages mentioned are all already in R-Instat. I am not sure - at the top - how it knows about the data?

## *************************************************************************************
##                       How to run the code
## *************************************************************************************

## Either type example("example1") to run ALL the examples succesively
## or copy and paste examples sucessively, as required

## *************************************************************************************
##                       Options and required packages
## *************************************************************************************

## Packages lmerTest, emmeans and pbkrtest MUST be installed
require(lmerTest)
require(emmeans)
require(pbkrtest)
options(contrasts = c('contr.treatment', 'contr.poly'))

## *************************************************************************************
##            Section 1: Qualitative analysis of factorial treatment effects
## *************************************************************************************

## Table 1 Full analysis of rice data assuming qualitative nitrogen effects
rice.aov1 = aov(yield ~ Replicate + management * variety * nitrogen +
Error(Replicate/Main/Sub), rice)
summary(rice.aov1, ddf = "Kenward-Roger", type = 1)

## Table 2 REML means and se's for additive management and qualitative nitrogen effects
rice.means = lmer(yield ~ Replicate + management + nitrogen * variety +
 (1|Replicate:Main) + (1|Replicate:Main:Sub), data = rice)
anova(rice.means, ddf = "Kenward-Roger", type = 1)
plot(rice.means, sub.caption = NA, ylab = "Residuals", xlab = "Fitted",
 main = "Full analysis with full nitrogen effects")
emmeans::emmeans(rice.means, ~ nitrogen)
emmeans::emmeans(rice.means, ~ variety)
emmeans::emmeans(rice.means, ~ nitrogen * variety)

## REML contrasts and sed's for additive management and qualitative nitrogen effects
n.v = emmeans::emmeans(rice.means, ~ nitrogen|variety)
emmeans::contrast(n.v, alpha = 0.05, method = "pairwise")
v.n = emmeans::emmeans(rice.means, ~ variety|nitrogen)
emmeans::contrast(v.n, alpha = 0.05, method = "pairwise")

## Table 3 Mixed model effects for rice data with significance tests
rice.lmer = lmer(yield ~ Replicate + nitrogen * management * variety + (1|Replicate:Main) +
 (1|Replicate:Main:Sub), data = rice)
anova(rice.lmer, ddf = "Kenward-Roger", type = 1)

## *************************************************************************************
##            Section 2: Quantitative analysis of factorial treatment effects
## *************************************************************************************

## adds raw N polynomials to data frame: note that the nrate is re-scaled
N = poly((rice$nrate/100), 4, raw = TRUE)
colnames(N) = c("Linear_N", "Quadratic_N", "Cubic_N", "Quartic_N")
rice = cbind(rice, N)

## Table 7: Mixed model fitting raw polynomials for nitrogen effects
rice.fullN = lmer(yield ~ Replicate + management + variety * (Linear_N + Quadratic_N +
 Cubic_N + Quartic_N) + (1|Replicate:Main) + (1|Replicate:Main:Sub), data = rice)
anova(rice.fullN, ddf = "Kenward-Roger", type = 1)

## Table 8 Coefficients for separate linear and common quadratic N with additive management
rice.quadN = lmer(yield ~ Replicate + management + variety * Linear_N + Quadratic_N +
 (1|Replicate:Main) + (1|Replicate:Main:Sub), data = rice)
summary(rice.quadN, ddf = "Kenward-Roger")

## *************************************************************************************
##                       Section 3: Model assumptions
## *************************************************************************************

## Full analysis of variance of block and treatment effects showing large mean square error
## due to variety-by-replicates interaction effects
rice.fullaov = aov(yield ~ Replicate*management * variety * nitrogen, rice)
summary(rice.fullaov, ddf = "Kenward-Roger", type = 1)

## Fig S1 Nitrogen response per variety per plot showing anomalous behaviour of Variety 1
## in Blocks 1 and 2 compared with Block 3
Rice = aggregate(rice$yield, by = list(rice$Replicate, rice$nitrogen, rice$variety),
 FUN = mean, na.rm = TRUE)
colnames(Rice) = c("Reps", "Nlev", "Vars", "Yield")
wideRice = reshape(Rice, timevar = "Nlev", idvar = c("Vars", "Reps"), direction = "wide")
wideRice = wideRice[,-c(1, 2)]
N = c(0, 50, 80, 110, 140)
par(mfrow = c(3, 3), oma = c(0, 0, 2, 0))
for (i in 1:3) {
for (j in 1:3) {
    plot(N, wideRice[(i - 1) * 3 + j, ], type = "l", ylab = "yield",
    main = paste("Variety",i,"Block",j), ylim = c(0, max(wideRice)))
    }
}
title(main = "Fig S1. Variety response to nitrogen for individual replicate blocks", outer = TRUE)

## Subset of data excluding variety 1
riceV2V3=droplevels(rice[rice$variety != "V1",])

## Restricted analysis of variance of block and treatment effects excluding variety 1
## compare variety-by-replicates interaction effects of full and restricted analysis
rice.fullaov = aov(yield ~ Replicate*management * variety * nitrogen, riceV2V3)
summary(rice.fullaov, ddf = "Kenward-Roger", type = 1)

## Restricted analysis assuming qualitative nitrogen effects excluding variety 1
rice.aov1 = aov(yield ~ Replicate + management * variety * nitrogen +
Error(Replicate/Main/Sub), riceV2V3)
summary(rice.aov1, ddf = "Kenward-Roger", type = 1)

By the way I like the fact it gets as far as mentioning Kenward-Roger on the last line - they are both former colleagues from long ago!

In the workshop in Burkina Faso the following question has been raised:

"I was looking for how to use R-Instat to do an MCA analysis, but I could not find any options for it. I would appreciate it if you could help me with it."

As background I note that this is available in the FactoMineR package. We use that for PCA but do not yet have an MCA dialog.
(And further background: PCA - Principal Components Analysis gets a few linear combinations of numerical variables that explain the variability of a larger set of variables. MCA does the same for factor columns. We are very happy with factors being handled well in R, so we should add this as soon as we have time.)

This is a "No, but Yes" answer to your question.

No: If you insist on only using dialogs then No, we do not yet have a dialog for MCA.
Yes: We are completing the capability to run scripts - and almost any script that runs in RStudio should run.
Note we are not trying to rival RStudio if you need to develop scripts. But often people only need to use a script, rather than develop one themselves. Or they can find a script that just needs minor changes. We hope that R-Instat can cope with that.

I start also with a more general question. Is it only MCA that you want? I ask because FactoMineR has many other functions in addition to those two. Would you like more?

So I start with MCA and used the code below that is provided in the FactoMineR package for illustration.

This almost all runs fine in the released version 0.7.16 that I think you are using.

The example uses 3 datasets. The tea and the hobbies code runs fine. The code for poison has a part that doesn't run in your version, but this is now fixed, ready for the next version.
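
For orientation, a minimal sketch of an MCA on the tea data is shown here. It is not the full package example: it follows the pattern in the FactoMineR help for `MCA()`, with `graph = FALSE` added so that no plots are drawn.

```r
library(FactoMineR)

data(tea)

# Column 19 is treated as a supplementary quantitative variable and
# columns 20 to 36 as supplementary factors; the rest are active factors
res.mca <- MCA(tea, quanti.sup = 19, quali.sup = 20:36, graph = FALSE)

# Eigenvalues and percentage of variability explained by the first dimensions
head(res.mca$eig)
```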

The Factoshiny lines in the code require the Factoshiny package to be installed. You need to run R-Instat with administrator rights to do this. Then I got this screen:

And the last lines in the script also run ok, because the missMDA package is installed automatically with the Shiny app.

@lloyddewit The examples above provide points for reflection on the current system for getting, using and saving scripts.

There were 10 scripts provided by the FactoMineR package:

a) Getting them will be easier once @Patowhiz makes improvements to the Maximised Output Window. I note for reference here:

1) Many packages have the alphabet at the top for hyperlinking in the page. This looks as though it will work, but it doesn't.
2) I would like to be able to go back, when I click to load the page for a single function.
3) Currently I have to close and then start with the Help > Package Documentation dialog again.
4) On the positive side the "standard" way to look at this documentation is through the pdf version. This is more convenient, because it doesn't split by pages.

b) Back to the scripts. I now see interesting parallels between the data and the scripts. Could we consider a name button, to be able to name the scripts, even when they are just online? Maybe when you then want to save, it can suggest that name as a default?

c) Could we consider saving a set of scripts, rather than only individually? Here I had the code for each of the 10 functions in the FactoMineR package. I suggest having a collection of scripts from a package could often be useful?

@lloyddewit I have followed up on point c), see for example this discussion. I note that RStudio has something. There is also a package called here that may be relevant - and could be related to saving everything in a directory, and then being able to load that directory back into R-Instat.

I note our possible sources for scripts. They include the above, namely all the sample scripts from a given package. Or all the scripts from successive chapters of a book.

Now the discussions in R are different, because they assume that the script is the starting point. If you are a "proper" R developer then you might start with a script. Then you make the script into self-contained functions, and you put the functions together into a package.

For us the scripts are more the end point. And having a set of simple scripts, rather than putting them always together, seems pretty sensible.

I don't know whether we want to go further and be able to relate scripts to data sets. So, having an option to save scripts as part of the data book, would then become an option? I suggest, for us, many scripts will relate to names of variables from specific data sets, so having that as a possibility may be worth considering.

I now have 2 sample directories of scripts for the library. Perhaps there could be a scripts directory. Inside we have FactoMineR as a directory and in that we have a set of scripts from the documentation of that package. A rich resource is the agridat package, so that's a directory. There are also sub-directories and agridat_trees is the first. There will be agridat_crops and agridat_livestock as well.
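
To make the idea of loading a whole directory of scripts concrete, here is a small base-R sketch. The helper name `read_script_dir` is hypothetical, and the demonstration uses a temporary folder as a stand-in for the proposed scripts/FactoMineR directory. It is only an illustration of the idea, not how R-Instat would implement it.

```r
# Hypothetical helper: read every .R file below a directory into a named list,
# with one character vector of lines per script
read_script_dir <- function(dir) {
  files <- list.files(dir, pattern = "\\.[Rr]$", recursive = TRUE,
                      full.names = TRUE)
  scripts <- lapply(files, readLines, warn = FALSE)
  names(scripts) <- sub("\\.[Rr]$", "", basename(files))
  scripts
}

# Demonstration with a temporary stand-in for scripts/FactoMineR
demo_dir <- file.path(tempdir(), "scripts", "FactoMineR")
dir.create(demo_dir, recursive = TRUE, showWarnings = FALSE)
writeLines(c("library(FactoMineR)", "data(tea)"), file.path(demo_dir, "MCA.R"))
writeLines("library(FactoMineR)", file.path(demo_dir, "PCA.R"))

scripts <- read_script_dir(file.path(tempdir(), "scripts"))
names(scripts)
```

A zip file of scripts could be unpacked first with `utils::unzip()` and then read in the same way.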

For now I have 2 zip files. The content may change slightly, but they are a reasonable start in their own right, and (I hope) ready for when we can load and save multiple files.

agridat_trees.zip FactoMineR.zip

IDEMSInternational / R-Instat

Add scripts to the R-Instat library #8497