IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Dialogue to facilitate running a new function in R-Instat #5986

Open rdstern opened 4 years ago

rdstern commented 4 years ago

The script window allows me to run more or less any small sequence of R-commands. I often use it for a new function from one of our existing packages, or maybe I add a new package and try a function.
This works well on the examples that are almost always provided in the manual associated with the package.
But I would then often like to: a) run the example with data from R-Instat, rather than from within the example. b) be able to save the results back into the Data Sheet or Data Book, so I can continue with my work using R-Instat in the "normal way".

(I know I can do this by migrating to RStudio, but that is essentially a one-way process. Sometimes I wish to continue with R-Instat, but there is one command missing in the process.)

I currently do this by using a dialogue such as Model > Fit Model > Fit Model Keyboard. I choose any function and then write to the script window and edit what is there. It would be much better to have a dialogue that is specially designed for this.

Here is an example. I would like to do some declustering of extremes in climatic data. This is in the extRemes package, which is already installed. But the function isn't available from any dialogue.

Here is a simple example adapted from the manual:

y <- rnorm(100, mean=40, sd=20)
y <- apply(cbind(y[1:99], y[2:100]), 1, max)
bl <- rep(1:3, each=33)
ydc <- extRemes::decluster(y, threshold= 60, r=1, groups=bl)
ydc
plot(ydc)

In the code above, the first 3 lines produce the data. Then line 4 runs the function, which produces the object called ydc. Then line 5 displays the object in the output window and the last line displays a plot.

Now I adapt this for the Ghana data using the Model > Fit > Fit Model dialogue as my guide: This gives the following code - when I also opt to save the model!

# Code generated by the dialog, Modelling

ghana <- data_book$get_data_frame(data_name="ghana")
attach(what=ghana)
model2 <- lm(rainfall~year, na.action = na.exclude)
data_book$add_model(model_name="model2", model=model2, data_name="ghana")
data_book$get_models(data_name="ghana", model_name="model2")
summary(object=model2)
detach(name=ghana, unload=TRUE)
rm(list=c("model2", "ghana"))

Now I adapt it for the new command:

# Code generated by the dialog, Modelling

ghana <- data_book$get_data_frame(data_name="ghana")
attach(what=ghana)
ydc <- extRemes::decluster(rainfall, threshold= 60, r=1, groups=year, na.action=na.exclude)
data_book$add_model(model_name="ydc", model=ydc, data_name="ghana")
data_book$get_models(data_name="ghana", model_name="ydc")
ydc
plot(ydc)

detach(name=ghana, unload=TRUE)
rm(list=c("ydc", "ghana"))

I would like the new dialogue to facilitate this process for a variety of commands that I may wish to run.

I need help with the construction of this dialogue, so these are just my initial ideas.
There are 2 (perhaps 3) parts to the dialogue, namely Get Data, Run Function (optional), then Display and Save Results.

1) The last control is the 5 bottom buttons, plus the Comment. Here the OK is always disabled. The Script is enabled once the dialogue is complete. Or perhaps OK writes to the script window, i.e. they are identical. I quite like that.
2) I assume the command might need access to specific variables, or perhaps a whole data frame, or to an object from the data frame. Perhaps these are buttons at the top, namely Data Frame, Variables, Objects.
3) So perhaps the Get Data part is quite simple? If Data Frame, then the selector just shows the data frames and we select one in the usual way. If Variables, then we get the "usual" selector and a multiple receiver.
4) In some instances this is all we need. In that case there is sufficient information to:
a) Get the data - as shown above
b) Add a comment line for the function. Perhaps mention # object1 <- package::function(..., na.action=na.exclude)
c) Add a further comment to suggest additional actions. # display and save results
d) Add a version of the detach and rm commands, see above.

This would already be useful! Maybe start there when constructing the dialogue and test this part. Where does this dialogue go? Perhaps in the Tools menu or in the Edit Menu. Perhaps it is called Add Function?

5) There could be a Function part to the dialogue. This may just ask for the package name and function name? We have a dialogue for this already! It is Help > R Packages and Commands. It could also include the second Help button which is only enabled once at least one of these is given. Then it would change the line above to at least specify the package and command name. It could have a control that asks for the object name, that the user supplies, default is object1.

6) Then the part I am less sure about in the dialogue and also the code. The actions here would be useful, even if not exhaustive. This might be a set of check-boxes.
a) Display result in object window - Uses Summary?
b) Display plot - Uses Plot
c) Save variable back to data frame - control with name - and even position if possible?
d) Save Object
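As a rough illustration of steps a) to d) in item 4, the sketch below builds the lines such a dialogue might write to the script window. The helper `make_script_template` and its arguments are hypothetical names, not part of R-Instat. The generated text simply mirrors the dialog-generated code shown earlier in this issue, and the `data_book` calls appear only inside strings, so the sketch runs in plain R.

```r
# Hypothetical helper (not an R-Instat function): returns the script lines
# that the proposed dialogue might write to the script window.
make_script_template <- function(data_name, package = NULL, fun = NULL,
                                 object_name = "object1") {
  # Fall back to a generic placeholder if package or function is not given
  call_text <- if (!is.null(package) && !is.null(fun)) {
    paste0(package, "::", fun)
  } else {
    "package::function"
  }
  c(
    # a) get the data
    sprintf('%s <- data_book$get_data_frame(data_name="%s")', data_name, data_name),
    sprintf("attach(what=%s)", data_name),
    # b) comment line for the function, to be edited by the user
    sprintf("# %s <- %s(..., na.action=na.exclude)", object_name, call_text),
    # c) reminder for further actions
    "# display and save results",
    # d) tidy up
    sprintf("detach(name=%s, unload=TRUE)", data_name),
    sprintf('rm(list=c("%s", "%s"))', object_name, data_name)
  )
}

script_lines <- make_script_template("ghana", "extRemes", "decluster", "ydc")
cat(script_lines, sep = "\n")
```

Leaving the package and function empty gives the generic `package::function` comment with the default object name `object1`, which corresponds to the simplest version of the dialogue described in item 4.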

I hope @dannyparsons and @shadrackkibet could critique and help conceptualize the dialogue. I am not assuming they would then need to do the work, though they may add support.

rdstern commented 3 years ago

I now have another (and real) example for this dialogue. It is useful also as part of data wrangling, so we might consider leaving it in the script form. Perhaps we also could usefully have an Open from Library from the script window that gets example scripts! This would be one of them.

This is a STACK exercise given in the data wrangling workshop for Pakistan. Here are 2 data sets to be appended.

data_set_2.zip

They deliberately have errors that make the appending tricky. They include: a) Variable names spelled differently (Center in 1 and Centre in 2) b) Variables of different types - (Character in 1 and numeric in 2) c) Other problems

Now a) doesn't stop the Append, but you don't get what you want. b) Stops the Append working. You can't Append variables of different types (Except possibly Integer and Numeric, presumably because Integer IS numeric!)

It is quite easy to find the problems and correct them. Some people used Excel and some used R-Instat. Nice to show both. So, now the challenge - could this be made easier in R-Instat?

A possible solution: the R package called janitor has the sort of function we need. Here it is running in the script window - with some extra bits:

# Code generated by the dialog, Modelling

data_set_1 <- data_book$get_data_frame(data_name="data_set_1")
attach(what=data_set_1)
data_set_2 <- data_book$get_data_frame(data_name="data_set_2")
attach(what=data_set_2)
last_model <- janitor::compare_df_cols(data_set_1,data_set_2)

last_model
data_book$add_model(model_name="last_model", model=last_model, data_name="data_set_1")
data_book$get_models(data_name="data_set_1", model_name="last_model")
summary(object=last_model)
detach(name=data_set_1, unload=TRUE)
rm(list=c("last_model", "data_set_1"))

You just need the top half of the code so far. And it is messy, because we don't have the special dialogue proposed here. I therefore used the Model > Fit Model > Fit Model Keyboard dialogue, because you can include an R command within the dialogue.

a) It didn't work, because I need access to 2 data frames and we can't do this from this type of dialogue. So, I copied the lines that open the first data frame and adapted them for the second.
b) Then the command worked fine.
c) So I then added the line to display the results.
d) Now I can run the code down as far as the line where I added last_model.
e) Note that the janitor package isn't currently in R-Instat. I am not sure we need it apart from this command. So perhaps adding the package could also be part of the exercise. (Ideally you shouldn't have to be an administrator to add a new package - you don't have to be in RStudio. But that's a different issue.)

Here are the results: image

Very satisfactory. I didn't run the last part. Of course if there had been more files, or more variables, then it would be better to save the results into a new data frame! I am not sure how to do that!

(One way is to copy from the output window and then use Paste into a new data frame. This doesn't work! In Excel I have the same trouble, even in Excel from Paste Special. But in Excel Use text import wizard works fine. That feature could usefully be part of our Paste Special! That's another issue!)

This is quite a good example, because we might have 20 files (in R-Instat) to compare. How would we do that efficiently, either in the script, or in the dialogue?

And we might then want to add a function, possibly to just show the Variables where at least one Variable has an NA? Or we could just note there are problems if the number of variables in this table is different (more than) those in any individual data frame? And how would we transfer the results into a new data frame?
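One way these questions might be approached is sketched below in base R. It assumes the data frames are held in a named list, reports the type of every column in each data frame, keeps only the rows where a column is missing somewhere or the types differ, and returns an ordinary data frame. The two small data sets are made up to mimic the Center/Centre and character/numeric problems described above, and `compare_column_types` is a hypothetical helper, not a janitor or R-Instat function. (The janitor documentation suggests `compare_df_cols` also accepts a list of data frames and a `return` argument for mismatches, which would be worth checking before relying on this sketch.)

```r
# Made-up stand-ins for the two workshop data sets
data_set_1 <- data.frame(Center = c("A", "B"), Yield = c("10", "12"),
                         Year = c(2020L, 2021L), stringsAsFactors = FALSE)
data_set_2 <- data.frame(Centre = c("C", "D"), Yield = c(11.5, 9),
                         Year = c(2020L, 2021L), stringsAsFactors = FALSE)

# Hypothetical helper: one row per column name, one column per data frame,
# each cell holding the column's class (NA if the column is absent)
compare_column_types <- function(dfs) {
  all_cols <- sort(unique(unlist(lapply(dfs, names), use.names = FALSE)))
  type_of <- function(df, col) {
    if (col %in% names(df)) paste(class(df[[col]]), collapse = "/") else NA_character_
  }
  type_columns <- lapply(dfs, function(df) {
    vapply(all_cols, function(col) type_of(df, col), character(1), USE.NAMES = FALSE)
  })
  data.frame(column_name = all_cols, type_columns, stringsAsFactors = FALSE)
}

# Works for any number of data frames, as long as the list is named
comparison <- compare_column_types(list(data_set_1 = data_set_1,
                                        data_set_2 = data_set_2))

# Keep only columns that are missing somewhere or have differing types
is_problem <- apply(comparison[-1], 1,
                    function(types) anyNA(types) || length(unique(types)) > 1)
problems <- comparison[is_problem, ]
print(problems)

# Inside R-Instat the result could presumably be added as a new data frame,
# using the import_data call that appears later in this thread:
# data_book$import_data(data_tables=list(problems=problems))
```

Because the result is a plain data frame, it avoids the copy-and-paste route from the output window.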

rdstern commented 3 years ago

@volloholic has added a new dimension to the proposed dialogue.

I was assuming it would only be able to write to the script window. So it would write the top lines where it could get data from a data book.

Then there would be space, in the script window, so the new functions/script could be added. Then there would be code to save the results, etc.

David has suggested the dialogue could include a textbox - a sort of mini-script window in the dialogue! So you could type or copy the code there. Then OK could also be enabled and you could run the code!

We have to decide where this dialogue goes. I wonder if it would be appropriate in the Edit menu? That's more mainstream than the Tools menu.

rdstern commented 3 years ago

A special case - which might be simple to do, is when there is initial code and the result is a data frame. I wonder if this could be made simple, possibly it is already almost available - in the File > New Data Frame dialogue. That has a command option.

So I want these commands - and they run fine in the script window:

library(janeaustenr)
library(dplyr)
library(stringr)

original_books <- austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, 
                                     regex("^chapter [\\divxlc]",
                                           ignore_case = TRUE)))) %>%
  ungroup()

If I then add the line:

data_book$import_data(data_tables=list(original_books=original_books))

in the script window, then I get what I want!

Is there a way the File > New Data Frame could accept this sort of thing, or should this be part of the new proposed dialogue?

Patowhiz commented 2 years ago

@rdstern other than your last comment above, I think this issue was fixed by PR #6752 and PR #7028. So it can be closed?

@rdstern regarding your last comment above: when you paste the code below (I've removed the parts that stop the dialog working, because of how it was designed), the new data frame dialog will accept it and produce a new data frame. Basically, the new data frame dialog expects the last statement to result in a new data frame. Thanks.

library(janeaustenr)
library(dplyr)
library(stringr)

austen_books() %>%
  group_by(book) %>%
  mutate(linenumber = row_number(),
         chapter = cumsum(str_detect(text, 
                                     regex("^chapter [\\divxlc]",
                                           ignore_case = TRUE)))) %>%
  ungroup()
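
The "last statement must result in a data frame" rule can be illustrated in plain R. This is only a sketch of the idea, not how the R-Instat dialog is implemented. `run_script_for_data_frame` is a hypothetical helper and the script text is a made-up example.

```r
# Hypothetical helper: evaluates each statement in turn and insists that the
# value of the last one is a data frame.
run_script_for_data_frame <- function(script_text) {
  statements <- parse(text = script_text)
  env <- new.env(parent = globalenv())
  result <- NULL
  for (statement in statements) {
    result <- eval(statement, envir = env)
  }
  if (!is.data.frame(result)) {
    stop("The last statement did not produce a data frame.")
  }
  result
}

# The last statement returns a data frame, so this is accepted
new_df <- run_script_for_data_frame("x <- 1:3\ndata.frame(x = x, squared = x^2)")
print(new_df)

# The last statement is an assignment of a vector, so this is rejected
rejected <- tryCatch(
  run_script_for_data_frame("y <- 1:3"),
  error = function(e) conditionMessage(e)
)
print(rejected)
```

This is why the pasted version drops the `original_books <-` assignment target only in the sense that the pipeline itself must be the final statement whose value is the data frame.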

lloyddewit commented 1 year ago

@rdstern Can we close this issue (see comment from @Patowhiz above)? thanks