IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

Adding to the model dialogue 3 - a predict dialogue #5127

Open rdstern opened 5 years ago

rdstern commented 5 years ago

I suggest this is important, and I illustrate it by showing the "system" in Genstat. I think we can do even better! Here is the dialogue for fitting a model. This is like our Model > General > Fit or Model > Model:

image

Notice that Predict is not enabled. I now run the model. This is like our Fit and this enables the Predict button. So I press Predict to give the following sub-dialogue:

image

In the above dialogue I changed the levels of the fertiliser to 0, 1, 2, 3 and then it gave the following results in the output window:

image

I had asked for the results to be saved, so it also saved the (same) results in 2 data frames. Genstat has a table structure for data frames, so it saved them separately. Here is one of them:

image

I could also do a graph, and when I click on the options in Predict I get the following:

image

This gives the following graph:

image

Note that in Genstat you have access to all the columns in the data frame. So what happens if you:

a) Don't use all the variables in the model? Here is the result ignoring the variate:

image

And here ignoring that factor:

image

These are like the margins in the 2-way table given above. Those were "Marginal weights", i.e. using the observed frequencies of the 3 varieties.

image

These are equal weights, i.e. weighting each variety equally.
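The difference between the two weightings can be sketched in R. This is only an illustration under stated assumptions, not Genstat's algorithm: the built-in mtcars data stand in for the Genstat example, with cyl playing the role of the factor and wt the variate, and all object names are invented for the sketch.

```r
# Sketch only: mtcars stands in for the Genstat example data.
dat <- transform(mtcars, cyl = factor(cyl))
model <- lm(mpg ~ wt + cyl, data = dat)

# Predict at each level of the factor, holding the variate at its mean.
grid <- data.frame(
  wt = mean(dat$wt),
  cyl = factor(levels(dat$cyl), levels = levels(dat$cyl))
)
pred <- predict(model, newdata = grid)

# "Marginal" weights: the observed proportion of each factor level.
marginal_w <- as.numeric(table(dat$cyl)) / nrow(dat)
# Equal weights: every level counts the same.
equal_w <- rep(1 / nlevels(dat$cyl), nlevels(dat$cyl))

marginal_mean <- sum(marginal_w * pred)
equal_mean <- sum(equal_w * pred)
print(c(marginal = marginal_mean, equal = equal_mean))
```

Because the levels of cyl are unbalanced in these data, the two averages differ. With a balanced factor they would coincide.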

b) There is nothing in Genstat to stop you using predict with variables that are not in the model. However, then it just gives a warning that you have used terms that are not in the model, and doesn't give you anything. We can do better, because we could just include the terms that are in the model - from the corresponding object.
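One way to "just include the terms that are in the model" might look like the sketch below. It assumes an lm fitted to the built-in iris data, and the object names (requested, model_vars, ignored) are hypothetical. It is not how R-Instat implements anything.

```r
# Sketch only: keep just the columns of a candidate prediction data frame
# that the fitted model actually uses.
model <- lm(Petal.Width ~ Sepal.Length + Species, data = iris)

# Explanatory variables named in the model formula (response removed).
model_vars <- all.vars(delete.response(terms(model)))

# A requested set of columns that includes variables not in the model.
requested <- iris[c(1, 51, 101),
                  c("Sepal.Length", "Sepal.Width", "Petal.Length", "Species")]

# Report, rather than fail on, the variables that the model does not use.
ignored <- setdiff(names(requested), model_vars)
if (length(ignored) > 0) {
  message("Ignoring variables not in the model: ",
          paste(ignored, collapse = ", "))
}

newdata <- requested[, intersect(names(requested), model_vars), drop = FALSE]
newdata$predicted <- predict(model, newdata = newdata)
print(newdata)
```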

I claim this is all very important for us in R-Instat. It sort of "completes" the model component.

In "simple (descriptive) statistics" we use the describe menu to prepare informative tables and graphs. They are direct summaries of the data.

Then we move to modelling (statistical inference). We fit models and the "recent" advances are on the range of models that we can fit with a common framework.

Once we have a suitable model we then (should) want to prepare the corresponding tables and graphs using the chosen model. That's statistics! And (see above) the predict feature is the way we do this. So in an improved (more advanced?) version of our statistical problem-solving course we could include this idea!

We have some decisions to make for R-Instat. I suggest we use the prediction package where possible. Here are some possibilities - not mutually exclusive:

  1. We have a separate dialogue for the predictions. I think this will be useful. That's easy for us (and not for Genstat), because we always save the model object. So the prediction dialogue can use the saved models.
  2. We may sometimes wish to (also) be able to display the predictions at the same time as we fit the model. That's what Genstat does. In our case it could be another tab on the Display sub-dialogue. This could call the same prediction dialogue.
  3. We will often want to produce the predictions into a new data frame - see the saving from Genstat above. Unlike Genstat this will be an ordinary data frame, so the example above would have 12 rows (4 for the levels of Fert, by the 3 varieties). That means that the predictions are in a single column and the standard errors, etc. can be in further columns.
  4. We will usually also want to produce some output in the output window. There it is probably appropriate to produce tables - as is done by Genstat.
  5. I assume we will usually get the Prediction dialogue to produce the new data frame. An alternative is to use an existing data frame. We might even choose to have a predict tab on our File > New Data Frame dialogue. So we could then access an existing data frame to get predictions. This will also be appropriate if (as above) we want to try alternative options in the predictions. They would then simply become extra columns in an existing data frame. As with other summaries the prediction data frames would (by default) be linked to the data frame with the data.

The structure of the prediction data frame will need careful discussion and planning.
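As a starting point for that discussion, here is a minimal sketch of a "long" prediction data frame of the kind described in point 3. It assumes the built-in iris data in place of the Fert-by-variety example (4 values of a variate by 3 levels of a factor), and the column names prediction and std_error are illustrative only.

```r
# Sketch only: iris stands in for the Fert-by-variety example.
model <- lm(Petal.Width ~ Sepal.Length + Species, data = iris)

# 4 values of the variate by 3 levels of the factor = 12 rows.
grid <- expand.grid(
  Sepal.Length = c(5, 6, 7, 8),
  Species = levels(iris$Species)
)

# se.fit = TRUE returns the standard errors alongside the predictions.
pred <- predict(model, newdata = grid, se.fit = TRUE)

# One ordinary data frame: predictions in a single column,
# standard errors in a further column.
prediction_df <- cbind(grid, prediction = pred$fit, std_error = pred$se.fit)
print(prediction_df)
```

Alternative prediction options (for example different weights or intervals) could then be added as extra columns of the same data frame.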

shadrackkibet commented 5 years ago

I like these ideas. I was trying to obtain prediction intervals using R-Instat, but it was (maybe) not possible. This is for a modelling course at AIMS. I did the same thing in R and I was able to obtain my results: predict(fit1, data.frame(Gestation = 27), interval = 'predict'). The results I was trying to replicate are on page 17 of these lecture notes, 05_Saturday_Hypothesis_Testing.pdf, and the dataset is attached: protein.txt
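For readers without the attached file, the same kind of call can be tried on a built-in dataset. The sketch below assumes the cars data in place of protein.txt and compares the two interval types that base R's predict() offers for lm objects. The object names are illustrative.

```r
# Sketch only: the built-in cars data stand in for the attached protein data.
fit <- lm(dist ~ speed, data = cars)
new_point <- data.frame(speed = 15)

# Confidence interval: uncertainty in the mean response at speed = 15.
conf_int <- predict(fit, new_point, interval = "confidence")
# Prediction interval: uncertainty for a single new observation at speed = 15.
pred_int <- predict(fit, new_point, interval = "prediction")

print(rbind(confidence = conf_int[1, ], prediction = pred_int[1, ]))
```

Both intervals share the same fitted value, but the prediction interval is wider because it also includes the residual variation of an individual observation.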

rdstern commented 5 years ago

And there is a package called prediction, that claims to make this all easier. Can you get the prediction intervals (i.e. not the usual confidence intervals) just as easily with the prediction package?

rdstern commented 5 years ago

It is exciting that the Prediction keyboard is there in the new Use Model dialogue. I have been trying it, and it seems very close! I tried following the example in the prediction manual.

a) It starts with datasets and the iris data, which we have. So I opened that.
b) I used the Model dialogue to fit the same model as in the guide, i.e. lm(Petal.Width ~ Sepal.Length * Sepal.Width * Species), saved into model1.
c) I then used prediction(model1) and it gives an error.
d) So I used lm(Petal.Width ~ Sepal.Length * Sepal.Width * Species, data = iris) (as in the guide), so I explicitly included the data = iris in the lm command. Saved in model2.
e) Now prediction(model2) does not give an error - but it doesn't give any output either. I am not sure what I expected - I think the predictions in the output window for every data point. (These are the fitted values.)
f) Then prediction(x, iris[1,]). Runs fine, but also no output.
g) Then prediction(x, at = list(Species = c("setosa", "virginica"))) and it does give output. Whoopee!
h) Then prediction(x, at = lapply(iris, mean_or_mode)). This needs prediction:: in front of the mean_or_mode function, but also works and with output.
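The steps above can be collected into a plain R script for testing outside R-Instat. This is a sketch under assumptions: the model formula is taken to use * between the terms, the prediction package may not be installed (so those calls are guarded), and explicit print() calls are used so that output does not depend on auto-printing.

```r
# Sketch only: steps (b) to (h) as a script.
x <- lm(Petal.Width ~ Sepal.Length * Sepal.Width * Species, data = iris)

# Base R check for step (e): predicting at the original data
# reproduces the fitted values.
same_as_fitted <- isTRUE(all.equal(
  unname(predict(x, newdata = iris)),
  unname(fitted(x))
))
print(same_as_fitted)

if (requireNamespace("prediction", quietly = TRUE)) {
  # (e) predictions for every row of the data
  print(prediction::prediction(x))
  # (f) prediction for the first case only
  print(prediction::prediction(x, iris[1, ]))
  # (g) predictions with Species set to each listed level in turn
  print(prediction::prediction(x, at = list(Species = c("setosa", "virginica"))))
  # (h) prediction at the means/modes of the input variables
  print(prediction::prediction(x, at = lapply(iris, prediction::mean_or_mode)))
} else {
  message("Package 'prediction' is not installed; skipping those calls.")
}
```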

Then onto the next example - with a new package called mlogit. The package installs and the data set is then available. Brilliant! Then the commands work to fit the model.

But then the line with prediction says:


Error running R command(s)

Error in tmp[["fit"]] : subscript out of bounds

The error occurred in attempting to run the following R command(s):

.temp_val <- capture.output(prediction::prediction(mod))

This all seems very close - and very exciting to be able to get this far in R-Instat. I hope this might work even better in time for Rwanda.

Could someone in AMI also add the memory of models used in the past - as in the model dialogue, and possibly even a Try field?

rdstern commented 4 years ago

@dannyparsons or @maxwellfundi the prediction package is used by the Model > Use Model dialogue, but I think it is not yet included in the set. Please could it be?

rdstern commented 4 years ago

@Ivanluv and @dannyparsons please could the Model > Use Model dialogue be enhanced. For @Ivanluv please could you repeat the "tricks" you did on the Model > Hypothesis Tests dialogue. This is to add the same features you did on that dialogue, namely:

a) Add the show arguments checkbox.
b) Add the try control.
c) Add the same Help button as the others. It shows the whole package and that is very useful.
d) Add the facility in the Expression control to remember the previous expressions.
e) Add a Save Result checkbox, though I am not sure if that will be simple?

@dannyparsons what can be added to save the results? For example, I have been trying the dialogue ready for extremes work. The erlevd function returns a vector. I assume we can save them all as simply another object?

rdstern commented 4 years ago

I have been fitting an extreme value distribution, using the Model > Modelling dialogue. So last_model is the fitted values. You can try with a sample dataset from the extRemes package, e.g. flood. Once fitted, there are 3 methods described, namely plot, print and summary.

print and summary are both available on the Model > Use Model dialogue with the general keyboard. When I try plot it doesn't give an error, and the first time it quickly seems to give the plot, but it then disappears. So it isn't getting to the output window. Can this be done?

I wondered whether it might appear if it was a ggplot, so I tried ggplotify.

I found that ggplotify::base2grob(~plot(last_model)) works - or at least doesn't give an error. Then: ggplotify::as.ggplot(ggplotify::base2grob(~plot(last_model)))

perhaps gives a ggplot, but I still don't get a plot.
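For testing outside R-Instat, the conversion can be tried on any base plot. The sketch below assumes that ggplotify (with its gridGraphics dependency) and ggplot2 are installed, and uses a scatterplot of the built-in cars data in place of plot(last_model). It only shows that the result is a ggplot object that draws when printed. It does not show how R-Instat routes graphs to its output window.

```r
# Sketch only: a base scatterplot stands in for plot(last_model).
have_pkgs <- requireNamespace("ggplotify", quietly = TRUE) &&
  requireNamespace("gridGraphics", quietly = TRUE) &&
  requireNamespace("ggplot2", quietly = TRUE)

if (have_pkgs) {
  # as.ggplot() accepts a formula wrapping base graphics code.
  p <- ggplotify::as.ggplot(~plot(cars$speed, cars$dist))
  # A ggplot object is only drawn when it is printed.
  print(p)
} else {
  p <- NULL
  message("ggplotify, gridGraphics and ggplot2 are needed for this sketch.")
}
```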

dannyparsons commented 4 years ago

Let's separate out some of these to separate issues.

rdstern commented 4 years ago

I am re-opening this issue briefly. It is great it is merged, but please could @maxwellfundi or @Ivanluv quickly change the size of the dialogue. a) It is now much wider than need be. b) It is also a bit "shorter", so the bottom line of buttons is only half-visible.