IDEMSInternational / R-Instat

A statistics software package powered by R
http://r-instat.org/
GNU General Public License v3.0

The 2-variable fit model dialogue #5560

Open rdstern opened 5 years ago

rdstern commented 5 years ago

We are using the general modelling dialogue with the AIMS students. For other groups it would be more useful if the sequence of dialogues from 2 variables upwards were also improved.

There is a nasty bug with the 2-variable dialogue and (perhaps) simple improvements that might be easy to make. I describe the problems to explain some statistics at the same time.

The dataset I am (mainly) using is the survey data, opened via Open From Library from the Instat Introductory Guide datasets.

First, the nasty bug I found. Just go into the dialogue and press the Function button. It is unhappy if there is nothing in the x-field to take the function of. This is easy to fix by disabling the button until the x is completed. I would be happy with that fix, but would slightly prefer to be able to look at the options first if I wanted to do so.

I notice that when you press the Function button it gives the function preview. You can actually type there. I would quite like that, so I did: I typed poly(fert,2). It ignored my typing when I returned to the main dialogue. Should we allow it?

This is a more general point. When I have fert in the x variable, you currently can't type there. That is intentional, and I can understand the reason: it is dangerous. But could there be a right-click on the receiver which would then allow typing into a single receiver?

I am copying @dannyparsons now in this message, because that would be a more general feature - and dangerous - but danger is my middle name! Most of the changes I suggest in this issue are in addition to that. Perhaps it should be a separate issue?

If we do allow this, then I would also like to consider the Try field that we have on some dialogues. It might even only be visible if that feature is enabled. Though I don't see a problem with having it anyway. (And I am separately suggesting we have it anyway on the Model > General > Fit Model dialogue where you can type.)

Back to my clear suggestions for this issue. The option I wanted is currently disabled, namely the poly function to fit polynomial models. That's examples like poly(x,2) etc. Please can that be enabled. It is currently called Power, but disabled. Perhaps it could be called Polynomial instead?

Notice, if we can allow typing then it also does make sense to have x as fert + poly(fert,2) as the model. That shows then the additional value of the quadratic, over a straight line.

It is all still just using 2 columns. So all consistent with the idea of teaching that "simple" models - like summaries can use just 2 variables.
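As a minimal sketch of the comparison described above, here is the straight-line fit against the quadratic fit. The data frame sim and its columns fert and yield are simulated stand-ins (assumptions for illustration), not the actual survey data. One thing to watch: poly(fert, 2) already contains a linear term, so I would expect fert + poly(fert, 2) to give the same fit as poly(fert, 2) alone, with one coefficient reported as NA.

```r
# Simulated stand-in for the survey data (hypothetical values).
set.seed(1)
sim <- data.frame(fert = runif(40, 0, 3))
sim$yield <- 30 + 8 * sim$fert - 1.5 * sim$fert^2 + rnorm(40, sd = 2)

# Straight line, then quadratic via orthogonal polynomials.
fit_line <- lm(yield ~ fert, data = sim)
fit_quad <- lm(yield ~ poly(fert, 2), data = sim)

# The anova table shows the additional value of the quadratic term.
print(anova(fit_line, fit_quad))

# Typing fert + poly(fert, 2): the linear part appears twice,
# so one of the coefficients is aliased.
fit_both <- lm(yield ~ fert + poly(fert, 2), data = sim)
print(coef(fit_both))
```

If the aim is separate, readable coefficients for the linear and quadratic parts, fert + I(fert^2) is an alternative that avoids the aliasing.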

There is more - and important stuff to add - on this modelling. We could use spline functions, and it would be useful to discuss those with a maths group (i.e. AIMS).

So the next improvement could be to add the splines to the 2-variable fitting. This is either (initially?) by being able to type or by adding the option Spline to the Function button.

I am slightly confused by how to use splines in R. There is smooth.spline in the stats package (which I had a bit of trouble with) and there is the splines package that is now (I understand) distributed with base R. So the model splines::bs(fert) works. You can also give degrees of freedom with it, so putting splines::bs(fert,4) works too (though putting a small number doesn't work as I would have thought). However I think all my issues with fitting splines are answered here.

This is great, for now.
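A small sketch of the two spline models mentioned above, again on simulated stand-in data (sim, fert and yield are assumptions, not the survey data). The second argument of splines::bs() is df, so bs(fert, 4) and bs(fert, df = 4) are the same thing. As far as I understand, a df smaller than the spline degree (3 by default) cannot be honoured, so bs() warns and uses 3 instead, which may be the "small number" behaviour.

```r
# Simulated stand-in data (hypothetical values).
set.seed(1)
sim <- data.frame(fert = runif(60, 0, 3))
sim$yield <- 30 + 6 * sqrt(sim$fert) + rnorm(60, sd = 1.5)

# Default cubic B-spline basis, then one with 4 degrees of freedom.
fit_default <- lm(yield ~ splines::bs(fert), data = sim)
fit_df4 <- lm(yield ~ splines::bs(fert, df = 4), data = sim)

# Number of basis columns generated in each case.
print(ncol(splines::bs(sim$fert)))
print(ncol(splines::bs(sim$fert, df = 4)))

# Compare the two nested spline fits.
print(anova(fit_default, fit_df4))
```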

For reference for our climatic work in the future the splines package also has periodic splines. See the documentation here.

There is also another set of problems (loosely linked to splines, but I don't think that helps much?). They are sometimes called broken-stick methods. The simplest is where you have 2 straight lines that join, but have different slopes. I need that occasionally with trends in climatic data, where you might fit a single line, but also possibly a horizontal line to (say) 1980 and then a sloping line from then on. This article is good and clear from first principles. More generally there is a package called segmented that I suggest would be useful to add.

rdstern commented 5 years ago

So here are some particular edits suggested for the 2-variable fit-model dialogue. They would be used in the forthcoming AIMS statistical climatology course.

a) The Function button is only active once the x-variable is not empty.
b) Add a Try field, as there is on many other dialogues (and requested on the general fit model dialogue).
c) Add a checkbox beneath the Function button "With second function". (There doesn't need to be another receiver, because it is based on the same x-variable.)
d) Add another copy of the Function button below the checkbox. It becomes available if the checkbox is checked.
e) Expand the function sub-dialogue. The first 4 options could be in 2 columns, i.e.
   o Identity
   o Natural log
   o Square Root
   o Log Base 10
   o Power, followed by a numeric field (not an up/down), so it could have decimals and possibly -1 etc. (Could we allow 1/3?)
   o Spline d.f., followed by an up/down, from 0 upwards
   o "Broken Stick" "at", then a text field to type a number for the position of the break.
   o Own, with a text field into which a function can be typed.

Also leave some space at the bottom, because we will add some time-series functions later.

The second function button opens the same sub-dialogue as above. The resulting function is made into the (first function) + (the second one).
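To make the "(first function) + (the second one)" rule concrete, here is a rough sketch in R of how the formula text could be assembled from the two sub-dialogue choices. The helpers build_term and build_formula, and the option names they accept, are hypothetical and only for illustration. The dialogue itself would generate the equivalent string in its own code.

```r
# Hypothetical helper: turn one sub-dialogue choice into a formula term.
build_term <- function(x, fun = "identity", arg = NULL) {
  switch(fun,
    identity     = x,
    log          = paste0("log(", x, ")"),
    sqrt         = paste0("sqrt(", x, ")"),
    log10        = paste0("log10(", x, ")"),
    power        = paste0("I(", x, "^", arg, ")"),
    spline       = paste0("splines::bs(", x, ", df = ", arg, ")"),
    broken_stick = paste0("I(pmax(", x, " - ", arg, ", 0))"),
    own          = arg,
    stop("unknown function: ", fun)
  )
}

# Hypothetical helper: first term, plus the second one if the
# "With second function" checkbox is ticked (second is not NULL).
build_formula <- function(y, x, first, second = NULL) {
  rhs <- do.call(build_term, c(list(x = x), first))
  if (!is.null(second)) {
    rhs <- paste(rhs, "+", do.call(build_term, c(list(x = x), second)))
  }
  stats::as.formula(paste(y, "~", rhs))
}

f <- build_formula("yield", "fert",
                   first = list(fun = "identity"),
                   second = list(fun = "power", arg = 2))
print(f)
```

Both terms use the same x-variable, which is why no second receiver is needed.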

rdstern commented 5 years ago

And another small thing to change. The 2-variable fit model selector does not allow a date variable to be selected. The other instances (3 variable and general) do allow a date variable. Allow date variables here too.

I have been exploring the formula that needs to be generated for a simple "broken stick" model - where you know the break point. With a given x-variable (x) and a numeric break point b, it is the formula:

x + I((x-b)*(x>b))

This is becoming interesting in the use of the model formula. You need the I( ) here so that what is inside the brackets is treated as ordinary arithmetic and not as a model formula. The comparison x>b also needs its own brackets, because * is evaluated before > in R. I have a puzzle for @dannyparsons in that I couldn't adapt this formula when x is a date. Instead ifelse(x>1/1/1970,x-1/1/1970,0) does work, but I assume there should be a way to include dates in the simpler expression too?
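Here is a sketch of the known-break-point model on simulated data (year, tmin and the break at 1980 are made-up values for illustration). The comparison is written with explicit brackets, since * binds more tightly than > in R, and pmax(x - b, 0) gives the same term in a shorter form. For a date x-variable, one possible approach (an assumption on my part, not a tested R-Instat solution) is to build the break with as.Date() and work in days.

```r
# Simulated annual series: flat to 1980, then a rising trend.
set.seed(2)
year <- 1950:2010
b <- 1980
tmin <- 15 + 0.05 * pmax(year - b, 0) + rnorm(length(year), sd = 0.3)

# Two joined lines with different slopes before and after b.
fit_stick <- lm(tmin ~ year + I((year - b) * (year > b)))

# Horizontal line to b, then a sloping line from then on.
fit_flat <- lm(tmin ~ I(pmax(year - b, 0)))

# Date version: days after the break date, zero before it.
date <- as.Date(paste0(year, "-01-01"))
b_date <- as.Date("1980-01-01")
days_after <- pmax(as.numeric(date - b_date), 0)
fit_date <- lm(tmin ~ as.numeric(date) + days_after)

print(coef(fit_stick))
print(coef(fit_flat))
print(coef(fit_date))
```

The slopes in fit_date are per day rather than per year, because a Date is stored as a count of days.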

rdstern commented 5 years ago

Looking ahead to when the x-variable is a date here is what is now possible (typing) in the General > Fit Model dialogue:

as.factor(lubridate::month(date)) + lubridate::year(date)+dplyr::lag(tmin)

This is mainly using the x-variable (date) with functions, and has also added a lag of the y variable - first order auto-regression.

This could be accommodated, where the x is a date variable, by having Month and Year as possible functions. There could be a checkbox after each of these to make them into a factor. (With daily data even year could be a factor.)
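For reference, a sketch of that formula fitted to simulated daily data. The data frame daily and its values are made up for illustration, and the lubridate and dplyr packages are assumed to be installed. Note that dplyr::lag(tmin) is NA for the first day, so lm() drops that row.

```r
# Simulated daily minimum temperatures with a seasonal cycle (hypothetical).
set.seed(3)
daily <- data.frame(
  date = seq(as.Date("2000-01-01"), as.Date("2003-12-31"), by = "day")
)
doy <- as.numeric(format(daily$date, "%j"))
daily$tmin <- 18 + 4 * sin(2 * pi * doy / 365.25) + rnorm(nrow(daily))

# Month as a factor, year as a linear trend, plus first-order auto-regression.
fit_ts <- lm(
  tmin ~ as.factor(lubridate::month(date)) + lubridate::year(date) +
    dplyr::lag(tmin),
  data = daily
)
print(round(coef(fit_ts), 3))

# Year as a factor instead, as suggested for daily data.
fit_ts_fyear <- lm(
  tmin ~ as.factor(lubridate::month(date)) +
    as.factor(lubridate::year(date)) + dplyr::lag(tmin),
  data = daily
)
```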

rdstern commented 4 years ago

This is a larger piece of work! Hope it will be fun?

Ivanluv commented 4 years ago

@rdstern should the Response and Explanatory variable selectors both allow a date variable to be selected?

rdstern commented 4 years ago

Yes - I guess so!

Ivanluv commented 4 years ago

"The second function button opens the same sub-dialogue as above. The resulting function is made into the (first function) + (the second one)." @dannyparsons could you help me with suggestions on how to do this?