QCBSRworkshops / workshop04

Workshop 4 - Linear models
https://r.qcbs.ca/workshops/r-workshop-04/

Specific issues and improvements for the explanation of models #11

Closed pedrohbraga closed 4 years ago

pedrohbraga commented 4 years ago

In general, the general linear models in this presentation need to be explained better. While the objective of the workshops is to help people with prior knowledge of statistics use R for their research, the content of this workshop is unbalanced. For instance, it goes back to basics by recalling the concepts of mean, variance and deviation, but it moves quickly through linear models, simple regression and other topics.

It would be a good idea for this workshop to include an introductory step that describes the concept of a model (in general, not only linear ones), recalls what predictor (explanatory) and response (target) variables are, presents the idea of $Y = f(X) + \epsilon$, and only then begins with general linear models.

Whenever general linear models are explained, a consistent standard should be followed: the equation is shown first, followed by its assumptions described in prose and, whenever possible, in the corresponding mathematical notation, as in:

We now define what we will call the simple linear regression model,
$$Y_i = \beta_0 + \beta_1 x_i + \epsilon_i$$
where, 
$$\epsilon_i \sim N(0, \sigma^2)$$

Meaning that:
1. The relationship between $Y$ and $x$ is linear, of the form $\beta_0 + \beta_1 x_i$.
2. The errors $\epsilon$ are independent.
3. The errors $\epsilon$ are normally distributed. That is, the “error” around the line follows a normal distribution.
4. At each value of $x$, the variance of $Y$ is the same, $\sigma^2$.
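
To make these assumptions concrete, here is a minimal sketch in R that simulates data from this model and then estimates the parameters with `lm()`. The values chosen for `n`, `beta0`, `beta1` and `sd_eps` are arbitrary assumptions for illustration; they do not come from the workshop material.

```r
# Simulate data that satisfy the simple linear regression model
#   Y_i = beta0 + beta1 * x_i + eps_i,  with eps_i ~ N(0, sigma^2).
# All parameter values below are arbitrary and only illustrative.
set.seed(42)

n      <- 200   # number of observations
beta0  <- 2     # intercept
beta1  <- 0.5   # slope
sd_eps <- 1     # standard deviation of the errors (sigma)

x   <- runif(n, min = 0, max = 10)       # explanatory variable
eps <- rnorm(n, mean = 0, sd = sd_eps)   # independent, normal errors with constant variance
y   <- beta0 + beta1 * x + eps           # response variable

# Fit the model by ordinary least squares
fit <- lm(y ~ x)

coef(fit)    # estimates of beta0 and beta1
sigma(fit)   # residual standard error, an estimate of sigma
```

Plotting `residuals(fit)` against `fitted(fit)` is one way to check assumptions 1 and 4 visually, and `qqnorm(residuals(fit))` can help with assumption 3.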

Following this, complexity can then be built by including more explanatory variables (for multiple linear regressions) and changing the variable types.
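
For instance, a multiple linear regression could be written in the same notation as the simple case. The number of explanatory variables $p$ and the indexing $x_{ji}$ are assumptions introduced here for illustration:

$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \dots + \beta_p x_{pi} + \epsilon_i$$
where
$$\epsilon_i \sim N(0, \sigma^2)$$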

Recommended changes:

dschoenig commented 4 years ago

Commit 2062207bf44770a9fb87cfa9e0efef9fab73467a addresses these issues for the English and French versions in an overhaul of the section on linear regression:

The section on multiple linear regression deserves an issue of its own. I only changed the mathematical presentation (now consistent with the simple linear regression) and added a note at the end stating that the parameters of the model should not even be interpreted, since the model clearly violates the assumptions.

I decided not to use the term "General linear model", as it is generally used in the context of multivariate methods, while the literature refers to the models used in this workshop simply as "linear models".