brad-cannell / r4epi

Repository for the R for Epidemiology book
http://www.r4epi.com/

Review and improve the introduction to regression chapter #108

Open mbcann01 opened 10 months ago

mbcann01 commented 10 months ago

Overview

In the Fall of 2023, I moved over a bunch of stuff from PowerPoint slides (nearly) verbatim. I was in a rush, so I told myself to just move it over and improve it later.

Go back, reread, and improve. PowerPoint doesn't always translate perfectly to book format.

Left off at

2023-09-29: Complete first draft of the chapter. I broke the intro to the regression chapter up into multiple chapters -- one for each type of GLM. I started working on those.

Tasks

mbcann01 commented 10 months ago

Models

1.1 Models and their Purposes

Many of the toys you played with as a child are models: dolls, balsa-wood airplanes with wind-up propellers, wooden blocks, model trains. But so are many serious objects of the adult world: architectural plans, bank statements, train schedules, the results of medical diagnostic tests, the signals transmitted by a telephone, the equations of physics, the genetic sequences used by biologists. There are too many to list.

What all models have in common is this:

A model is a representation for a particular purpose.

A model might be a physical object or it might be an idea, but it always stands for something else: it's a representation. Dolls stand for babies and animals, architectural plans stand for buildings and bridges, a white blood-cell count stands for the function of the immune system.

When you create a model, you have (or ought to have) a purpose in mind. Toys are created for the entertainment and (sometimes) edification of children. The various kinds of toys – dolls, blocks, model airplanes and trains – have a form that serves this purpose. Unlike the things they represent, the toy versions are small, safe, and inexpensive.

Models always leave things out and get some things – many things – wrong. Architectural plans are not houses; you can't live in them. But they are easy to transport, copy, and modify. That's the point. Telephone signals – unlike the physical sound waves that they represent – can be transported over long distances and even stored. A train schedule tells you something important but it obviously doesn't reproduce every aspect of the trains it describes; it doesn't carry passengers.

Statistical models revolve around data. But even so, they are first and foremost models. They are created for a purpose. The intended use of a model should shape the appropriate form of the model and determines the sorts of data that can properly be used to build the model.

There are three main uses for statistical models. They are closely related, but distinct enough to be worth enumerating.

  1. Description. Sometimes you want to describe the range or typical values of a quantity. For example, what's a "normal" white blood cell count? Sometimes you want to describe the relationship between things. Example: What's the relationship between the price of gasoline and consumption by automobiles?  

  2. Classification or prediction. You often have information about some observable traits, qualities, or attributes of a system you observe and want to draw conclusions about other things that you can't directly observe. For instance, you know a patient's white blood-cell count and other laboratory measurements and want to diagnose the patient's illness.

  3. Anticipating the consequences of interventions. Here, you intend to do something: you are not merely an observer but an active participant in the system. For example, people involved in setting or debating public policy have to deal with questions like these: To what extent will increasing the tax on gasoline reduce consumption? To what extent will paying teachers more increase student performance?

The appropriate form of a model depends on the purpose. For example, a model that diagnoses a patient as ill based on an observation of a high number of white blood cells can be sensible and useful. But that same model could give absurd predictions about intervention: Do you really think that lowering the white blood cell count by bleeding a patient will make the patient better?

To anticipate correctly the effects of an intervention you need to get the direction of cause and effect correct in your models. But for a model used for classification or prediction, it may be unnecessary to represent causation correctly. Instead, other issues, e.g., the reliability of data, can be the most important.

One of the thorniest issues in statistical modeling – with tremendous consequences for science, medicine, government, and commerce – is how you can legitimately draw conclusions about interventions from

Kaplan, Daniel. Statistical Modeling: A Fresh Approach (Project MOSAIC Books) (pp. 14-15). Project MOSAIC Books. Kindle Edition.

Kaplan, Daniel. Statistical Modeling: A Fresh Approach (Project MOSAIC Books) (pp. 13-14). Project MOSAIC Books. Kindle Edition.

Kaplan DT. Statistical Modeling: A Fresh Approach. Second. Project MOSAIC Books; 2017.