beanumber / oiLabs-mosaic

Source files for OpenIntro Statistics labs

Replace all for loops with do() loops #1

Closed beanumber closed 10 years ago

beanumber commented 10 years ago

Most of these conversions should be straightforward, but a few of them may be tricky.

beanumber commented 10 years ago

For example, Lab 4A contains three for() loops. Are these necessary? Or can they be replaced with do() loops?

Lab 4B and Lab 6 also contain for() loops.

galenlong commented 10 years ago

Regarding introducing do loops in lab 4A:

I think we should stick with either for loops or do loops; too many options will just confuse the student. Since we're doing a mosaic version of the labs, do loops are probably the way to go.

For loops are nice because everything is laid out for you. A do loop packs a lot of code into one statement. First, students tend to struggle with function chaining, and you can't break things up into separate variables in a do loop. Second, the idea that the do loop is smart enough to put each result into the next slot in a data frame threw me off: with a for loop, you have to specify the index with a counter, but where's the do's counter?

It might be good to restructure the introduction to do loops like so:

  1. Show the do loop in action.
  2. In the section below (currently titled "Interlude: The for loop"), introduce the problem of iteration by showing how tedious it would be to type 5000 lines of sampling.
  3. Explain how, unrolled, a do loop is exactly equivalent to typing it out 5000 times.
  4. Explicitly talk about the do loop putting each result into each slot of the data frame. ("The do loop is smart enough to put each thing in the next empty slot", or whatever). (I'm not totally sure about this because maybe no one else was confused like I was, since no one else came in with preconceptions about how a loop is supposed to work?)
  5. Show the results of the do loop with head(sample_means50) so students can see the results for themselves.
  6. Change the exercise below to be something like "change the do loop to only take 100 sample means".
  7. Maybe add another exercise that says "Change the code to take 5000 samples of size 100 instead" (which they do below anyway).

A more specific suggestion:

4A explains what the for loop looks like "unrolled" with this code:

samp <- sample(area, 50)
sample_means50[1] <- mean(samp)
# ...repeat this 5000 times

Because this code isn't in a do, it has the luxury of splitting the work into two lines: taking the sample, then storing its mean. However, that makes it look less similar to the do loop. I'd recommend changing the "unrolled" code to this:

sample_means50[1] <- mean(sample(area, 50)) # consolidated
# ...repeat this 5000 times

I think that makes the jump to the do loop more clear.
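To make the comparison above concrete, here is a minimal sketch of the two styles side by side. It is only an illustration, not the lab's actual code: `area` is filled with made-up numbers as a stand-in for the lab's variable, `sample_means50_do` is a hypothetical name, and the do() part runs only if the mosaic package happens to be installed.

```r
# Made-up stand-in for the lab's `area` variable (assumption, not the real data).
set.seed(42)
area <- rnorm(2000, mean = 1500, sd = 500)

# for() version: pre-allocate a vector, then fill slot i on each pass.
sample_means50 <- rep(NA_real_, 5000)
for (i in 1:5000) {
  samp <- sample(area, 50)          # take one sample of size 50
  sample_means50[i] <- mean(samp)   # store its mean in slot i
}
head(sample_means50)

# do() version: no explicit counter; each repetition becomes one row.
if (requireNamespace("mosaic", quietly = TRUE)) {
  library(mosaic)
  sample_means50_do <- do(5000) * mean(sample(area, 50))
  # The name of the result column depends on the mosaic version.
  print(head(sample_means50_do))
  print(nrow(sample_means50_do))
}
```

In the for() version the counter `i` decides where each mean is stored; in the do() version the repetition number is implicit in the row of the returned data frame.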

rpruim commented 10 years ago

I feel like the comments above are trying to force do() to be a for loop instead of appreciating do() for what it is. When I teach with do(), I typically use an outline like this:

  1. do it once for my data
  2. do it once for a randomized version of my data (perhaps repeat it a time or two to see that it changes).
  3. do 2 a lot of times and save the results.

I do agree with using head() to take a look.

> mean(~age, data=HELPrct)
[1] 35.65342
> mean(~age, data=resample(HELPrct))
[1] 35.93377
> mean(~age, data=resample(HELPrct))
[1] 35.9404
> Bootstrap <- do(1000) * mean(~age, data=resample(HELPrct))
> head(Bootstrap,3)
    result
1 35.55188
2 35.96026
3 35.85872

If you want to see many more examples, see Lock5 with R.

Another way of doing this (which has advantages in some settings) is to use do() with a smaller number of iterations first. For example:

> do(3) * lm( age ~ shuffle(sex), data=HELPrct)
  Intercept    sexmale    sigma   r.squared
1  35.19626  0.5985360 7.714603 0.001089595
2  36.13084 -0.6250608 7.714222 0.001188308
3  36.12150 -0.6128248 7.714400 0.001142240

PS. My students are never confused about each row corresponding to an iteration since that's what I tell them do() does.

beanumber commented 10 years ago

I agree with Randy -- but one question for us to ponder is how far from the "official" OI labs do we want to allow these to drift? This particular lab may be the most difficult, since the fundamental approach to iteration is different.

rpruim commented 10 years ago

I'm not in close communication with the OI folks, and my opinion might change if I were. But I would say make the labs as good as they can be and not fetter them unnecessarily. If there are things in the labs that don't make sense to cover when you use mosaic, let them go. If there are other things that could or should be added, add them in. Students are not going to see both sets of labs, so they won't be distracted by any differences.

beanumber commented 10 years ago

Agreed. This lab may be one that is ripe for outright replacement.

In any case, the latest commits have removed all for() loops, so I am closing this issue.