canadalearningcode / llc-intro-to-ruby

Intro to Data Analysis with Ruby -- (SLIDES - http://ladieslearningcode.github.io/llc-intro-to-ruby/slides.html), (LEARNER FILES - http://bit.ly/llc-ruby-data) - No sample project.

Choosing iteration style and introducing arrays when reading CSV #9

Open jleben opened 6 years ago

jleben commented 6 years ago

This issue is an invitation for discussion. I started creating a pull request but realized it is not obvious how the issues described here should be solved. I would like to get more feedback.

Slide 58 says "We can use for loops on CSV files", but no for loop appears anywhere on the slide.

Slide 59 uses CSV.foreach, which also introduces the new block syntax. Taken together, this is redundant and overloads the learners. I think it would be better to just use f = CSV.open(...); for line in f ..., since this kind of iteration was already presented earlier with File.

Alternatively, since slide 58 uses csv_file.read(), that could be followed by data = csv_file.read(); for line in data ...
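For comparison, here is a minimal sketch of the three options mentioned above. The sample data and variable names are made up for illustration (the workshop's own data file is not shown in this thread), and a temporary file stands in for it.

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data, written to a temporary file for the demo.
sample = Tempfile.new(["pets", ".csv"])
sample.write("name,species\nRex,dog\nTom,cat\n")
sample.close

# Option 1: CSV.foreach with a block (the style attributed to slide 59).
foreach_rows = []
CSV.foreach(sample.path) do |row|
  foreach_rows << row
end

# Option 2: CSV.open followed by a for loop, mirroring the File examples.
open_rows = []
csv_file = CSV.open(sample.path)
for row in csv_file
  open_rows << row
end
csv_file.close

# Option 3: read everything into an Array first, then loop over it.
csv_file = CSV.open(sample.path)
data = csv_file.read
csv_file.close
read_rows = []
for row in data
  read_rows << row
end

# All three collect the same rows.
puts foreach_rows == open_rows && open_rows == read_rows
```

In this sketch all three produce the same Array of rows; they differ only in how much new syntax the learner has to absorb.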

Regardless of what type of iteration is used, I think the slides should spend a bit more words on Arrays, to clarify how line in for line in f is different when f = File.open vs when f = CSV.open.

I would like to hear more opinions on what type of iteration should be presented when using CSV, as well as thoughts on how to address Arrays.

jleben commented 6 years ago

@jessynd @rbnhmll @eddieantonio I am pinging you on this issue, because I haven't received a reply yet, and you are the last three committers to this repository.

eddieantonio commented 6 years ago

Hey Jakob,

Opinion time!

I argue the slides should exclusively teach iterators with block methods (#each, #foreach, #each_with_index, etc.). Why?

  1. for loops are discouraged by experienced Ruby programmers. For example, it's a rule in RuboCop [source]. Anecdotally, when I was solving practice problems on exercism.io, the feedback that learners get is to use iterators instead.
  2. It makes the rest of the slides less confusing, because the existing solutions use #foreach and similar methods. So, in an effort to reduce the cognitive burden on the learners, I think it is prudent to teach only one style of iteration: the most idiomatic and widespread in Ruby.

Could you clarify what you mean by:

Regardless of what type of iteration is used, I think the slides should spend a bit more words on Arrays, to clarify how line in for line in f is different when f = File.open vs when f = CSV.open.

Edit: I looked at the Ruby docs and the slides, and, again, I suggest using File#each instead of a for loop.

jleben commented 6 years ago

Hi Eddie,

Thank you for your response!

I am definitely in favor of using a single style of iteration across all of the slides.

Regarding which style to choose though: I agree there is merit in choosing #foreach if that is the most idiomatic style in Ruby code out there in the world. From the perspective of teaching coding in general, as opposed to teaching Ruby, I feel sorry though that the built-in for loop is not the most idiomatic one. I have several reasons for that:

  1. In most other popular languages, for x in collection is in fact the most idiomatic style. As far as I know, Ruby is rather isolated here (it is closer to functional languages though...)
  2. Using #foreach introduces additional burden in itself: the burden of grasping the special syntax of blocks.
  3. The built-in 'for' is - well - built in. I would guess that if one was to implement a method like CSV#foreach, one would probably use a built-in for or while in the implementation.
  4. I would guess that one place where the built-in for is most idiomatically used is for i in 0..5. Would you say so?
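On point 3, it may help to know that in Ruby the relationship runs the other way around: a for loop is evaluated by calling #each on the collection. A minimal sketch, using a hypothetical Countdown class (not part of the slides or of any library) whose #each is written with while:

```ruby
# Hypothetical class for illustration only.
class Countdown
  def initialize(start)
    @start = start
  end

  # A hand-written each, built from while and yield.
  def each
    current = @start
    while current > 0
      yield current
      current -= 1
    end
  end
end

# for works on any object that responds to each.
seen = []
for n in Countdown.new(3)
  seen << n
end

p seen # [3, 2, 1]
```

So a method like #each can indeed be implemented with while, but for itself sits on top of #each rather than underneath it.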

I am not a Ruby programmer myself, so I can hardly estimate how distasteful using a built-in for loop feels. I regret though that Ruby stands so apart from other languages, and is arguably even confusing on its own. (One of the basic built-in facilities is discouraged in practice?)

jleben commented 6 years ago

To clarify what I meant by the difference between File.open and CSV.open: iterating the former gives you each line as a String while iterating the latter gives you each line as an Array of Strings.
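A minimal sketch of that difference, assuming a made-up two-line CSV file written to a temporary location:

```ruby
require "csv"
require "tempfile"

# Hypothetical sample data, written to a temporary file for the demo.
sample = Tempfile.new(["people", ".csv"])
sample.write("name,city\nAda,London\n")
sample.close

# Iterating a File yields each line as one String (newline included).
file_lines = []
File.open(sample.path) do |f|
  for line in f
    file_lines << line
  end
end

# Iterating a CSV yields each line already split into an Array of Strings.
csv_rows = []
CSV.open(sample.path) do |f|
  for line in f
    csv_rows << line
  end
end

puts file_lines.first.inspect # "name,city\n"
puts csv_rows.first.inspect   # ["name", "city"]
```

The loop looks identical in both cases, which is why the type of line is easy to miss without a slide on Arrays.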

jleben commented 6 years ago

@eddieantonio I just checked your link to Ruby style guide, and I realized why for is considered bad - because x in for x in something is defined in the scope outside for. That is certainly disturbing to my taste!
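A minimal sketch of the scoping behavior in question, with made-up variable names. It also shows one practical consequence: closures created inside a for loop all share the same variable.

```ruby
# The for loop variable lives on in the enclosing scope.
for x in [1, 2, 3]
  # nothing to do here
end
after_for = x # 3, the last value assigned by the loop

# A block parameter is local to the block.
[1, 2, 3].each { |y| }
after_each = defined?(y) # nil, because y does not exist out here

# Closures created in a for loop capture one shared variable...
for_procs = []
for i in 1..3
  for_procs << proc { i }
end

# ...while each block call gets a fresh variable.
each_procs = []
(1..3).each do |j|
  each_procs << proc { j }
end

p after_for               # 3
p after_each              # nil
p for_procs.map(&:call)   # [3, 3, 3]
p each_procs.map(&:call)  # [1, 2, 3]
```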

However, I have just tested Python and JavaScript and they both do the same as Ruby! Again, I wasn't aware of that.

And yet, I have never heard the for loop being discouraged for that reason in JavaScript or Python.

What do you think?

nathany commented 6 years ago

@jleben Thanks for starting the discussion. These slides are always a bit head-scratchy for students because they skip over arrays entirely and mix iteration syntaxes.

It does depend a little on what the objective of the course is. If the point is to teach computer programming, and it happens to be using Ruby, then the for syntax is more common and it's probably easier to grok than blocks.

If the point is really to teach Ruby, then blocks are a stand-out feature, and they should have a few slides dedicated to it. Though that might also be the case for "everything is an object" -- and classes aren't really taught because it's already a very dense day.

Given how much there is to learn in a day, I'd likely opt for adding a slide or two on arrays, leaving out blocks, and simplifying the iteration.

eddieantonio commented 6 years ago

@jleben Ah, the contentious for vs. iterator debate!

I am not a Ruby programmer myself, so I can hardly estimate how distasteful using a built-in for loop feels. I regret though that Ruby stands so apart from other languages, and is arguably even confusing on its own. (One of the basic built-in facilities is discouraged in practice?)

[...]

However, I have just tested Python and JavaScript and they both do the same as Ruby! Again, I wasn't aware of that.

And yet, I have never heard the for loop being discouraged for that reason in JavaScript or Python.

What do you think?

If LLC started offering the "Introduction to Erlang" course (fat chance, but stick with me), it would strike me as inappropriate to teach Erlang's built-in if syntax. Yes, if statements! The reason is that, in Erlang, pattern matching and functions are the bread and butter of the language; if expressions, despite being built in, are not as widely used simply because the language has different, more powerful mechanisms built into it, such as pattern matching, guards, and the case ... of syntax (also, there aren't really booleans in Erlang, just symbols for false and true...).

Nathan brings up an excellent point:

It does depend a little on what the objective of the course is. If the point is to teach computer programming, and it happens to be using Ruby, then the for syntax is more common and it's probably easier to grok than blocks.

If the point is really to teach Ruby, then blocks are a stand-out feature, and they should have a few slides dedicated to it. Though that might also be the case for "everything is an object" -- and classes aren't really taught because it's already a very dense day.

I'd like to hear @jessynd's opinion on this. My argument for teaching iterators and blocks instead of the humble for loop is that we really don't have to go into very much detail about what object-oriented programming is in a six-hour workshop; we can just say "this is how we loop over a collection in Ruby", and the more inexperienced learners will simply accept it. Only experienced learners who are familiar with more than one programming language will question why they aren't learning about for loops ;)


I would guess that one place where the built-in for is most idiomatically used is for i in 0..5. Would you say so?

I'm a mining software repositories researcher by training, so let's look at the data! Running some coarse GitHub queries:

for i in 1..10 nets 6,032,692 results

(1..10).each do |i| nets 8,463,129 results.

10.times do holds its own with 2,041,766 results.

Additionally, if we flip through the search results for the for syntax, a LOT of them are clones of rocky/linecache/test/data/for1.rb. Meanwhile, there are much more varied results if you flip through the results for the .each do syntax.
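For readers unfamiliar with the three idioms being counted, here is a minimal sketch that uses each one to add up the numbers 1 through 10. The variable names are made up for illustration.

```ruby
# for with a range
for_total = 0
for i in 1..10
  for_total += i
end

# Range#each with a block
each_total = 0
(1..10).each do |j|
  each_total += j
end

# Integer#times counts from 0 to 9, so add 1 to match the range above
times_total = 0
10.times do |k|
  times_total += k + 1
end

p [for_total, each_total, times_total] # [55, 55, 55]
```

Note that 10.times is not a drop-in replacement for the other two, since it starts counting at 0.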

jleben commented 6 years ago

As I have expressed earlier, I agree with both @eddieantonio and @nathany that this issue depends somewhat on the general perspective: teaching coding in general versus teaching Ruby specifically.

Research on cognitive load in introductory computer science education [1] suggests that it is easier for learners to acquire syntactical templates and apply them using slight modifications, rather than to grasp the individual meaning of smallest syntactical elements and compose them.

Based on this, I would expect that the two alternative templates (for and .each) have little difference in intrinsic cognitive load. A complete beginner will happily accept either one and use it successfully (as a template).

Still, from the point of view of cognitive load [1], the choice may be significant in a broader learning context:

  1. There is a limited number of templates a learner can keep in working memory without storing them in long-term memory (which in turn requires a lot of practice). The conclusion here is that reducing the total number of templates is beneficial. This supports what I believe we all agree on: sticking to one iteration style throughout the slides would be better.
  2. Cognitive load is reduced when multiple low-level templates are grouped into higher level schemas (generalization). A higher number of templates can be retained and worked with when they have more in common. I think it could be argued that the syntax and semantics of for have more overlap with if than the syntax and semantics of blocks do. If so, I would conclude that a learner may have less trouble generalizing from if and for than from if and .each and hence successfully acquire and work with both.
  3. Generalization (abstract schema formation) is essential to writing code efficiently. Yet, it does not occur in a learner instinctively - it requires a conscious effort by the learner and therefore likely encouragement by the teacher. Even though a learner may successfully apply an iteration template in the course of a workshop, they will gain more long-term knowledge if they can generalize. Whichever style of iteration is chosen for the Ruby workshop, I think the instructor (and the slides) should place an emphasis on what's general about iteration, regardless of which programming language is used.

[1] Morrison, B. B. (2016). Replicating experiments from educational psychology to develop insights into computing education: cognitive load as a significant problem in learning programming (Doctoral dissertation, Georgia Institute of Technology).

jleben commented 6 years ago

@eddieantonio Thanks for posting the GitHub search results! Evidence is always welcome ;)

I would conclude from those search results though that for and .each are just about equally idiomatic, at least when iterating a range of integers, if not more broadly. I would soften your claim that a LOT of the for results are clones of the same file. I only see that file on 12 out of 100 total pages of results. Moreover, some results for .each are actually not for iterating an integer range, but a different type of items or collection. Finally, even if this comparison is considered fair, .each has only 1.4 times as many results as for.

jleben commented 6 years ago

I see two more lines of reasoning that I would like to bring up for debate.

The first one is: If the Ruby land is really split between using for and using .each, then maybe it's only fair that the workshop introduces both styles, like the slides already do?

The second one starts from an entirely alternative perspective: Since the CLC workshops are exclusively introductory (to the best of my knowledge), perhaps the goal is simply to make the learners feel like programming is fun and that they can do it, and not to maximize the utility of what they learn for understanding code that's out there and for writing code that's most acceptable. From this perspective, we should pick whatever iteration style makes the least puzzling experience.

jleben commented 6 years ago

Here's an additional thought: if the goal is just to have fun programming and to generate a (however temporary) feeling of empowerment, then I would even question the utility of teaching both Python and Ruby. I think in that case it would make sense to pick a single language that offers the most fun initial experience and only offer a type of content with that language. (Note that this Ruby workshop is a clone of an earlier Python workshop, and the two only differ in the programming language).

ChaoticBoredom commented 5 years ago

At the risk of resurrecting a zombie issue, I'm strongly of the opinion that we should be teaching Ruby syntax in the Intro to Ruby course. While it is true that there are strong similarities between the Python and Ruby courses, the fact remains that they are different languages. Having worked w/ Ruby for the past 4 years, (albeit primarily with Rails) I don't think I've ever used a for loop over .each in Ruby... maybe, but I'd have to go digging in memory lane. I do believe that learners will grasp whatever is taught, and if they are starting to explore multiple languages, they're starting to wander into more advanced territory anyways.

jleben commented 5 years ago

@ChaoticBoredom While the discussion above, including the GitHub code searches, does not seem to be conclusive as to what kind of looping is universally preferred in Ruby, I get a feeling that at least people who code in Ruby a lot seem to prefer .each.

I think we have a much stronger consensus though that a single iteration style should be used consistently across the slides, as opposed to trying to teach both. I have experienced firsthand at this Ruby workshop the pain and confusion of learners trying to first grasp one syntax and then another.

In PR #14, the slides and exercises have been changed to consistently use for. I think that's an improvement over the previous state. I have no personal strong preference about what iteration style should be used, but I do like to operate based on facts. So if you feel like changing everything to consistently use .each and similar functions, I am very open to this, but I feel like that would deserve more discussion and data. @kasslent Probably CLC should organize an Advisory Council to talk this out.

robinetmiller commented 5 years ago

Here's my three-part commentary. Skip to the Summary for a tl;dr.

Part A: Better GitHub Search Counts

I took a look at @eddieantonio's quick and dirty searches, plus the GitHub search capabilities, and noticed a couple of things.

  1. The result numbers aren't constant between page refreshes
  2. Hovering over the (?) note at the end of the header shows that the results are truncated due to long query time

So those earlier data points aren't showing us the whole picture.

Here's a slightly tighter version of the same queries. It limits to file extension .rb, removes non-supported characters, and applies quotes to encourage the keywords' closeness (and still finds real results). I also refreshed the page a few (unscientific) times to get a feel of the result ranges.

for

#each

As an aside, the #each count is going to be an undercount, because the inline syntax uses { instead of do, so those uses are invisible to a GitHub search for the do form.

# sample data, assumed for illustration
my_array = ["a", "b", "c"]

# full block
my_array.each do |item|
  puts item
end

# compact block
my_array.each { |item| puts item }

Part B: My Take

Some context: I've used Ruby full-time for 5+ years, mentored this Ruby seminar a few times, and ran the seminar as head instructor this past weekend.

I agree wholeheartedly with @ChaoticBoredom and @eddieantonio that the #each form is far more common in production Ruby and should be the form taught in this seminar. Here's why:

  1. It's better Ruby, based on:
    1. the Ruby professionals in this thread offering personal experience, and the many StackOverflow responses offering a similar take
    2. RuboCop, the de facto Ruby style guide, enforces it
    3. The revised GitHub search numbers above
  2. The potential bugs of using for are eliminated
  3. It provides easier learner templating between the Enumerable methods #each, #collect, #select, #reject, and #reduce (if they continue their Ruby journey, or if the slide deck is changed to include some of them)
  4. It provides another instance of the concept of methods, whereas for does not
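To illustrate point 3, here is a minimal sketch of how the same block template carries over between those Enumerable methods. The prices array and variable names are made up for illustration and are not taken from the slides.

```ruby
prices = [5, 12, 8, 20] # made-up sample data

# each: visit every item
seen = []
prices.each do |price|
  seen << price
end

# collect: build a new Array from the block's results
doubled = prices.collect do |price|
  price * 2
end

# select: keep the items for which the block is true
expensive = prices.select do |price|
  price > 10
end

# reject: drop the items for which the block is true
cheap = prices.reject do |price|
  price > 10
end

# reduce: fold all items into a single value, starting from 0
total = prices.reduce(0) do |sum, price|
  sum + price
end

p doubled   # [10, 24, 16, 40]
p expensive # [12, 20]
p cheap     # [5, 8]
p total     # 45
```

Only the method name and the body of the block change from one call to the next, which is the templating benefit described above.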

I don't really think that the syntax similarity argument breaks hard either way. Python, for example, has:

# Python
for name in user_names:
   # do stuff

But JavaScript (and TypeScript, by extension) supports:

// Javascript
userNames.forEach(function(name){
   // do stuff
});

Which is extremely similar to Ruby's #each (and I'll note, JS is top of the 2018 GitHub popular languages list, if that's a consideration).
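For a side-by-side view, the Ruby counterpart of the two snippets above might look like the following sketch. The names and the greeting are placeholders standing in for "do stuff".

```ruby
# Ruby
user_names = ["ada", "grace"] # made-up sample data
greetings = []

user_names.each do |name|
  # stand-in for "do stuff"
  greetings << "Hello, #{name}!"
end

puts greetings
```

As in the JavaScript version, the collection comes first and the loop body is passed to a method.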

Part C: Summary

PR #14 from @jleben is a great contribution and moves in the correct direction - when instructing, I absolutely appreciated how much more consistent it is now.

I also think there's strong evidence that replacing for with #each would be another step forward, which is itself easier because of the consistency in #14.

I'm willing to do that work, because there are some more similar changes that could also simplify learners' mental models and smooth out the concepts curve, so I've already started a fork for it.

eddieantonio commented 5 years ago

@robinetmiller great comment! Thanks for tightening up my queries. To clarify, you mean completely replace for with #each, correct? That way there is one, consistent iteration syntax taught in the slides?

robinetmiller commented 5 years ago

Yup, exactly that. One way to iterate, using only #each and the full block syntax.