databricks / learning-spark

Example code from Learning Spark book
MIT License

Example 3-13 TypeError #3

Open nealmcb opened 9 years ago

nealmcb commented 9 years ago

While putting together an IPython Notebook of convenient worked examples from Learning Spark, I found a simple Python type error.

Example 3-15 says:

print "Input had " + badLinesRDD.count() + " concerning lines"

which results in

TypeError                                 Traceback (most recent call last)
<ipython-input-10-078b22c97d4b> in <module>()
----> 1 print "Input had " + badLinesRDD.count() + " concerning lines"
      2 print "Here are 10 examples:"
      3 for line in badLinesRDD.take(10):
      4     print line

TypeError: cannot concatenate 'str' and 'int' objects

It should say something more like this:

print "Input had %d concerning lines" % badLinesRDD.count()
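
For reference, here is a minimal sketch of the failure and a few ways around it that runs without a Spark cluster. It assumes only that count() returns a plain Python int; bad_line_count and its value of 42 are made-up stand-ins for badLinesRDD.count(), not anything from the book.

```python
# Hypothetical stand-in for badLinesRDD.count(), which returns an int.
bad_line_count = 42

# The original expression fails because str + int is not defined.
try:
    "Input had " + bad_line_count + " concerning lines"
except TypeError as err:
    print("TypeError: %s" % err)

# Option 1: convert the int to a string before concatenating.
msg_concat = "Input had " + str(bad_line_count) + " concerning lines"

# Option 2: %-style formatting, as suggested above.
msg_percent = "Input had %d concerning lines" % bad_line_count

# Option 3: str.format, available in Python 2.6+ and Python 3.
msg_format = "Input had {0} concerning lines".format(bad_line_count)

# A parenthesized single-argument print behaves the same in Python 2 and 3.
print(msg_concat)
```

All three options build the same string, so the choice is mostly a matter of style; str() is the smallest change to the line as printed in the book.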

I made a gist of an IPython notebook that shows the problem and the fix, along with simple worked examples. You can view and download it here:

http://nbviewer.ipython.org/gist/nealmcb/b6d989a83adddcdd459f

I suggest including such notebooks in future editions.