databricks / LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition]
https://learning.oreilly.com/library/view/learning-spark-2nd/9781492050032/
Apache License 2.0

Chapter 5 - Joins - Incorrect python code #67

Closed skodman closed 3 years ago

skodman commented 3 years ago

On page 148, the Python code that demonstrates joins is incorrect. It reads:

# In Python
# Join departure delays data (foo) with airport info
foo.join(
  airports,
  airports.IATA == foo.origin
).select("City", "State", "date", "delay", "distance", "destination").show()

However, the airports DataFrame does not exist in the Python code. The reference should be changed to airportsna, as follows:

# In Python
# Join departure delays data (foo) with airport info
foo.join(
  airportsna,
  airportsna.IATA == foo.origin
).select("City", "State", "date", "delay", "distance", "destination").show()

Then the code works.
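For anyone wondering why the original listing fails before Spark runs anything: Python evaluates the arguments to join first, so a name that was never bound raises a NameError. Below is a minimal sketch of that behaviour without Spark. The dictionary is a hypothetical stand-in for the real DataFrame and its values are made up for illustration.

```python
# Hypothetical stand-in for the DataFrame; only the variable name matters here.
airportsna = {"IATA": ["SEA", "SFO"]}

try:
    # `airports` was never bound, so this lookup fails immediately,
    # just as `airports.IATA` does inside the join call in the book listing.
    airports["IATA"]
    error = None
except NameError as exc:
    error = str(exc)

print(error)
```

Referencing airportsna instead resolves normally, which is the same one-word change as in the corrected listing above.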

brookewenig commented 3 years ago

Thanks for catching this @skodman! The Scala code correctly uses airports, but the Python code uses airportsna. We'll fix this in the next PDF release. cc @dennyglee.

If there are no further issues, can we close this?

skodman commented 3 years ago

Yes, this can be closed. I am going through the book and will raise any other issues separately.

Cheers, skodman