databricks / LearningSparkV2

This is the GitHub repo for Learning Spark: Lightning-Fast Data Analytics [2nd Edition].
https://learning.oreilly.com/library/view/learning-spark-2nd/9781492050032/
Apache License 2.0
1.2k stars, 728 forks

missing source file #71

Closed: tiagobalabuch closed this issue 2 years ago

tiagobalabuch commented 2 years ago

I am trying to follow the Chapter 6 Scala notebook, 6-2 Dataset API, and I couldn't find the file at this path (mnt/training/dataframes/people-with-header-10m.txt) in the GitHub repo to use in my environment.

Other datasets are available in databricks-datasets/learning-spark-v2; however, this one isn't there.

brookewenig commented 2 years ago

Apologies for this! The dataset is too large to put in git, but the proper path should be: dbfs:/databricks-datasets/learning-spark-v2/people/people-with-header-10m.txt (that is accessible from any Databricks workspace, including Community Edition).

Could you check and see if that path works for you? I'll update the main DBC so others don't hit this issue.
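For anyone checking the corrected path, here is a minimal sketch of a Scala notebook cell that reads the file as plain text. It assumes `spark` is the SparkSession that Databricks notebooks predefine; the name `peoplePath` is only illustrative and does not come from the book's notebook. Reading as raw text makes no assumptions about the file's delimiter or columns.

```scala
// Corrected DBFS path from the comment above.
val peoplePath =
  "dbfs:/databricks-datasets/learning-spark-v2/people/people-with-header-10m.txt"

// Assumption: `spark` is the SparkSession provided by the Databricks notebook.
// spark.read.text loads each line of the file into a single string column named "value".
val rawLines = spark.read.text(peoplePath)

// Print the first few lines untruncated to confirm the file is readable
// and to inspect the header row before choosing CSV options.
rawLines.show(5, false)
```

If the cell prints a header line followed by data rows, the path resolves, and the notebook's original read call should work once it points at `peoplePath`.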

tiagobalabuch commented 2 years ago

Hi @brookewenig, after changing the path, it worked! Thanks

brookewenig commented 2 years ago

Thanks for confirming!