I like the general flow of the exercise very much, and have a couple of additional suggestions.
Introduce partitions
We can show how to check and change the number of partitions, e.g.
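A minimal sketch of what such a cell could look like, assuming the notebook uses PySpark DataFrames. The helper name `partition_counts` and the target counts `more` and `fewer` are hypothetical and only for illustration; the Spark calls themselves (`rdd.getNumPartitions`, `repartition`, `coalesce`) are standard DataFrame API.

```python
def partition_counts(df, more, fewer):
    """Return the partition count of `df` before and after resizing.

    `df` is any PySpark DataFrame; `more` and `fewer` are hypothetical
    target partition counts chosen for illustration only.
    """
    # Check: how many partitions the DataFrame currently has.
    original = df.rdd.getNumPartitions()

    # Change (up or down): repartition triggers a full shuffle.
    widened = df.repartition(more)

    # Change (down only): coalesce merges partitions without a full shuffle.
    narrowed = widened.coalesce(fewer)

    return (
        original,
        widened.rdd.getNumPartitions(),
        narrowed.rdd.getNumPartitions(),
    )


# Possible use in a notebook cell, assuming a SparkSession named `spark` exists:
# df = spark.range(1_000_000)
# print(partition_counts(df, more=8, fewer=2))
```

Participants could compare the three numbers and discuss why `coalesce` is usually the cheaper choice when only reducing the partition count.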
Demonstrate lazy evaluation
We can show (e.g. in Task 8) all the transformation steps initially do nothing and return immediately. The actual computation only happens when `count`, `show`, etc. are called.
Show one use case of the Spark UI
Ask the participants to navigate through the Spark UI while a transformation job is running (e.g. Task 8) and note the number of stages, what each one does, and how long each takes.
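The lazy-evaluation and Spark UI suggestions could share one cell. The following is a rough sketch under assumptions, not the notebook's actual Task 8: the helper `run_lazy_demo`, the `condition` string, and the `label` are hypothetical, and it assumes a classic (non-Connect) SparkSession so that `spark.sparkContext` is available. It times building the transformations separately from running an action, and labels the job so it is easy to spot in the UI.

```python
import time


def run_lazy_demo(spark, df, condition, label):
    """Time transformation building vs. the action that triggers the work.

    `condition` is a SQL filter string and `label` a free-text job
    description; both are illustrative placeholders.
    """
    # The description appears next to the job in the Spark UI "Jobs" tab.
    spark.sparkContext.setJobDescription(label)

    # Transformations only extend the query plan, so this returns quickly.
    start = time.perf_counter()
    transformed = df.filter(condition).distinct()  # distinct adds a shuffle
    build_seconds = time.perf_counter() - start

    # The action is what actually launches a job on the cluster.
    start = time.perf_counter()
    rows = transformed.count()
    action_seconds = time.perf_counter() - start

    return {
        "rows": rows,
        "build_seconds": build_seconds,
        "action_seconds": action_seconds,
        "ui_url": spark.sparkContext.uiWebUrl,  # where to open the Spark UI
    }


# Possible use in a notebook cell, assuming `spark` is already defined:
# result = run_lazy_demo(spark, spark.range(10_000_000), "id % 3 = 0", "lazy demo")
# print(result)
```

Because `distinct` introduces a shuffle, the labeled job should show more than one stage, which gives participants something concrete to count in the UI.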
Let me know your thoughts on the above. I can also try to add them to the notebook if we agree. In general, I think the number of exercises will most likely exceed the workshop duration, but they're well organized in order of increasing difficulty, so I think we can set the expectation that participants should work through them in the defined order and solve as many as time permits.