In "Spark Workflow and Partitioning" course the link to "to emphasise the point" is broken or not sure if it meant some joke. In the same section the hyper link to "The coalesce transformation applied to a DataFrame "
I noticed in the Handling Late Data section of House 9, under the Tumbling Time Window section, that the 3rd paragraph says we should "sum up all the prices for the stock symbol", but in the example below, the aggregation actually used is max.
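For reference, here is a minimal sketch of what a tumbling-window aggregation over stock prices could look like, showing both sum and max so the text and the example can be reconciled. The column names (symbol, price, eventTime), the socket source, and the window and watermark durations are my own assumptions, not the course's code.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, max, sum, window}

object TumblingWindowSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("TumblingWindowSketch")
      .master("local[*]")
      .getOrCreate()

    import spark.implicits._

    // Assumed input: lines of "symbol,price,timestamp" arriving on a local socket.
    val trades = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 12345)
      .load()
      .as[String]
      .map { line =>
        val tokens = line.split(",")
        (tokens(0), tokens(1).toDouble, Timestamp.valueOf(tokens(2)))
      }
      .toDF("symbol", "price", "eventTime")

    // Tumbling (non-overlapping) 10-second windows per stock symbol.
    // sum("price") matches the text ("sum up all the prices"),
    // max("price") matches the aggregation used in the course example.
    val windowedPrices = trades
      .withWatermark("eventTime", "2 minutes")
      .groupBy(col("symbol"), window(col("eventTime"), "10 seconds"))
      .agg(sum("price").as("sumPrice"), max("price").as("maxPrice"))

    windowedPrices.writeStream
      .format("console")
      .outputMode("update")
      .start()
      .awaitTermination()
  }
}
```

Either aggregation fits the same tumbling-window shape, so it may be enough to align the wording of the paragraph with whichever one the example keeps.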
Stateful streaming, Exercise: Read from Stream: it says "Rows per Section" instead of "Rows per Second".
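The spelling matters here because it mirrors the option name on Spark's built-in rate source, which is literally rowsPerSecond. A minimal sketch of reading from it (the rate of 10 rows per second and the console sink are just illustrative assumptions):

```scala
import org.apache.spark.sql.SparkSession

object RateSourceSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("RateSourceSketch")
      .master("local[*]")
      .getOrCreate()

    // The built-in "rate" source emits (timestamp, value) rows;
    // its option is spelled rowsPerSecond, i.e. "Rows per Second".
    val rateStream = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "10")
      .load()

    rateStream.writeStream
      .format("console")
      .outputMode("append")
      .start()
      .awaitTermination()
  }
}
```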
Local setup
Installing Java on an M1 chip made it a bit more complex. I ended up installing Rosetta 2 to make it work. Maybe it's worth adding this to the instructions?
In the README, under "Apache Spark Set up instructions", step 8 says "The same can be done for pyspark". This instruction is too unclear: what does "the same" refer to, and is there any way to make it more explicit? I copied and pasted the code and it worked, but there was too little explanation.
Also in the README, in the Apache Spark section, step 3 says you can put SPARK_HOME in your preferred location when exporting it. Would it be possible to add a suggested location?