elephantscale / training-qa

New trainers complain here

Advanced questions for Hadoop and Spark #3

Open markkerzner opened 5 years ago

markkerzner commented 5 years ago

Add a new slide deck or modify existing ones

- How to determine the number of buckets
- The number of files, what defines it?
- How the replication works, how failover works?
- Read through the explain plan for a Hive query
- How YARN allocates Spark containers
- How to size your executor memory
- What to look at after running Spark jobs
- How to look at YARN logs
- Driver memory
- Walk through the documentation fast but then spend more time helping understand how things
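
For the executor-memory and YARN-container topics, a small worked example might help on a slide. The sketch below is only an illustration, not material from the existing decks. It assumes Spark's documented default memory overhead (10% of executor memory, with a 384 MiB minimum) and assumes the YARN scheduler rounds each request up to a multiple of `yarn.scheduler.minimum-allocation-mb` (default 1024). The function name `yarn_container_mib` is hypothetical, and the rounding behavior can differ by scheduler and configuration.

```python
import math

# Spark's documented defaults for spark.executor.memoryOverhead.
MIN_OVERHEAD_MIB = 384   # minimum overhead in MiB
OVERHEAD_FACTOR = 0.10   # default fraction of executor memory

def yarn_container_mib(executor_memory_mib, yarn_min_alloc_mib=1024):
    """Estimate the YARN container size for one Spark executor (hypothetical helper).

    executor_memory_mib: value of spark.executor.memory, in MiB.
    yarn_min_alloc_mib:  assumed yarn.scheduler.minimum-allocation-mb.
    """
    # Off-heap overhead that Spark adds on top of the executor heap.
    overhead = max(MIN_OVERHEAD_MIB, int(executor_memory_mib * OVERHEAD_FACTOR))
    requested = executor_memory_mib + overhead
    # Assumption: the scheduler rounds the request up to the next
    # multiple of the minimum allocation.
    return math.ceil(requested / yarn_min_alloc_mib) * yarn_min_alloc_mib

if __name__ == "__main__":
    for heap in (1024, 4096, 8192):
        print(heap, "MiB heap ->", yarn_container_mib(heap), "MiB container")
```

Under these assumptions, an executor with a 4096 MiB heap requests 4505 MiB, which is why it can end up occupying a 5120 MiB container instead of the 4096 MiB a trainee might expect.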