This section is a bit confusing: it is the first time we see the "DB" being used. We also go through a bucketing function, but there is never a notebook cell that shows the output under the hood to solidify our understanding. Even running a query on the "bucket by" column (year) to show the performance impact would be a good way to sum up the section. Compare that to the repartition / coalesce section, where the notebook instructions highlight how the number of files in the file system relates to the parameters we pass to coalesce or repartition (https://github.com/data-derp/small-exercises/blob/master/databricks-repartition-vs-write-partition-by.dbc)
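As a rough idea of what such an "under the hood" cell could look like, here is a minimal sketch, not the notebook's actual code. It assumes a local SparkSession rather than the Databricks-provided one, and the sample rows, the table name `events_bucketed`, and the `base_dir` parameter are all hypothetical. It writes a table bucketed by `year`, counts the Parquet files that end up on disk, and prints the query plan for a filter on the bucket column.

```python
import os


def count_data_files(table_path):
    """Count Parquet data files under table_path, ignoring _SUCCESS and .crc files."""
    total = 0
    for _root, _dirs, files in os.walk(table_path):
        total += sum(1 for name in files if name.endswith(".parquet"))
    return total


def write_bucketed_and_inspect(base_dir, num_buckets=4):
    # Imported here so the file-counting helper above can be used without Spark.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[2]")
        .appName("bucketing-demo")
        .config("spark.sql.warehouse.dir", os.path.join(base_dir, "warehouse"))
        .getOrCreate()
    )

    # Hypothetical sample data: (id, year)
    rows = [(1, 2019), (2, 2020), (3, 2021), (4, 2022), (5, 2019), (6, 2020)]
    df = spark.createDataFrame(rows, ["id", "year"])

    # Explicit path so we know exactly where the files land (assumption for the demo).
    table_path = os.path.join(base_dir, "events_bucketed")
    (
        df.repartition(1)  # one writer task, so at most one file per bucket
        .write.bucketBy(num_buckets, "year")
        .sortBy("year")
        .option("path", table_path)
        .mode("overwrite")
        .saveAsTable("events_bucketed")
    )

    # Under the hood: how many data files did the bucketed write produce?
    print("data files:", count_data_files(table_path))

    # Query on the bucket column: inspect the plan for bucketing information.
    spark.table("events_bucketed").filter("year = 2020").explain()
    spark.stop()
```

With a single writer task the file count should be at most `num_buckets`, which gives the same kind of "parameters versus number of files" takeaway as the repartition / coalesce notebook. In Databricks itself, listing the table location with `dbutils.fs.ls` would probably be the more natural way to show the files.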