Created a new column, year_date, because we often compute year(date) in queries; precomputing it once should reduce query execution time slightly.
Partitioned the data by state and year to improve query execution time
Added the notebooks in .html format as well, so people who don't have access to the Databricks workspace can download the .html files and still view the notebooks in their browsers. Also updated README.md with this information.
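To illustrate why the year_date column and the state/year partitioning can speed up queries, here is a minimal stdlib sketch. It is not the Spark/Databricks code from this change. It assumes hypothetical rows with state and date fields, and it assumes the partition columns are named state and year_date. It mimics the Hive-style layout Spark produces when writing with partitionBy("state", "year_date"), where each combination becomes a directory such as state=CA/year_date=2020. A query filtering on both columns then only needs to read one directory instead of scanning every row.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; only "state" and "date" come from the description above.
rows = [
    {"state": "CA", "date": date(2020, 3, 1)},
    {"state": "CA", "date": date(2021, 5, 2)},
    {"state": "NY", "date": date(2020, 7, 9)},
]


def add_year_date(row):
    """Return a copy of the row with a precomputed year_date, equivalent to year(date)."""
    return {**row, "year_date": row["date"].year}


def partition(rows, keys=("state", "year_date")):
    """Group rows under Hive-style directory names such as 'state=CA/year_date=2020'."""
    parts = defaultdict(list)
    for row in rows:
        path = "/".join(f"{k}={row[k]}" for k in keys)
        parts[path].append(row)
    return dict(parts)


def scan(parts, state, year):
    """Read only the partition matching the filter (partition pruning), not every row."""
    return parts.get(f"state={state}/year_date={year}", [])


parts = partition([add_year_date(r) for r in rows])
print(sorted(parts))
print(len(scan(parts, "CA", 2020)))
```

Because year_date is stored rather than recomputed, the filter compares a stored value directly. Because it is also a partition key, non-matching directories are skipped entirely.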