databricks / devrel

This repository contains the notebooks and presentations we use for our Databricks Tech Talks

SQL fixes #54

Closed: chengyineng38 closed 3 years ago

chengyineng38 commented 3 years ago
  1. Converted the data ingestion code from Python to SQL
  2. Created a new column called year_date, since we often compute year(date) in queries; I think precomputing it should reduce query execution time slightly
  3. Partitioned the data by state and year to improve query execution time
  4. Added the notebooks in .html format as well, so people who don’t have access to a Databricks workspace can download the .html files and still view the notebooks in their browsers. Also updated README.md to describe this
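
To illustrate why items 2 and 3 can help, here is a minimal plain-Python sketch of the idea, not the notebook's actual SQL. The sample rows, the `value` column, and the helper names (`add_year_date`, `partition_rows`, `query`) are hypothetical; only `year_date`, `state`, and `date` come from the description above. It precomputes the year once at ingestion and groups rows by (state, year_date), so a lookup for one state and year touches only the matching group instead of scanning every row.

```python
from collections import defaultdict
from datetime import date

# Hypothetical sample rows; "value" is an assumed column for illustration.
rows = [
    {"state": "CA", "date": date(2019, 5, 1), "value": 10},
    {"state": "CA", "date": date(2020, 3, 7), "value": 20},
    {"state": "NY", "date": date(2020, 8, 9), "value": 30},
]


def add_year_date(row):
    """Return a copy of the row with the year computed once, at ingestion."""
    return {**row, "year_date": row["date"].year}


def partition_rows(rows):
    """Group rows by (state, year_date), mimicking a partitioned table layout."""
    partitions = defaultdict(list)
    for row in map(add_year_date, rows):
        partitions[(row["state"], row["year_date"])].append(row)
    return dict(partitions)


def query(partitions, state, year):
    """Read only the matching partition rather than filtering all rows."""
    return partitions.get((state, year), [])


partitions = partition_rows(rows)
ca_2020 = query(partitions, "CA", 2020)
```

In a real partitioned table the same effect comes from the engine skipping partitions whose state and year do not match the filter. The benefit depends on queries actually filtering on those columns.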