Open kelseymok opened 1 year ago
Decide if we want to create this in a new repo. If we change this in the small-exercises repo, let's make sure to not interrupt the current running Tour
From Syed:
@syed-tw do you have the Databricks notebook that was supposed to be at https://www.databricks.com/notebooks/Demo_Hub-Delta_Lake_Notebook.html?utm_source=youtube&utm_medium=web&utm_campaign=7013f000000cVKYAA2? It redirects to the DBX homepage now.
I have the following 4 files in my workspace (compressed in a .zip archive)
Does this work? @kelseymok
@syed-tw no, these are our Delta Lake exercises - I'm referring to the Delta Lake demo in the link above. It seems like you already did some research, and since that file no longer exists, I was hoping you had a downloaded copy; if we don't have one, we can't fix this.
This has now been updated with new content and new notebooks.
Comments / observations while comparing delta-lake-walkthrough (in small-exercises repo) with 00-Delta-Lake-Introduction:

i. 00-Delta-Lake-Introduction is much more detailed
ii. 00-Delta-Lake-Introduction needs the ML Runtime (13.2 ML, Scala 2.12, Spark 3.4.0)
iii. 00-Delta-Lake-Introduction uses a Constraint for the quality check, whereas delta-lake-walkthrough adds a new column to check it
iv. Clone Delta Tables in 00-Delta-Lake-Introduction is new

Everything else looks pretty much the same in both notebooks.
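To make the Constraint-vs-extra-column difference concrete, here is a minimal sketch of the two quality-check styles. The table and column names (user_data, age, age_is_valid) are hypothetical, and it assumes the spark session available in a Databricks notebook:

```python
# Sketch only - hypothetical table/column names.
# Constraint style (as in 00-Delta-Lake-Introduction): a CHECK constraint on the
# Delta table rejects bad rows at write time.
spark.sql("CREATE TABLE IF NOT EXISTS user_data (id BIGINT, age INT) USING DELTA")
spark.sql("ALTER TABLE user_data ADD CONSTRAINT valid_age CHECK (age >= 0)")

# Extra-column style (as in delta-lake-walkthrough): keep all rows and flag the
# invalid ones in a new column so they can be inspected or filtered downstream.
from pyspark.sql import functions as F

flagged = spark.table("user_data").withColumn("age_is_valid", F.col("age") >= 0)
```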
@syed-tw great to see that there's more content. Let's fold in those points (i-iv) from 00-Delta-Lake-Introduction (the new notebook takes precedence) to our notebook. We'll also need to update the cluster-creation process to use the right runtime (will create a task for that).
https://github.com/data-derp/documentation/issues/3 -> task for updating runtime.
Updating the list of changes as the Spark ML Runtime is not needed anymore.
i. 00-Delta-Lake-Introduction is much more detailed
ii. 00-Delta-Lake-Introduction uses a Constraint for the quality check, whereas delta-lake-walkthrough adds a new column to check it
iii. Clone Delta Tables in 00-Delta-Lake-Introduction is new

Integrated the new notebooks (delta lake introduction and performance) into data-derp (small-exercises repo).
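Since Clone Delta Tables is the genuinely new piece, here is a minimal sketch of what cloning looks like. Table names are hypothetical and it assumes the Databricks spark session; a shallow clone copies only metadata and references the source's data files, while a deep clone also copies the data files:

```python
# Sketch only - hypothetical table names.
spark.sql("CREATE TABLE IF NOT EXISTS user_data_dev SHALLOW CLONE user_data")
spark.sql("CREATE TABLE IF NOT EXISTS user_data_backup DEEP CLONE user_data")
```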
There are two new amazing notebooks from Databricks which will fit in very well here. The first is similar to our existing demo, and the second is new and can be used as bonus material.
NOTE: In this issue, we'll update the OLD delta-lake-walkthrough
[ ] Download the two notebooks here: Archive.zip
[ ] Import the 00-Deltalake Notebook
[ ] Compare our existing Delta Lake exercise (delta-lake-walkthrough) with this one and decide on a merging strategy (e.g. what content to keep and what to edit)
[ ] Remove all Databricks-demo specific text that doesn't pertain to our content (e.g. "a cluster has been created for you...")
[ ] Add our per-user workspace selector and stream helpers: https://github.com/data-derp/small-exercises/blob/master/delta-lake-walkthrough/delta-lake-walkthrough.py#L31-L150 (if it doesn't already exist)
[ ] Add at the top of the notebook "This notebook is adapted from the Delta Lake Demo provided by Databricks".
[ ] Write to python source (it might be sql source, but let's see)
[ ] Upload to the same dir as the existing delta-lake-walkthrough in the small-exercises repo
[ ] Add extra explanations for questions that might come up at the bottom of this page (e.g. optimize): https://data-derp.github.io/docs/2.0/making-big-data-work/exercise-delta-lake
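For the optimize question specifically, a rough sketch of the kind of snippet the extra explanation could reference (hypothetical table name, assuming the Databricks spark session):

```python
# Sketch only - hypothetical table name.
# OPTIMIZE compacts many small files into fewer large ones; ZORDER BY co-locates
# rows with similar values of the chosen column so selective reads scan fewer files.
spark.sql("OPTIMIZE user_data ZORDER BY (id)")
```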
OLD CONTENT
Let's make sure that our Delta Lake exercise is working and up to date.
Context: this ticket is a revamp of the old one. We don't know what state the "updated" Delta Lake exercise is in, so we'll first check it, then update it, and then add a new Delta Lake exercise.
This is no longer relevant because the notebook is no longer at this URL.