Open adelnehme opened 4 years ago
Thanks @adelnehme !! I was able to get through most of the feedback this morning so we can discuss in more detail at 10am. I uploaded a new Solution Notebook to GitHub - https://github.com/datacamp/string-manipulation-in-sql-live-training/blob/master/notebooks/20200527_String_maniuplation_SQL_live_solution.ipynb
Talk to you in a bit.
Hi @adelnehme I've implemented all of your feedback into the notebook referenced in the previous comment. Let me know if you have any additional comments or suggestions.
Thanks Brian
Hi @adelnehme please use this link instead:
Thanks!
Hi @brianpiccolo :wave:
Hope you're well! In this issue, I will go over some feedback to get your live training notebook to the next level 🚀 I've divided this issue into 2 sections: Notebook (where I give feedback on the content itself, both general and section-specific) and Intangibles (things to look out for while giving the live session itself).
Notebook
General Feedback
[x] A. Great work on this draft - I think the outline of the session really mimics a real-life data scientist's work, where learners will have to explore a dataset, identify what needs to be cleaned, and clean the data before performing analysis. To "add more meat" to this session, I recommend adding a couple more data cleaning steps here (so far we have 4) - these can be easy text manipulation problems (like removing trailing or leading white spaces, aligning categorical variables that have different capitalization, separating lat long fields, etc.).
[x] B. Given A, check out this session on data cleaning in Python that has a similar outline, where the first section was a diagnosis of data cleaning problems and a second part was a range of data cleaning tasks (in your case there would be a third section on analysis with less verbose diagnosis + data cleaning 😄).
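As a rough illustration of the extra cleaning steps suggested in A, a minimal PostgreSQL sketch might look like the one below. The inline sample rows and the neighbourhood column are made up for illustration, and lat_long is assumed to hold a comma-separated "latitude,longitude" string; the actual notebook data may differ.

```sql
-- Hypothetical sample rows; not taken from the real rentals dataset.
WITH sample (neighbourhood, lat_long) AS (
    VALUES
        ('  Capitol Hill ', '38.8899, -77.0009'),
        ('capitol hill',    '38.8899,-77.0009')
)
SELECT
    -- Strip leading/trailing spaces, then align capitalization.
    INITCAP(TRIM(neighbourhood)) AS neighbourhood_clean,
    -- Split the combined field on the comma and cast each part to a number.
    CAST(TRIM(split_part(lat_long, ',', 1)) AS NUMERIC) AS latitude,
    CAST(TRIM(split_part(lat_long, ',', 2)) AS NUMERIC) AS longitude
FROM sample;
```

Each of these is a small, self-contained task, so they could slot in next to the existing 4 cleaning steps without changing the outline.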
Setting up PostgreSQL
Exploring the Dataset
[x] 2. 🔍 What is really intended by the question "The dataset is much smaller than you would have expected with only 281 rows. Let's think about this a bit. Do we believe it is correct?", especially since we're not adding more rows in that sense?
[x] 3. Check out how this session used the "Problem n:" and "To do list" format with markdown when diagnosing and identifying data cleaning problems.
What have we learned about the short term rentals data?
Using built-in functions to manipulate string and character data
Converting data from one type to another using CAST()
CAST - since you can also explain the rationale with your voice during the session. Check out this session in the data cleaning section when describing the tasks.
Extracting string data using SUBSTRING() and POSTIION() (<- typo here)
location column and then break it down verbally as well as in text. For example:
Extracting string data using split_part()
assets folder under string_part.png:
Using ARRAYS to manipulate strings stored as comma-separated-values
Creating a temporary table with our rental services data
Using temporary tables to simplify complex queries
User-defined functions to create reusable code
Putting it all together
Intangibles
rentals, or maybe adding the street_address, city_state_zip and lat_long columns as well? This does not really go against the creation of temp tables but it strikes me that not updating means we're insinuating that tables need to stay dirty and temp tables can be clean.