datacamp / string-manipulation-in-sql-live-training

Live Training Session: String Manipulation with SQL
2 stars 0 forks source link

Notebook firsts draft review #1

Open adelnehme opened 4 years ago

adelnehme commented 4 years ago

Hi @brianpiccolo :wave:

Hope you're well! In this issue, I will go over some feedback to get your live training notebook to the next level 🚀 I've divided this issue into 2 sections: Notebook (where I give feedback on the content itself, both general and section-specific) and Intangibles (things to look out for while giving the live session itself).

Notebook

General Feedback

Setting up PostgreSQL

Exploring the Dataset

What have we learned about the short term rentals data?

Using built-in functions to manipulate string and character data*

Converting data from one type to another using CAST()

Task 1: Converting issue_date to a date/time value. To convert columns from one data type to another in PostgreSQL, we can use the CAST() function as below here:

SELECT cast(column_name as new_type) 
FROM table;

Since issue_date does not contain time, we can convert it to DATE instead of TIMESTAMP.

Extracting string data using SUBSTRING() and POSTIION() (<- typo here)

To divide strings (or extract substrings) in PostgreSQL, we can use SUBSTRING(column_name, substring_start_position, substring_end_position) where:

  • column_name is the name of the column being processed
  • substring_start_position is the numbered position of the first string of the substring being extracted.
  • substring_end_position is the numbered position of the substring of the last string of the substring being extracted.

A common value from the location column has the format:

street_address \n city_state_zip \n lat_long

for example: 65 Pearl St\nCambridge, MA 02139\n(42.363228, 42.36322)

To divide location into substrings, will divide it based on the position of the break-line character \n by using the POSITION(character IN column_name) function which finds the numbered position of a character in a string column, as well as the char(10) function which returns \n - you can find other special characters here.

Extracting string data using split_part()

image

Using ARRAYS to manipulate strings stored as comma-separated-values

-- A note on nested select statements
SELECT AVG(column_name) 
FROM
(
    SELECT * FROM table
)
GROUP BY column_name;

Creating a temporary table with our rental services data

Using temporary tables to simplify complex queries

SELECT table_1_alias.* , table_2_alias.* 
FROM table_1 AS table_1_alias INNER JOIN table_2 AS table_2_alias 
ON table_1_alias.common_column = table_2_alias.common_column;

User-defined functions to create reusable code

Putting it all together

Intangibles

brianpiccolo commented 4 years ago

Thanks @adelnehme !! I was able to get through most of the feedback this morning so we can discuss in more detail at 10am. I uploaded a new Solution Notebook to Github - https://github.com/datacamp/string-manipulation-in-sql-live-training/blob/master/notebooks/20200527_String_maniuplation_SQL_live_solution.ipynb

Talk to you in a bit.

brianpiccolo commented 4 years ago

Hi @adelnehme I've implemented all of your feedback into the notebook referenced in the previous comment. Let me know if you have any additional comments or suggestions.

Thanks Brian

brianpiccolo commented 4 years ago

Hi @adelnehme please use this link instead:

https://github.com/datacamp/string-manipulation-in-sql-live-training/blob/master/notebooks/20200528_String_maniuplation_SQL_live_solution.ipynb

Thanks!