course-dprep / ADE-event-on-AirBnB-prices

team-assignment-team3 created by GitHub Classroom

data cleaning (ITO) #22

Open Emi676 opened 1 year ago

Emi676 commented 1 year ago

setup/Initialization

  1. load libraries

input

  1. input files
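
A minimal sketch of the setup and input steps. The package set (readr, dplyr, stringr) is assumed from the functions named in this outline, and the file and column names (`id`, `price`, `city`) are hypothetical stand-ins, not the project's actual raw data.

```r
# Setup: load libraries (assumed set; the outline does not list them)
library(readr)    # read_csv(), write_csv()
library(dplyr)    # select(), filter(), joins, summarize()
library(stringr)  # str_to_lower(), str_trim()

# Input: a tiny hypothetical CSV written to a temp path so the sketch runs anywhere
input_path <- tempfile(fileext = ".csv")
write_lines(c("id,price,city", "1,120,Amsterdam", "2,95,Amsterdam"), input_path)

# Declaring column types up front catches type problems at read time
listings_raw <- read_csv(
  input_path,
  col_types = cols(
    id    = col_integer(),
    price = col_double(),
    city  = col_character()
  )
)
```

In the real script, `input_path` would point at the downloaded raw file instead of a temp file.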

data cleaning

  1. data selection (with select())
  2. data de-duplication (remove full and partial duplicates / uniqueness constraints)
  3. dirty data problems (data/range constraints)
  4. dirty data problems II (semi_join()/anti_join(), invalid numbers, str_to_lower()/str_to_upper()/str_trim())
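
The four cleaning steps above could look roughly like this. The data frame, its columns, the "price must be positive" rule, and the `valid_cities` reference table are all made up for illustration; the real constraints depend on the dataset.

```r
library(dplyr)
library(stringr)

# Hypothetical raw rows: a full duplicate (id 1), a partial duplicate (id 2),
# a negative price (id 3), an unexpected city (id 4), a non-numeric price (id 6)
listings_raw <- data.frame(
  id    = c(1, 1, 2, 2, 3, 4, 5, 6),
  city  = c(" Amsterdam", " Amsterdam", "amsterdam ", "Amsterdam",
            "AMSTERDAM", "Rotterdam", "Amsterdam", "Amsterdam"),
  price = c("120", "120", "95", "99", "-10", "80", "150", "n/a"),
  notes = c("a", "a", "b", "c", "d", "e", "f", "g"),
  stringsAsFactors = FALSE
)
valid_cities <- data.frame(city = "amsterdam", stringsAsFactors = FALSE)

normalized <- listings_raw %>%
  select(id, city, price) %>%            # 1. keep only the needed columns
  distinct() %>%                         # 2. drop full duplicates
  distinct(id, .keep_all = TRUE) %>%     # 2. drop partial duplicates (first row per id)
  mutate(
    price = suppressWarnings(as.numeric(price)),  # 4. invalid numbers become NA
    city  = str_to_lower(str_trim(city))          # 4. normalize case and whitespace
  )

# 3. range constraint: keep parseable, positive prices (assumed rule)
in_range <- normalized %>% filter(!is.na(price), price > 0)

# 4. anti_join() lists rows that violate the membership constraint,
#    semi_join() keeps the rows that satisfy it
unknown_city   <- in_range %>% anti_join(valid_cities, by = "city")
listings_clean <- in_range %>% semi_join(valid_cities, by = "city")
```

Inspecting `unknown_city` before dropping those rows makes it easier to tell a typo that should be fixed from a value that really should be excluded.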

output

  1. write_csv()
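
A short sketch of the output step. The `listings_clean` data frame and the output path are placeholders; the actual script would write the cleaned data to wherever the project keeps generated files.

```r
library(readr)

# Hypothetical cleaned data standing in for the result of the cleaning steps
listings_clean <- data.frame(
  id    = c(1, 2, 5),
  city  = "amsterdam",
  price = c(120, 95, 150)
)

# Hypothetical output location; a temp dir keeps the sketch side-effect free
output_path <- file.path(tempdir(), "listings_clean.csv")
write_csv(listings_clean, output_path)
```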

data transformation

- data aggregation (with summarize(), filter(), group_by(), count(), mean()/sum()/median())
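
As a rough illustration of the aggregation step, assuming hypothetical `city`, `room_type`, and `price` columns (the real grouping variables may differ):

```r
library(dplyr)

# Hypothetical cleaned data
listings_clean <- data.frame(
  city      = c("amsterdam", "amsterdam", "amsterdam", "rotterdam"),
  room_type = c("entire", "private", "entire", "entire"),
  price     = c(120, 95, 150, 80),
  stringsAsFactors = FALSE
)

# Price summary per room type, restricted to one city
price_by_room <- listings_clean %>%
  filter(city == "amsterdam") %>%
  group_by(room_type) %>%
  summarize(
    n_listings   = n(),
    mean_price   = mean(price),
    median_price = median(price),
    total_price  = sum(price)
  )

# count() is shorthand for group_by() followed by summarize(n = n())
listings_per_city <- count(listings_clean, city)
```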