PythonDataScience24 / AirBnB-DataScienceProject

GNU General Public License v3.0

Clean data set thoroughly #10

Closed: bdravec closed this issue 4 months ago

bdravec commented 4 months ago

As a data scientist, I want to use one clean data set across all future features to be implemented, so that the data is easier to work with.

Acceptance criteria

bdravec commented 4 months ago

# Count the missing values in each column of the data set
missing_values_count = df_data.isnull().sum()
print("Number of missing values in respective columns: ", missing_values_count)

Output (number of missing values per column):

id: 0
NAME: 250
host id: 0
host_identity_verified: 289
host name: 406
neighbourhood group: 29
neighbourhood: 16
lat: 8
long: 8
country: 532
country code: 131
instant_bookable: 105
cancellation_policy: 76
room type: 0
Construction year: 214
price: 247
service fee: 273
minimum nights: 409
number of reviews: 183
last review: 15893
reviews per month: 15879
review rate number: 326
calculated host listings count: 319
availability 365: 448
house_rules: 52131
license: 102597
dtype: int64
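In the counts above, `license` (102597 missing) and `house_rules` (52131) are far emptier than every other column, while `last review` (15893) and `reviews per month` (15879) are missing in roughly the same number of rows. Expressing the counts as a share of all rows makes it easier to decide which columns to drop and which to fill. A minimal sketch, assuming pandas; `missing_value_report`, the 0.5 threshold and the toy data are hypothetical and not part of the project:

```python
import pandas as pd


def missing_value_report(df: pd.DataFrame, threshold: float = 0.5) -> pd.DataFrame:
    """Return count and share of missing values per column, emptiest first.

    `threshold` is a hypothetical cut-off: columns whose missing share
    exceeds it are flagged as candidates for dropping.
    """
    counts = df.isnull().sum()   # number of NaN/None values per column
    shares = df.isnull().mean()  # fraction of rows that are NaN/None
    report = pd.DataFrame({"missing": counts, "share": shares})
    report["drop_candidate"] = report["share"] > threshold
    return report.sort_values("share", ascending=False)


# Toy rows standing in for df_data (values invented for illustration).
toy = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "license": [None, None, None, "ABC"],
    "last review": ["2021-01-05", None, "2019-07-12", "2022-03-01"],
})
print(missing_value_report(toy))
```

On the real data, `missing_value_report(df_data)` would list `license` first; the threshold only flags columns, so the decision to drop them stays explicit.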

bdravec commented 4 months ago
# Get the data types of all columns in the DataFrame
data_types_summary = df_data.dtypes

print("Summary of data types for each column:")
print(data_types_summary)

Summary of data types for each column (proposed changes marked with -->):

id: int64
NAME: object --> should be str; also, should it be renamed to 'listing_description'?
host id: int64
host_identity_verified: object --> should be str
host name: object --> should be str
neighbourhood group: object --> str
neighbourhood: object --> str
lat: float64
long: float64
country: object --> remove
country code: object --> remove
instant_bookable: object --> should be str
cancellation_policy: object --> str
room type: object --> str
Construction year: float64
price: object --> import as str, then change to float64
service fee: object --> float
minimum nights: float64
number of reviews: float64
last review: object --> to_datetime?
reviews per month: float64
review rate number: float64
calculated host listings count: float64
availability 365: float64
house_rules: object --> str
license: object
dtype: object
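The proposed changes above could be applied in a single pass. A minimal sketch, assuming pandas and assuming the currency columns hold strings such as "$1,060 " (the issue only says to import `price` as str and convert it to float64, so the exact format is an assumption); `clean_dtypes`, the column-list constants and the toy data are hypothetical:

```python
import pandas as pd

# Columns flagged "--> str" in the dtype summary.
TEXT_COLUMNS = [
    "NAME", "host_identity_verified", "host name", "neighbourhood group",
    "neighbourhood", "instant_bookable", "cancellation_policy", "room type",
    "house_rules",
]
# Columns flagged "--> remove".
DROP_COLUMNS = ["country", "country code"]
# Currency columns to convert to float64.
MONEY_COLUMNS = ["price", "service fee"]


def clean_dtypes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a new DataFrame with the proposed dtype changes applied (sketch)."""
    # drop() returns a new object, so df itself is left unchanged.
    out = df.drop(columns=DROP_COLUMNS, errors="ignore")

    # Nullable "string" dtype keeps missing values as <NA>.
    for col in TEXT_COLUMNS:
        if col in out.columns:
            out[col] = out[col].astype("string")

    # ASSUMPTION: currency values are strings like "$1,060 ".
    # Keep only digits and the decimal point, then parse; anything
    # unparseable (including empty strings) becomes NaN.
    for col in MONEY_COLUMNS:
        if col in out.columns:
            digits = out[col].str.replace(r"[^0-9.]", "", regex=True)
            out[col] = pd.to_numeric(digits, errors="coerce").astype("float64")

    # Unparseable dates become NaT instead of raising.
    if "last review" in out.columns:
        out["last review"] = pd.to_datetime(out["last review"], errors="coerce")

    return out


# Toy rows standing in for df_data (values invented for illustration).
toy = pd.DataFrame({
    "NAME": ["Cozy loft", None],
    "country": ["United States", None],
    "country code": ["US", None],
    "price": ["$1,060 ", None],
    "service fee": ["$212 ", "$28 "],
    "last review": ["2021-10-19", "not a date"],
})
clean = clean_dtypes(toy)
print(clean.dtypes)
```

Using the `"string"` dtype rather than `astype(str)` is deliberate: in pandas 2.x, `astype(str)` turns missing values into the literal text "nan", which would hide the gaps counted earlier in this thread. `errors="coerce"` likewise keeps unparseable prices and dates as NaN/NaT instead of raising.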

bdravec commented 4 months ago

@bdravec go ahead and do it

bdravec commented 4 months ago

@TheRBen12 thanks so much for debugging and helping to get it running, much appreciated!