boostcampaitech6 / level1-bookratingprediction-recsys-02

level1-bookratingprediction-recsys-02 created by GitHub Classroom
[Wiki] Users EDA #35

GangBean closed this issue 6 months ago

GangBean commented 6 months ago

Background

Notes

 #   Column    Non-Null Count  Dtype  
---  ------    --------------  -----  
 0   user_id   68092 non-null  int64  
 1   location  68092 non-null  object 
 2   age       40259 non-null  float64

user_id

count     68092.000000
mean     139381.329539
std       80523.969862
min           8.000000
25%       69008.750000
50%      138845.500000
75%      209388.250000
max      278854.000000
Name: user_id, dtype: float64

location

0              timmins, ontario, canada
1               ottawa, ontario, canada
2                         n/a, n/a, n/a
3              toronto, ontario, canada
4    victoria, british columbia, canada
5               ottawa, ontario, canada
6                             ottawa, ,
7             kingston, ontario, canada
8               ottawa, ontario, canada
9               comber, ontario, canada
Name: location, dtype: object
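
The snippets in the following sections use `location_city`, `location_state`, and `location_country`, which this post never defines. Below is a minimal sketch of one way to derive them, assuming each `location` value is a comma-separated `city, state, country` string as in the sample above. The toy frame and the choice to mark blank parts as `'n/a'` are assumptions for illustration, not the author's confirmed preprocessing.

```python
import pandas as pd

# Toy rows copied from the sample output above.
users = pd.DataFrame({
    'location': [
        'timmins, ontario, canada',
        'n/a, n/a, n/a',
        'ottawa, ,',
    ]
})

# Split into at most three parts; n=2 keeps any extra commas inside the last part.
parts = users['location'].str.split(',', n=2, expand=True)
parts.columns = ['location_city', 'location_state', 'location_country']

# Trim whitespace and mark empty parts as 'n/a' (assumed convention).
for col in parts.columns:
    parts[col] = parts[col].str.strip().replace('', 'n/a')

users = users.join(parts)
print(users)
```

With this convention, a row such as `ottawa, ,` ends up with `'n/a'` for both state and country, which matches the `!= 'n/a'` filters used later.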

location_state

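# Build a city -> state lookup from rows where both values are known.
# drop_duplicates keeps the first state seen for each city, as the original loop did.
known = users[(users['location_city'] != 'n/a') & (users['location_state'] != 'n/a')]
city_state_dict = (
    known.drop_duplicates('location_city')
         .set_index('location_city')['location_state']
         .to_dict()
)
print(city_state_dict)
# Caution: map() rewrites every row, so cities absent from the lookup end up as NaN.
users['location_state'] = users['location_city'].map(city_state_dict)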

location_country

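# Build a city -> country lookup from rows where both values are known.
# drop_duplicates keeps the first country seen for each city, as the original loop did.
known = users[(users['location_city'] != 'n/a') & (users['location_country'] != 'n/a')]
city_country_dict = (
    known.drop_duplicates('location_city')
         .set_index('location_city')['location_country']
         .to_dict()
)
print(city_country_dict)
# Caution: map() rewrites every row, so cities absent from the lookup end up as NaN.
users['location_country'] = users['location_city'].map(city_country_dict)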
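
Both the state and the country snippet assign the mapped result back over the whole column, so rows that already had a valid value are replaced by the first value seen for their city, and cities with no known value become NaN. If the intent is only to fill gaps, a variant along these lines may be closer. The helper name `fill_from_city` and the toy data are hypothetical, and `'n/a'` is assumed to be the only missing marker.

```python
import pandas as pd

def fill_from_city(users: pd.DataFrame, target: str) -> pd.Series:
    """Fill 'n/a' entries of `target` using the first known value for the same city."""
    known = users[(users['location_city'] != 'n/a') & (users[target] != 'n/a')]
    lookup = known.drop_duplicates('location_city').set_index('location_city')[target]
    mapped = users['location_city'].map(lookup)
    # Keep existing values; use the lookup only where the target is 'n/a'
    # and the city has a known value.
    use_lookup = (users[target] == 'n/a') & mapped.notna()
    return users[target].where(~use_lookup, mapped)

# Hypothetical toy data for illustration only.
users = pd.DataFrame({
    'location_city': ['ottawa', 'ottawa', 'comber', 'n/a'],
    'location_state': ['ontario', 'n/a', 'n/a', 'n/a'],
})
users['location_state'] = fill_from_city(users, 'location_state')
print(users)
```

The same helper can be called with `'location_country'` as the target. Rows whose city has no known value keep `'n/a'` rather than turning into NaN, which leaves the decision about those rows explicit.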

age

count    40259.000000
mean        36.069873
std         13.842571
min          5.000000
25%         25.000000
50%         34.000000
75%         45.000000
max         99.000000
Name: age, dtype: float64

location_city + age

location_state + age

image

location_country + age

image