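# Map each city to the first known state seen for it.
city_state_dict = {}
known = users[(users['location_city'] != 'n/a') & (users['location_state'] != 'n/a')]
for index, user in known.iterrows():
    if user['location_city'] not in city_state_dict:
        city_state_dict[user['location_city']] = user['location_state']
print(city_state_dict)
# Rows whose city is not in the lookup keep their original state instead of becoming NaN.
users['location_state'] = users['location_city'].map(city_state_dict).fillna(users['location_state'])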
location_country
No missing values
n/a ratio: 0.01% (11 rows)
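# Map each city to the first known country seen for it.
city_country_dict = {}
known = users[(users['location_city'] != 'n/a') & (users['location_country'] != 'n/a')]
for index, user in known.iterrows():
    if user['location_city'] not in city_country_dict:
        city_country_dict[user['location_city']] = user['location_country']
print(city_country_dict)
# Rows whose city is not in the lookup keep their original country instead of becoming NaN.
users['location_country'] = users['location_city'].map(city_country_dict).fillna(users['location_country'])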
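The same first-seen lookup can also be built without an explicit loop. The following is a minimal sketch, not the notebook's own code: `build_lookup` and `fill_from_city` are hypothetical helpers, and `users_demo` is an invented toy frame that only reuses the column names. This variant touches only rows whose value is 'n/a' and leaves every other row as it was.

```python
import pandas as pd

def build_lookup(df, key, value, na="n/a"):
    """Map each key to the first non-'n/a' value seen for it (hypothetical helper)."""
    known = df[(df[key] != na) & (df[value] != na)]
    # drop_duplicates keeps the first row per key, matching the loop's behavior.
    return known.drop_duplicates(subset=key).set_index(key)[value].to_dict()

def fill_from_city(df, target, na="n/a"):
    """Replace 'n/a' entries of `target` via the city lookup; keep all other rows."""
    lookup = build_lookup(df, "location_city", target, na)
    mapped = df["location_city"].map(lookup)
    is_na = df[target] == na
    # Keep the original value unless it is 'n/a' AND the lookup has an answer.
    return df[target].where(~is_na | mapped.isna(), mapped)

# Invented sample rows, for illustration only.
users_demo = pd.DataFrame({
    "location_city": ["seattle", "seattle", "n/a", "toronto"],
    "location_country": ["usa", "n/a", "n/a", "canada"],
})
filled = fill_from_city(users_demo, "location_country")
print(filled.tolist())
```

On large frames this avoids the per-row overhead of `iterrows()`. Rows with an unknown city stay 'n/a' and need a separate strategy.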
age
count    40259.000000
mean        36.069873
std         13.842571
min          5.000000
25%         25.000000
50%         34.000000
75%         45.000000
max         99.000000
Name: age, dtype: float64
Continuous variable
Missing values: 40% (27,833 rows)
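Figures like these can be reproduced with `isna()` and `describe()`. The following is a minimal sketch on an invented toy series (`ages_demo` is a hypothetical stand-in for `users['age']`), so the printed numbers will not match the notebook's.

```python
import numpy as np
import pandas as pd

# Invented sample values, for illustration only.
ages_demo = pd.Series([25.0, np.nan, 34.0, np.nan, 45.0], name="age")

missing_count = int(ages_demo.isna().sum())   # number of NaN entries
missing_ratio = ages_demo.isna().mean()       # share of NaN entries
summary = ages_demo.describe()                # count/mean/std/quantiles, NaN excluded

print(f"Missing values: {missing_ratio:.0%} ({missing_count:,} rows)")
print(summary)
```

Note that `describe()` reports `count` over non-missing values only, so `count` plus the missing count equals the total number of rows.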
location_city + age
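To analyze the relationship between a categorical variable and a continuous variable, we can examine the distribution of the continuous variable within each category.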
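One way to look at the per-category distribution is `groupby(...).describe()`. This is a minimal sketch under stated assumptions: `users_demo` is an invented toy frame that reuses the column names, and excluding 'n/a' cities before grouping is an illustrative choice, not something the notebook prescribes.

```python
import pandas as pd

# Invented sample rows, for illustration only.
users_demo = pd.DataFrame({
    "location_city": ["seattle", "seattle", "toronto", "toronto", "toronto", "n/a"],
    "age": [30.0, 40.0, 20.0, None, 50.0, 33.0],
})

# Drop rows with an unknown city so 'n/a' does not form its own category.
known = users_demo[users_demo["location_city"] != "n/a"]

# One row per city: count, mean, std, min, quartiles, max of age (NaN ages skipped).
age_by_city = known.groupby("location_city")["age"].describe()
print(age_by_city)
```

Cities with very few users give unstable statistics, so it may help to filter on the `count` column before comparing groups.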