alperyilmaz / dav-exercises

Exercise questions submitted by Data Analysis and Visualization with R course students at YTU
GNU General Public License v3.0

Data Manipulation #7

Open bestepamukogullar opened 6 years ago

bestepamukogullar commented 6 years ago

Question

Congratulations! You will have a baby soon. You've already started looking at names, but it's hard to pick a baby name when you don’t know the sex of the baby. Oh, and you still want to call the baby by its name. You want a name that can be used for both boys and girls, chosen from names given to babies between 1800 and 1900. How old-fashioned! What is the most used name that fits your wishes?

Please work with the “babynames” package.

Name total_n
……………….. ………………

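# install.packages("babynames")  # run once if the package is not installed
library(babynames)
library(dplyr)

# Total count per name for girls in the requested period
# (the babynames data starts in 1880)
girls_name <- babynames %>%
  filter(year < 1900 & year > 1800, sex == "F") %>%
  group_by(name) %>%
  summarise(total_n = sum(n)) %>%
  arrange(desc(total_n))

# Same totals for boys
boys_name <- babynames %>%
  filter(year < 1900 & year > 1800, sex == "M") %>%
  group_by(name) %>%
  summarise(total_n = sum(n)) %>%
  arrange(desc(total_n))

# intersect() compares whole rows, so this keeps only the names whose
# total_n is identical for boys and girls, then picks the largest total
common_name <- boys_name %>%
  intersect(girls_name) %>%
  top_n(1, total_n)
common_name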

Additional information

Originality

Please mark relevant information with x, (ex. [x])

Is this question

Difficulty Level

Tags (optional)

filter, summarise, joining data

alperyilmaz commented 6 years ago

Again, a very nice question. There are numerous analyses of babynames regarding unisex names (an example). Is this question original or inspired? Clearly the question is not paraphrased or copied; I'm just wondering if there were any resources inspiring it.

bestepamukogullar commented 6 years ago

Thank you for your kind comment. I just found the babynames package while I was installing packages in RStudio. This question is entirely my own idea and it's original :)

alperyilmaz commented 6 years ago

According to the submitted solution, the result is Courtney, which has been used 83 times, distributed nearly equally between males and females. But when we remove the year limit, the most common unisex name is James, which is misleading, because there are very few females named James. For instance, in 2015 there were 14705 boys named James and only 38 girls named James. Thus, counting with group_by(name) alone might be misleading. So,

would be nice.
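
One possible way to make the point about group_by(name) concrete is to count each sex separately and keep only names where the less common sex still holds a reasonable share of the total. The following is only a sketch under assumptions, not the course's reference solution: the helper name `unisex_balance`, the `min_share` parameter, and its 0.4 default are hypothetical, and the toy rows (apart from the James counts quoted above) are made up for illustration.

```r
library(dplyr)

# Hypothetical helper (not part of babynames or dplyr): totals per name and
# sex, plus the share of the total held by the less common sex.
unisex_balance <- function(data, min_share = 0.4) {
  data %>%
    group_by(name) %>%
    summarise(
      n_female = sum(n[sex == "F"]),
      n_male   = sum(n[sex == "M"]),
      .groups  = "drop"
    ) %>%
    mutate(
      total_n        = n_female + n_male,
      minority_share = pmin(n_female, n_male) / total_n
    ) %>%
    # Drop names that are dominated by one sex (assumed threshold)
    filter(minority_share >= min_share) %>%
    arrange(desc(total_n))
}

# Toy rows with the same columns as babynames. The James counts are the 2015
# figures quoted in the comment above; the "Alex" rows are invented.
toy <- data.frame(
  year = c(2015, 2015, 2015, 2015),
  sex  = c("M", "F", "M", "F"),
  name = c("James", "James", "Alex", "Alex"),
  n    = c(14705, 38, 55, 45)
)

result <- unisex_balance(toy)
result
```

On the toy rows, James is filtered out because girls make up well under 1% of its total, while Alex is kept. With the real data the same call would be `unisex_balance(babynames::babynames)`, optionally after filtering by year; the threshold is a judgment call and changes which names count as unisex.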