Closed (bdacunha closed 8 years ago)
group_by breaks the data frame (tbl_df) into groups of data frames. So, in this case, you are passing entire data frames (each with a single row here, I guess) to the get_urban_population function.
So I think you might merely need to replace the last line with do(get_urban_population(.$countrycode)). Then you'll need to join that back into the gap2002 data frame, so you might need some tweaks to smooth that transition.
It takes forever to run on the whole gapminder data... and then it shows an error... I tried creating a small set of countries and running the function on that, but it still won't work...
my_country_list <- c("Kuwait", "Libya", "Gabon", "Saudi Arabia", "Ireland")
gap2002 <- gapminder %>%
  subset(country %in% my_country_list) %>%
  filter(year == "2002")
gap2002
urban_pop <- gap2002 %>%
  select(country, continent, gdpPercap, countrycode) %>%
  group_by(countrycode) %>%
  do(get_urban_population(.$countrycode))
I get this Error: Results are not data frames at positions: 1, 2, 3, 4, 5
I tried changing the function's return value to a data frame, return(data.frame(urban_pop)), and it works for one country. But when I try it for every country, or for my set of countries, it doesn't show any error or any output either... it just sits there blank.
OK, a few problems. I didn't notice it in your code, but the output of do needs to be assigned a (column) name, just like summarise. Also, if you don't want a weird list column with a single value, you need to wrap the whole expression inside do in data.frame.
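To make the difference concrete, here is a minimal offline sketch. The toy data frame and the toy_total function are made up for illustration (toy_total is only a stand-in for get_urban_population, not the real thing), and the behaviour is what I'd expect from dplyr's do:

```r
library(dplyr)

# Toy input; the codes are real ISO codes but nothing is queried.
toy <- data.frame(countrycode = c("GA", "IE"), stringsAsFactors = FALSE)

# Hypothetical stand-in for get_urban_population(): returns a single number.
toy_total <- function(code) nchar(code) * 1000

# Named result: runs, but urban_pop ends up as a list column.
as_list_col <- toy %>%
  group_by(countrycode) %>%
  do(urban_pop = toy_total(.$countrycode))

# Wrapped in data.frame(): urban_pop is an ordinary numeric column.
as_numeric_col <- toy %>%
  group_by(countrycode) %>%
  do(data.frame(urban_pop = toy_total(.$countrycode)))
```

An unnamed call that returns a bare number, as in do(toy_total(.$countrycode)), is the case that fails with the "Results are not data frames" complaint.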
Secondly, it will speed things up to just query the country you want each time, vs. getting everything from geonames and filtering it every time. So, have a look at the (reproducible) modifications below:
library(dplyr)
library(geonames)
library(gapminder)
options(geonamesUsername="insert_user_name")

get_urban_population <- function(country_code){
  # Look up the bounding box for this one country
  countrybox <- GNcountryInfo(country = country_code)
  # Fetch cities inside that bounding box
  country_cities <- GNcities(north = countrybox$north,
                             south = countrybox$south,
                             east = countrybox$east,
                             west = countrybox$west,
                             maxRows = 500)
  # The box can overlap neighbours, so keep only this country's cities
  country_cities <- country_cities %>%
    filter(countrycode == country_code)
  # Sum the city populations into a single number
  urban_pop <- country_cities %>%
    select(population) %>%
    unlist %>%
    as.numeric %>%
    sum
  return(urban_pop)
}
my_country_list <- c("Kuwait", "Libya", "Gabon", "Saudi Arabia", "Ireland")
gap2002 <- gapminder %>%
  subset(country %in% my_country_list) %>%
  filter(year == "2002")
gap2002$countrycode <- c("GA", "IE", "KW", "LB", "SA")
urban_pop <- gap2002 %>%
  select(country, continent, gdpPercap, countrycode) %>%
  group_by(country) %>%
  do(data.frame(urban_pop = get_urban_population(.$countrycode)))
urban_pop
Source: local data frame [5 x 2]
Groups: country [5]

       country urban_pop
        (fctr)     (dbl)
1        Gabon    932660
2      Ireland   2177007
3       Kuwait    986889
4        Libya   2859178
5 Saudi Arabia  12357892
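For the step of getting urban_pop back next to the other gapminder columns, a left_join on country should do it. This is only a sketch with made-up toy numbers (gap_toy and urban_toy are hypothetical stand-ins for gap2002 and the do() result), so it runs without geonames:

```r
library(dplyr)

# Hypothetical stand-in for gap2002; the gdpPercap values are made up.
gap_toy <- data.frame(
  country = c("Gabon", "Ireland"),
  gdpPercap = c(1, 2),
  stringsAsFactors = FALSE
)

# Hypothetical stand-in for the do() result; the urban_pop values are made up.
urban_toy <- data.frame(
  country = c("Ireland", "Gabon"),
  urban_pop = c(20, 10),
  stringsAsFactors = FALSE
)

# Match rows by country name rather than by position.
gap_joined <- left_join(gap_toy, urban_toy, by = "country")
```

Because the join matches on the country column, the row order of the two tables does not matter.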
Thank you soo much!! It works fine now!!
I'm having an OCD moment but if you're filtering already, you can eliminate the subset statement:
gapminder %>%
  filter(country %in% my_country_list, year == "2002")
I'm surprised year is character? Also a join or match would be a safer way to bring those two letter country codes in. Less likely to create a puzzle when/if you scale up.
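Something like the following is what I mean by a join. It is a rough sketch: code_lookup is a hypothetical hand-made table and gap_rows is a toy stand-in for the filtered gapminder rows. Note that "LB" in the example above is Lebanon's ISO code; Libya's is "LY", which is exactly the kind of slip that is easy to make when pasting codes in by position.

```r
library(dplyr)

# Hypothetical lookup table keyed on country name.
code_lookup <- data.frame(
  country = c("Gabon", "Ireland", "Kuwait", "Libya", "Saudi Arabia"),
  countrycode = c("GA", "IE", "KW", "LY", "SA"),
  stringsAsFactors = FALSE
)

# Toy stand-in for gapminder rows, deliberately not in alphabetical order.
gap_rows <- data.frame(
  country = c("Saudi Arabia", "Gabon", "Libya"),
  year = c(2002L, 2002L, 2002L),
  stringsAsFactors = FALSE
)

# Compare year as a number, then attach codes by name instead of by position.
gap_coded <- gap_rows %>%
  filter(year == 2002) %>%
  left_join(code_lookup, by = "country")
```

As far as I know, year in gapminder is an integer, and the comparison with "2002" only works because R coerces it. The country column there is a factor, so it may need as.character() before a join like this.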
Right, I just added the country codes manually so the example would be reproducible. I assume Brenda has some other (unstated) method for this.
Hi,
I (with Kieran's help) have this function:
But when I try to apply it to gapminder, it won't work... (my gap2000 has the country_code added as iso2c and it's filtered for year 2002 only)
I get the following error:
Error in if (!repeated && grepl("%[[:xdigit:]]{2}", URL, useBytes = TRUE)) return(URL) : missing value where TRUE/FALSE needed
Can someone help me with this?? thanks!!!!
Brenda