DS4PS / course_website

https://ds4ps.github.io/course_website/

Lab 10 - Part One, Question Two #17

Open tnlutes opened 5 years ago

tnlutes commented 5 years ago

What am I missing in order to count the nonprofits and group them by FIPS and YEAR? Did anyone else get confused on this portion of the lab?

Count the nonprofits in each county for each year. Use “FIPS” for the county code and “YEAR” for the tax year.

url <- "https://github.com/DS4PS/Data-Science-Class/blob/master/DATA/nonprofits_2000.csv?raw=true"
np.dat.2000 <- read.csv( url, stringsAsFactors=F )
np.dat.2010 <- read.csv( url, stringsAsFactors=F)
np.dat <- rbind( np.dat.2000, np.dat.2010 )
head( np.dat.2000 )
g <- group_by( dat$fips, dat$year )

lecy commented 5 years ago

Check back over the “data recipe” examples for some syntax:

dat %>% group_by( var1, var2 ) %>% count()

tnlutes commented 5 years ago

I still can't get it to work, and I'm not sure what I'm doing wrong. Do I need to change "dat" to np.dat, as in my load sequence?

url <- "https://github.com/DS4PS/Data-Science-Class/blob/master/DATA/nonprofits_2000.csv?raw=true"
np.dat.2000 <- read.csv( url, stringsAsFactors=F )
np.dat.2010 <- read.csv( url, stringsAsFactors=F)
np.dat <- rbind( np.dat.2000, np.dat.2010 )
head( np.dat.2000 )
group_by( dat$fips, dat$year )

lecy commented 5 years ago

Yes, the example above was generic. You need to substitute your own dataset name and variable names.

Make sure you use the variable names exactly as they appear in np.dat. They differ slightly from those in the population dataset, and R is case-sensitive, so FIPS and fips are not the same variable.

Also note that the dplyr functions all follow the same convention: the dataset comes first, and variables are then referenced directly by name rather than with the dat$var convention.

function( dat, var1, var2, ... )
# or with pipes
dat %>% function( var1, var2, ... )

> names( np.dat )
 [1] "MSA_NECH"  "EIN"       "NAME"      "ADDRESS"  
 [5] "CITY"      "STATE"     "ZIP5"      "FIPS"     
 [9] "NTMAJ12"   "NTEE1"     "LEVEL1"    "LEVEL2"   
[13] "LEVEL3"    "LEVEL4"    "MAJGRPB"   "REVENUE"  
[17] "EXPENSES"  "ASSETS"    "RULEDATE"  "MSA"      
[21] "URBAN"     "YEAR"      "LONGITUDE" "LATITUDE"
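As a rough illustration of the "data frame first, then bare variable names" convention, here is a minimal sketch. The `toy` data frame and its values are made up for the example and are not the lab data; only the column names FIPS and YEAR are taken from the output above.

```r
library( dplyr )

# Hypothetical toy data frame (NOT the lab data), using the same
# upper-case column names that names( np.dat ) reports
toy <- data.frame( FIPS = c( 1001, 1001, 1003 ),
                   YEAR = c( 2000, 2010, 2000 ) )

# Data frame as the first argument, then bare column names (no toy$YEAR)
a <- filter( toy, YEAR == 2000 )

# The same call written with a pipe: toy is passed in as the first argument
b <- toy %>% filter( YEAR == 2000 )

# Both forms should return the same rows
identical( a, b )
```

The piped form is only a different way of supplying the first argument, so either style can be used as long as the column names match the dataset exactly.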

np.dat %>% group_by( FIPS, YEAR ) %>% count()
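To see what this returns without downloading the lab files, the same pattern can be sketched on a small made-up data frame. The `toy` object and its values are hypothetical; the assumption is simply that np.dat has one row per nonprofit per tax year.

```r
library( dplyr )

# Hypothetical stand-in for np.dat: one row per nonprofit per tax year
toy <- data.frame(
  FIPS = c( 1001, 1001, 1001, 1003, 1003 ),
  YEAR = c( 2000, 2000, 2010, 2000, 2010 )
)

# One row per FIPS-YEAR combination, with the number of rows in column n
np.count <- toy %>% group_by( FIPS, YEAR ) %>% count()

# count() also accepts the grouping variables directly,
# which gives the same counts without a separate group_by() step
np.count2 <- toy %>% count( FIPS, YEAR )

np.count
```

Saving the result to an object such as `np.count` (a name chosen here for illustration) keeps the county-year counts available for later steps.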