gge-ucd / Discussion

Class discussion for R-DAVIS course

personal code issue using dplyr #20

Closed riodesangre closed 7 years ago

riodesangre commented 7 years ago

Hi @ryanpeek, I've been using the code you showed us in class to create and refine my own groundwater management database for work. I tried several approaches and asked some friends, and we don't understand why it is not working, so I'm bringing it to you to see if you can help. I'm stuck and cannot move forward until I resolve this issue.

If you go to the scripts folder in myrepo (https://github.com/lindaesteli/myrepo/blob/master/scripts/gw_database.R), you'll see I've tried multiple ways to 'filter':

First, I did it the traditional way I learned to code, and that didn't work:

SDWIS <- SDWIS[SDWIS$AREA %in% c("Institution", "Wholesaler (Sells Water)", "Interstate Carrier", "Service Station", "Industrial/Agricultural", "Municipality"), ]

Then, I did it the way you have taught us:

SDWIS <- SDWIS %>%
  filter(SOURCE != "Surface Water", SOURCE != "Surface Water Purchased", SOURCE != "Unknown",
         AREA != "Mobile Home Park", AREA != "Other Area", AREA != "Other Transient Area",
         AREA != "Restaurant", AREA != "Secondary Residences", AREA != "Day Care Center",
         AREA != "Hotel/Motel", AREA != "Mobile Home Park,Princ. Res.", AREA != "Other Non-Transient Area",
         AREA != "Retail Employees", AREA != "Highway Rest Area", AREA != "Medical Facility",
         AREA != "Other Residential Area", AREA != "Residential Area", AREA != "School",
         AREA != "Summer Camp") %>%
  select(PWSID, PWS_NAME, SOURCE, AREA)

and it is STILL not working. The funny thing is that the same code works for other databases in the script. If you look further down, I did the same for my 'watbound' database and it worked. So I thought the problem might be with the SDWIS database itself, and I made sure to convert it with 'as.data.frame', yet still nothing. What am I missing?

Thanks so much for your help!

Linda.

ryanpeek commented 7 years ago

Fixed! Check your personal repo under issues for the answer, and for the script I pushed to your repository there.

The issue was trailing whitespace in the AREA field: the stored values had extra spaces at the end, so they never matched the strings in your filter exactly. That's why dplyr wasn't working. It's easy to fix with built-in functions, but annoying to diagnose. Also, look at using the joins in the dplyr package instead of merge(). Both do the same thing, but with joins you have far more flexibility and specificity about how you want things to work, and they are faster when joining with spatial data.
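
For anyone hitting the same problem, here is a minimal sketch of how trailing whitespace breaks a filter and how trimming fixes it. It is not the script pushed to the repository: the `sdwis` and `bounds` tables, their values, and the `COUNTY` column are made up for illustration, and `trimws()` is just one of the built-in options. The last lines show a dplyr join next to the equivalent `merge()` call.

```r
library(dplyr)

# Hypothetical stand-in for the SDWIS table; two AREA values carry trailing spaces.
sdwis <- data.frame(
  PWSID  = c("CA001", "CA002", "CA003", "CA004"),
  SOURCE = c("Groundwater", "Groundwater", "Surface Water", "Groundwater"),
  AREA   = c("Municipality ", "School", "Institution", "Institution  "),
  stringsAsFactors = FALSE
)

keep_areas <- c("Institution", "Municipality")

# Diagnose: any value that changes when trimmed has leading or trailing whitespace.
padded <- unique(sdwis$AREA[sdwis$AREA != trimws(sdwis$AREA)])
print(padded)

# Before trimming, the padded values do not match, so those rows are silently dropped.
before <- sdwis %>%
  filter(SOURCE != "Surface Water", AREA %in% keep_areas)

# Fix: trim the column first, then filter.
after <- sdwis %>%
  mutate(AREA = trimws(AREA)) %>%
  filter(SOURCE != "Surface Water", AREA %in% keep_areas)

# Hypothetical lookup table keyed by PWSID, to compare a dplyr join with merge().
bounds <- data.frame(
  PWSID  = c("CA001", "CA004", "CA009"),
  COUNTY = c("Yolo", "Solano", "Napa"),
  stringsAsFactors = FALSE
)

joined <- left_join(after, bounds, by = "PWSID")            # dplyr join
merged <- merge(after, bounds, by = "PWSID", all.x = TRUE)  # base R equivalent
```

In this toy data, `before` ends up with no rows while `after` keeps the two groundwater systems, which is the same symptom as in the issue: no error, just a filter that quietly matches nothing. Checking `unique()` on a column before filtering is a cheap way to spot this kind of mismatch.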