Open MeghanPaquette opened 3 years ago
When you are building the widget you need to match the label users will see on the GUI with the actual variable name.
cbind( dd.name, value ) %>% knitr::kable()
dd.name | value |
---|---|
Percent white, non-Hispanic | pnhwht12 |
Percent black, non-Hispanic | pnhblk12 |
Percent Hispanic | phisp12 |
Percent Native American race | pntv12 |
Percent foreign born | pfb12 |
Percent speaking other language at home, age 5 plus | polang12 |
Percent with high school degree or less | phs12 |
Percent with 4-year college degree or more | pcol12 |
Percent unemployed | punemp12 |
Percent female labor force participation | pflabf12 |
Percent professional employees | pprof12 |
Percent manufacturing employees | pmanuf12 |
Percent veteran | pvet12 |
Percent self-employed | psemp12 |
Median HH income, total | hinc12 |
Per capita income | incpc12 |
Percent in poverty, total | ppov12 |
Percent owner-occupied units | pown12 |
Percent vacant units | pvac12 |
Percent multi-family units | pmulti12 |
Median rent | mrent12 |
Median home value | mhmval12 |
Percent structures more than 30 years old | p30old12 |
Percent HH in neighborhood 10 years or less | p10yrs12 |
Percent 17 and under, total | p18und12 |
Percent 60 and older, total | p60up12 |
Percent 75 and older, total | p75up12 |
Percent currently married, not separated | pmar12 |
Percent widowed, divorced and separated | pwds12 |
Percent female-headed families with children | pfhh12 |
You need to align labels and values with the proper arguments:
choiceNames=temp.names,
choiceValues=these.variables,
radioButtons( inputId="demographics",
label = h3("Census Variables"),
choiceNames=temp.names,
choiceValues=these.variables,
selected="pnhwht12")
choiceNames, choiceValues:
List of names and values, respectively, that are displayed to the user in the app and correspond to each choice (for this reason, choiceNames and choiceValues must have the same length).
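A minimal sketch of keeping labels and values aligned, using a three-variable subset of the table above. The names `label.lookup` and `selected.value` are illustrative assumptions, not part of the template; `selected.value` stands in for `input$demographics`.

```r
# display labels and the matching column names, in the same order
dd.name <- c( "Percent white, non-Hispanic",
              "Percent black, non-Hispanic",
              "Percent Hispanic" )
value   <- c( "pnhwht12", "pnhblk12", "phisp12" )

# choiceNames and choiceValues must have the same length
stopifnot( length( dd.name ) == length( value ) )

# named vector for looking up a label from a variable name
# (label.lookup is a hypothetical helper, not in the template)
label.lookup <- setNames( dd.name, value )

# stand-in for input$demographics, which holds the VALUE, not the label
selected.value <- "phisp12"
plot.title <- paste0( "Choropleth of Select Demographics: ",
                      label.lookup[[ selected.value ]] )
print( plot.title )
```

The same lookup could replace `toupper(input$demographics)` in a plot title if a readable label is preferred. As far as I know, the `selected` argument of the widget is matched against the values (for example "pnhwht12"), not the labels.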
Note that neighborhood change variables would need to be created as:
x.change <- x.2010 - x.2000
# all together
d <-
d %>%
mutate( x1.change = x1.2010 - x1.2000,
x2.change = x2.2010 - x2.2000 )
Recall that the 2012 variables are really 2010 variables (2012 ACS five-year samples, which center at 2010).
@lecy
I seem to still be getting the same error message on the dashboard for both. I am focusing on the first part. It is saying something about the mutate, so I can't tell if it is considering the input$demographics part, the vegas.sf part, or the whole thing is just wrong.
ERROR in First Tab:
Problem with mutate() input q.
x object 'pnhwht12' not found
ℹ Input q is ntile(get(input$demographics), 10).
ERROR in Second Tab: undefined columns selected
`
# DATA STEPS
# from local file path
vegas <- geojson_read( "vegas_dorling.geojson", what="sp" )
plot( vegas )
# reproject the map
vegas2 <- spTransform( vegas, CRS("+init=epsg:3395") )
# convert the sp map format to
# an sf (simple features) format:
# ggmap requires the sf format
vegas.sf <- st_as_sf( vegas2 )
# separate out the data frame from the map
d <- as.data.frame( vegas.sf )
Community Demographics
=====================================
these.variables <- c("pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12", "polang12",
"phs12", "pcol12", "punemp12", "pflabf12", "pprof12", "pmanuf12",
"pvet12", "psemp12", "hinc12", "incpc12", "ppov12", "pown12",
"pvac12", "pmulti12", "mrent12", "mhmval12", "p30old12", "p10yrs12",
"p18und12", "p60up12", "p75up12", "pmar12", "pwds12", "pfhh12")
value <- c("pnhwht12", "pnhblk12", "phisp12",
"pntv12", "pfb12", "polang12", "phs12", "pcol12", "punemp12",
"pflabf12", "pprof12", "pmanuf12", "pvet12", "psemp12", "hinc12",
"incpc12", "ppov12", "pown12", "pvac12", "pmulti12", "mrent12",
"mhmval12", "p30old12", "p10yrs12", "p18und12", "p60up12", "p75up12",
"pmar12", "pwds12", "pfhh12")
dd.name <- c("Percent white, non-Hispanic",
"Percent black, non-Hispanic", "Percent Hispanic", "Percent Native American race",
"Percent foreign born", "Percent speaking other language at home, age 5 plus",
"Percent with high school degree or less", "Percent with 4-year college degree or more",
"Percent unemployed", "Percent female labor force participation",
"Percent professional employees", "Percent manufacturing employees",
"Percent veteran", "Percent self-employed", "Median HH income, total",
"Per capita income", "Percent in poverty, total", "Percent owner-occupied units",
"Percent vacant units", "Percent multi-family units", "Median rent",
"Median home value", "Percent structures more than 30 years old",
"Percent HH in neighborhood 10 years or less", "Percent 17 and under, total",
"Percent 60 and older, total", "Percent 75 and older, total",
"Percent currently married, not separated", "Percent widowed, divorced and separated",
"Percent female-headed families with children")
x <- dd.name
names(x) <- value
cbind( dd.name, value ) %>% knitr::kable()
temp.names <- paste0( dd.name )
radioButtons( inputId="demographics",
label = h3("Census Variables"),
choiceNames=temp.names,
choiceValues=these.variables,
selected="pnhwht12")
renderPlot({
# split the selected variable into deciles
get_data <-
reactive({
vegas.sf <-
vegas.sf %>%
mutate( q = ntile( get(input$demographics), 10 ) )
})
ggplot( get_data() ) +
geom_sf( aes( fill = q ), color=NA ) +
coord_sf( datum=NA ) +
labs( title = paste0( "Choropleth of Select Demographics: ", toupper(input$demographics) ),
caption = "Source: Harmonized Census Files",
fill = "Population Deciles" ) +
scale_fill_gradientn( colours=rev(ocean.balance(10)), guide = "colourbar" ) +
xlim( xmin = -12965489, xmax = -12666171 ) +
ylim( ymin = 4227911, ymax = 4352610 )
})
renderPlot({
# extract vector x from the data frame
x <- d[ "pnhwht12" ] %>% unlist()
get_variable_x <- reactive({ d[ input$demographics ] })
x <- get_variable_x() %>% unlist()
cut.points <- quantile( x, seq( 0, 1, 0.1 ) )
hist( x, breaks=50,
col="gray", border="white", yaxt="n",
main=paste0( "Histogram of variable ", toupper( input$demographics ) ),
xlab="red lines represent decile cut points" )
abline( v=cut.points, col="darkred", lty=3, lwd=2 )
})
`
Which variables are in vegas.sf?
names( vegas.sf ) %>% sort()
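One way to answer that question directly is to compare the widget's variable list against the column names. This is only a sketch on a toy data frame; `dat`, `missing.vars`, and `ok.vars` are assumed names, and the real check would use the map's data (vegas.sf here).

```r
# toy stand-in for the map's attribute table (assumed columns)
dat <- data.frame( tractid  = c( "a", "b", "c" ),
                   pnhwht12 = c( 61.2, 48.9, 70.1 ),
                   phisp12  = c( 20.3, 31.0, 11.5 ) )

these.variables <- c( "pnhwht12", "pnhblk12", "phisp12" )

# variables the widget offers but the data does not contain
missing.vars <- setdiff( these.variables, names( dat ) )
print( missing.vars )

# choices that can actually be plotted
ok.vars <- intersect( these.variables, names( dat ) )
print( ok.vars )
```

Any name that shows up in `missing.vars` would trigger an "object not found" or "undefined columns selected" error when a user selects it.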
Hi, I was having a similar issue and ran
names( sea.sf ) %>% sort()
Below are the variables in my sea.sf dataset. It looks like I'm missing the ones in these.variables <- c(...) in the template. Do I need to recreate all of these? Or did something happen in the merge for the datafile?
Yes, you should add all the variables you need in the data steps.
For the demographic / choropleth tabs the actual statistics are better than z-scores.
You don't have to stick to variables used in the demo template if you think others would be better.
Pro tip: if you want to create a variable list from existing variables, try:
dput( names( dat ) )
which returns a character vector you can paste into your code instead of just printing the names.
e.g.
c("x1","x2","x3")
Hi @lecy,
That worked and my dashboard is close, but I'm receiving an error in variable distribution in community demographics and neighborhood change and not seeing the map in values. I suspect it's from the same error below. I'm assuming this is an issue in my data, not in my code. It looks like there are NaN values somewhere in the data. From a statistical integrity perspective what is the best way to handle these? I don't know if I should replace with 0, etc. From a code perspective what is the fix?
@lecy Thinking about it more, I think I see what is happening. When I remove the outliers in the census dataset and join to the dorling dataset there will be missing values. I tried sea[complete.cases(sea), ], but it returned an error. What code should I use to remove rows with NA values?
UPDATE: I've also tried na.omit(sea), but the rows with NA are still present.
In my merge I tried shifting from a left join (all.x=T) to a right join (all.y=T) and it had no effect. This one is particularly perplexing. You can see very clearly that there is an NA in the tractid.
You can approach it in three ways:
(1) Create a data subset for a specific purpose, like the data for the clustering models where you select specific variables. In this case a subset of data for a specific tab that is limited to the user options specified in the widget.
You can then remove ALL rows that have ANY missing values:
d2 <- na.omit( d1 )
This is pretty heavy-handed, though, so it just depends on how much missing data you have.
If you have one variable with lots of missing values, for example, it might mean that you drop half of your dataset because of that one variable.
(2) Select a variable, then omit missing cases for that variable.
Much more conservative approach that will minimize the amount of data dropped:
v1 <- d$v1
v2 <- na.omit( v1 )
(3) Data imputation
Probably the most complicated approach, and I would recommend using this only if you were doing modeling where the sample size was important.
It is not a good idea to impute missing values using zero, though. Much better to use the mean. Something like:
v2 <- v1
v2[ is.na( v2 ) ] <- mean( v2, na.rm=T )
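The three options side by side on a toy data frame, as a rough sketch. The data and the names `v1.clean` and `v1.imputed` are made up for illustration.

```r
d1 <- data.frame( v1 = c( 10, NA, 30, 40 ),
                  v2 = c( 1, 2, NA, 4 ) )

# (1) drop every row that has ANY missing value
d2 <- na.omit( d1 )
print( nrow( d2 ) )

# (2) drop missing cases for one variable only
v1.clean <- na.omit( d1$v1 )
print( length( v1.clean ) )

# (3) mean imputation for one variable
v1.imputed <- d1$v1
v1.imputed[ is.na( v1.imputed ) ] <- mean( v1.imputed, na.rm=TRUE )
print( v1.imputed )
```

Option (1) loses two of the four rows here even though each variable is missing only one value, which is the "heavy-handed" cost described above.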
@JasonSills That is odd. Especially after changing the left join to a right join.
Can you write the data file to a RDS and attach it here?
d <- sea@data
saveRDS( d, "seattle.rds" )
It might be the case that they are stored as "NA" strings in a character vector, in which case na.omit() and complete.cases() would not treat them as missing.
You should check to see if the data is empty in the census side before merging.
d[ d$tractid2 == "5302997011" , ]
If the NAs are still present after a left join that means the tract ID exists but the associated data is all NAs. I seem to recall some rows like that in the LTDB database.
Or to get extent of missing data in the LTDB census data just try:
summary( d )
It should report the number of missing values per variable.
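If summary() is too verbose, a compact alternative is to count missing values per column. A sketch on made-up data; the column values and the names `na.counts` and `all.missing` are assumptions for illustration.

```r
d <- data.frame( tractid = c( "t1", "t2", "t3" ),
                 hinc12  = c( 52000, NA, 61000 ),
                 ppov12  = c( NA, NA, 12.5 ) )

# number of missing values in each column
na.counts <- colSums( is.na( d ) )
print( na.counts )

# tracts where every data column (all but tractid) is missing
all.missing <- rowSums( is.na( d[ , -1 ] ) ) == ncol( d ) - 1
print( d$tractid[ all.missing ] )
```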
Here is the rds file, I had to zip it to attach it here. seattle.zip
No NAs in the census dataset. They are introduced during the merge step.
sum( is.na( d ) )
[1] 0
x <- unlist( d )
sum( grepl( "^NA$", x ) )
[1] 0
If you remove the all.x=TRUE and all.y=TRUE arguments from merge() it will return only the tract IDs shared by both datasets (the intersection, i.e. an inner join). That would drop all of the missing tracts from your data.
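To see how the join type introduces or avoids NAs, here is a toy example. The `dorling` and `census` data frames are invented stand-ins, not the course data.

```r
dorling <- data.frame( tractid = c( "t1", "t2", "t3" ),
                       pop     = c( 100, 200, 300 ) )
census  <- data.frame( tractid = c( "t1", "t3" ),
                       hinc12  = c( 52000, 61000 ) )

# left join: keeps all dorling tracts, unmatched ones get NA
left  <- merge( dorling, census, by="tractid", all.x=TRUE )

# default merge: keeps only tract IDs found in both tables
inner <- merge( dorling, census, by="tractid" )

print( sum( is.na( left$hinc12 ) ) )
print( nrow( inner ) )
```

Tract t2 exists only in the dorling table, so the left join keeps it with an NA for hinc12 while the default merge drops it.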
Otherwise I'm not sure why the complete cases function would not work, except there are some cases with values of infinity in the census file. Maybe any of the non-finite numbers cause errors as well?
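On the non-finite question: in base R, is.na() is TRUE for NaN but FALSE for Inf, so complete.cases() keeps rows that contain Inf. The sketch below uses made-up home values and a growth formula chosen only for illustration; it may not be the cause of the behavior in this thread.

```r
d <- data.frame( mhv.00 = c( 150000, 0, 0 ),
                 mhv.10 = c( 180000, 50000, 0 ) )

# percent growth: dividing by zero gives Inf, 0/0 gives NaN
d$mhv.growth <- 100 * ( d$mhv.10 - d$mhv.00 ) / d$mhv.00
print( d$mhv.growth )

# the row with Inf still counts as complete
n.before <- sum( complete.cases( d ) )

# recode every non-finite value (Inf, -Inf, NaN) as NA
d$mhv.growth[ ! is.finite( d$mhv.growth ) ] <- NA
n.after <- sum( complete.cases( d ) )

print( c( n.before, n.after ) )
```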
What I was thinking is that it has to do with filtering out urban and the functions for removing outliers:
mhv.00[ mhv.00 < 1000 ] <- NA
mhv.growth[ mhv.growth > 200 ] <- NA
If I'm removing some from the census data set, not from the dorling data set, and join on dorling I'll have the NA values. But with that it would seem it's a problem with every student's dataset, and I'm thinking I'm the only one with this issue.
Hi @lecy
Some minor success. I went back and removed the urban filter and outliers and I now have variable distributions. I also found the bug in the median home values, so the variable distributions are working in both. Seems to support my above hypothesis.
Unfortunately this did not update my choropleth maps. It's a blank white sheet. My map for my clusters is working, so I don't think it's my xmins or ymins coordinates.
Wondering if it could be my code:
these.variables <- c("pnhwht12", "pnhblk12", "phisp12", "pntv12", "pfb12", "polang12",
"phs12", "pcol12", "punemp12", "pflabf12", "pprof12", "pmanuf12",
"pvet12", "psemp12", "hinc12", "incpc12", "ppov12", "pown12",
"pvac12", "pmulti12", "mrent12", "mhmval12", "p30old12", "p10yrs12",
"p18und12", "p60up12", "p75up12", "pmar12", "pwds12", "pfhh12")
value <- c("pnhwht12", "pnhblk12", "phisp12",
"pntv12", "pfb12", "polang12", "phs12", "pcol12", "punemp12",
"pflabf12", "pprof12", "pmanuf12", "pvet12", "psemp12", "hinc12",
"incpc12", "ppov12", "pown12", "pvac12", "pmulti12", "mrent12",
"mhmval12", "p30old12", "p10yrs12", "p18und12", "p60up12", "p75up12",
"pmar12", "pwds12", "pfhh12")
dd.name <- c("White, non-Hispanic",
"Black, non-Hispanic", "Hispanic", "Native American",
"Foreign born", "Speak other language at home, age 5 plus",
"High school degree or less", "4-year college degree or more",
"Unemployed", " Female labor force participation",
"Professional employees", "Manufacturing employees",
"Veteran", "Self-employed", "Median HH income, total",
"Per capita income", "Poverty", "Owner-occupied units",
"Vacant units", "Multi-family units", "Median rent",
"Median home value", "Structures more than 30 years old",
"HH in neighborhood 10 years or less", "17 and under",
"60 and older", "75 and older, total",
"Currently married, not separated", "Widowed, divorced and separated",
"Female-headed families with children")
x <- dd.name
names(x) <- value
temp.names <- paste0( dd.name )
radioButtons( inputId="demographics",
label = h3("Census Variables"),
choiceNames=temp.names,
choiceValues=these.variables,
selected="pnhwht12")
renderPlot({
# split the selected variable into deciles
get_data <-
reactive({
sea.sf <-
sea.sf %>%
mutate( q = ntile( get(input$demographics), 10 ) )
})
ggplot( get_data() ) +
geom_sf( aes( fill = q ), color=NA ) +
coord_sf( datum=NA ) +
labs( title = paste0( "Choropleth of Select Demographics: ", toupper(input$demographics) ),
caption = "Source: Harmonized Census Files",
fill = "Population Deciles" ) +
scale_fill_gradientn( colours=rev(ocean.balance(10)), guide = "colourbar" ) +
xlim( xmin = -13647722, xmax = -13567392 ) +
ylim( ymin = 6084955, ymax = 5941032 )
})
Are the xlim and ylim values adjusted for Seattle?
xlim( xmin = -13647722, xmax = -13567392 ) +
ylim( ymin = 6084955, ymax = 5941032 )
These should be the same as your bounding boxes in tmap.
Yes, that is what has me confused. The box looks like what I would expect from the values being off, but they are what I've used in other labs and the tmap.
Tmap: bb <- st_bbox( c( xmin = -13647722, xmax = -13567392, ymax = 6084955, ymin = 5941032 ),
Here is the link to my dashboard on shiny: https://sills-asu.shinyapps.io/CPP-529-Seattle-Final-Project-Sills/
If I comment out the x and y lims then your data appears on the map, so that's definitely the problem.
Take a look again at your ymin and ymax. See the issue?
Yes!! It worked now!
The last two dashboard tabs are still having some issues, I think in regards to the variable names. I am looking at that as well, but if you see something specific, any feedback would be appreciated.
Meghan
@lecy
Well, I figured it out. This was one of those things where I knew something was wrong in the xlim and ylim code but just couldn't see it. Hours and hours of trying all kinds of things. I even downloaded another student's source code and worked through that, to no avail. The issue? The ymax and ymin in my bbox above were switched, so I was inverting them.
@JasonSills I had to stare at it for 10 minutes before seeing it. It reminds me how you can't edit your writing in real-time for grammar because your brain won't let you see it until you walk away and come back to it.
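A small sanity check can catch this kind of swap before the plot is drawn. The helper below, `check_limits`, is hypothetical and not part of ggplot2 or sf; it just compares the numbers passed to xlim() and ylim().

```r
# hypothetical helper: returns a description of each problem found
check_limits <- function( xmin, xmax, ymin, ymax ) {
  problems <- character( 0 )
  if( xmin >= xmax ) problems <- c( problems, "xmin is not less than xmax" )
  if( ymin >= ymax ) problems <- c( problems, "ymin is not less than ymax" )
  problems
}

# limits as posted in the dashboard code above
bad <- check_limits( xmin = -13647722, xmax = -13567392,
                     ymin = 6084955,   ymax = 5941032 )
print( bad )

# same limits with the y values in the right order
good <- check_limits( xmin = -13647722, xmax = -13567392,
                      ymin = 5941032,   ymax = 6084955 )
print( length( good ) )
```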
Hi @lecy Dr. Lecy,
I think I am almost there! I am just a little stuck on the input variables not running in other chunks. There may be a larger error, but I have tried a couple of things and they didn't work. I attached two photos from the output of the dashboard, and then the code chunks related to the error. Meghan