Watts-College / paf-515-template

https://watts-college.github.io/paf-515-template/

Lab 04 - NaNs produced during Cluster 2 Divisional County plot #13

Open swest235 opened 5 days ago

swest235 commented 5 days ago

In the wrangling section of Lab 04, my code chunk for the cluster 2 scatterplot shows a perfect correlation and warns that NaNs were produced. I've followed the steps, but I'm unsure where I went wrong.

[4 screenshots attached]

swest235 commented 4 days ago

Separately, is there a discrepancy between where the video and example report describe findings and what the checkmarks for the final lab submission say?

The instructions themselves say to describe findings for the NMTC overall correlation and then for the clustered correlations, and the same for LIHTC - so only 4 descriptions. I didn't count the mentions in the video, but the example report has 8 description sections in the NMTC section alone.

castower commented 4 days ago

Hi @swest235,

Good questions!

1) The warning is caused by the geom_smooth(method="lm") step of the scatterplot graphing. It fits a regression line and draws a confidence band around it based on the standard error; you can see this as the gray shaded area around your other scatterplots and in the examples from the package (https://ggplot2.tidyverse.org/reference/geom_smooth.html#ref-examples). When a cluster has only 2 values, and therefore a (false) perfect relationship, the code gives a warning because it cannot calculate the standard error and produces NaN values instead. You can use geom_smooth(method="lm", se=FALSE) in your code if you'd like to hide the warnings; however, these messages will already be hidden from your knit per the code chunk settings. You can review the NMTC national cluster 1 text for an example of describing these findings: https://r-class.github.io/paf-515-course-materials/labs/wk04/lab-04-middle-atlantic-division.html#k-means-clustering-1
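If it helps to see the cause outside of the lab code, here is a minimal sketch. The data frame `cluster2` and its column names and values are made up for illustration and are not the lab's actual objects. It fits the same kind of linear model that geom_smooth(method="lm") uses to two points and shows that the confidence band cannot be computed.

```r
library(ggplot2)

# Hypothetical two-county cluster; names and values are illustrative only.
cluster2 <- data.frame(
  svi_flag_count = c(3, 9),
  nmtc_dollars   = c(1.2e6, 4.5e6)
)

# Two points always fall on a single line, so the correlation is 1 (or -1).
r <- cor(cluster2$svi_flag_count, cluster2$nmtc_dollars)

# The line uses up both observations, leaving zero residual degrees of freedom.
fit <- lm(nmtc_dollars ~ svi_flag_count, data = cluster2)
resid_df <- df.residual(fit)

# Asking for a confidence interval triggers the "NaNs produced" warning,
# and the lower/upper bounds come back as NaN.
band <- predict(fit, interval = "confidence")
print(band)

# Drawing only the fitted line (no band) avoids the warning in the plot.
p <- ggplot(cluster2, aes(x = svi_flag_count, y = nmtc_dollars)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

The line itself is still drawn either way; se=FALSE only drops the shaded band. In the write-up, a cluster like this is better described as having too few counties to estimate a relationship than as a true perfect correlation.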

2) The expected text sections for analysis are the same as shown in the lab tutorial's code and video. The checkmarks for analysis are the following sections:

  1. Calculate overall correlation between SVI Flag Count in 2010 and NMTC Dollars received from 2011-2020 by county for your Census Division, describe findings

  2. Conduct k-means clustering and calculate the correlation between SVI Flag Count in 2010 and NMTC Dollars received from 2011-2020 by clustered county group for your Census Division, describe findings

  3. Create bivariate map of SVI Flag Count in 2010 and NMTC Dollars received for your Census Division

  4. Calculate overall correlation between SVI Flag Count in 2010 and LIHTC Dollars received from 2011-2020 by county for your Census Division, describe findings

  5. Conduct k-means clustering and calculate the correlation between SVI Flag Count in 2010 and LIHTC Dollars received from 2011-2020 by clustered county group for your Census Division, describe findings

  6. Create bivariate map of SVI Flag Count in 2010 and LIHTC Dollars received for your Census Division

To summarize from a textual standpoint, you can follow these general guidelines:

For sections 1 & 4, you will want to describe your summary statistics, your correlation findings, and your outliers for your overall correlation by county (approx. 3 text sections).

For sections 2 & 5, you will want to describe your cluster elbow plots and the correlation trends for the clusters (at least 2 text sections). Alternatively, you can describe the elbow plot once and then describe the trend for each cluster separately, after that cluster's output.

For sections 3 & 6, you will want to create your bivariate map. Similar to lab 3, you will want to include text/data sets to help illustrate the maps as needed (approx. 2-3 text sections).

This comes out to approx. 7-8 brief text sections for each tax credit program, but feel free to describe your findings in whatever way best presents your data. There is no "required" number, as different analyses will have different clusters and different spatial trends. Your goal is to present the lab so that someone reading your work can understand the trends without needing to understand your code. In essence, we're repeating what we did for the SVI flag visuals; the difference is that we're now determining whether the areas we identified as vulnerable with the SVI flags are the same areas receiving tax dollars.
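As a rough illustration of the analysis behind sections 1 & 2 (and 4 & 5), here is a sketch in base R. It is not the lab's code: the `counties` data frame, its column names and values, the choice of 3 clusters, and the rule of skipping clusters with fewer than 3 counties are all assumptions made for the example.

```r
set.seed(515)  # kmeans() uses random starting centers

# Hypothetical county-level data; names and values are illustrative only.
counties <- data.frame(
  svi_flag_count = c(1, 2, 2, 3, 8, 9, 10, 11, 5, 6),
  nmtc_dollars   = c(0.5, 0.8, 1.1, 0.9, 4.2, 5.0, 4.8, 6.1, 2.0, 2.5) * 1e6
)

# Section 1: overall correlation across all counties.
overall_r <- cor(counties$svi_flag_count, counties$nmtc_dollars)

# Standardize so dollars do not dominate the distance calculation.
scaled <- scale(counties)

# Elbow data: total within-cluster sum of squares for k = 1..5.
# Plotting wss against k gives the elbow plot.
wss <- sapply(1:5, function(k) {
  kmeans(scaled, centers = k, nstart = 10)$tot.withinss
})

# Section 2: assign clusters (k = 3 is an assumption for this example).
counties$cluster <- kmeans(scaled, centers = 3, nstart = 10)$cluster

# Correlation within each cluster; clusters with fewer than 3 counties
# get NA because 2 points always give a meaningless perfect correlation.
cluster_summary <- do.call(rbind, lapply(
  split(counties, counties$cluster),
  function(d) {
    data.frame(
      cluster = d$cluster[1],
      n       = nrow(d),
      r       = if (nrow(d) >= 3) cor(d$svi_flag_count, d$nmtc_dollars) else NA_real_
    )
  }
))
print(cluster_summary)
```

Reporting the number of counties next to each cluster's correlation makes it easier to explain in the text why a very small cluster should not be read as a strong relationship.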