DS4PS / cpp-528-spr-2021

https://ds4ps.org/cpp-528-spr-2021/

LAB 4 Source File questions #28

Open Niagara1000 opened 3 years ago

Niagara1000 commented 3 years ago

Hi @cenuno ,

In the lab 04 source file that you provided at https://github.com/DS4PS/cpp-528-spr-2021/blob/main/labs/lab_04_source.R , it says

# filter rural districts
d <- dplyr::filter( d, urban == "urban" )

Do we need to filter rural or urban districts?

cenuno commented 3 years ago

This is a great catch. I should have used clear language to denote the inclusion or exclusion.

My code explicitly says to keep the urban tracts. My comment implicitly says to exclude the rural tracts.

You need the urban tracts, so drop any non-urban tracts 👍🏽

— Cristian E. Nuno
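The keep-versus-drop distinction above can be checked on a toy data frame. This is only a sketch: the column name `urban` and the value "urban" come from the snippet in the question, while the `tractid` column, the "rural" label, and the rows themselves are made up for illustration.

```r
# toy data: invented rows, not taken from the lab's data set
d <- data.frame(
  tractid = c("t1", "t2", "t3", "t4"),
  urban   = c("urban", "rural", "urban", "rural")
)

# dplyr::filter() KEEPS the rows where the condition is TRUE,
# so this keeps the urban tracts and drops everything else
d_urban <- dplyr::filter(d, urban == "urban")

nrow(d_urban)          # number of tracts kept
unique(d_urban$urban)  # which labels remain after filtering
```

Reading `filter()` as "keep rows where ..." rather than "filter out ..." avoids the ambiguity between the comment and the code.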



Niagara1000 commented 3 years ago

Hi @cenuno ,

In the same source file, I see

"drop low 2000 median home values to avoid unrealistic growth rates."

https://github.com/DS4PS/cpp-528-spr-2021/blob/e873daa7fd7b2708068b1ec1c22e3e9991e061d8/labs/lab_04_source.R#L119

Can you please elaborate on what that means?

thank you!

cenuno commented 3 years ago
# drop low 2000 median home values
# to avoid unrealistic growth rates.
#
# tracts with homes that cost less than
# $10,000 are outliers
mhv.00[ mhv.00 < 10000 ] <- NA

This was meant to drop tracts whose median home values were very low, too low for us to consider for our purposes. They represent edge cases that our linear regression should not include.
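To see why a very low base value produces an unrealistic growth rate, here is a small sketch with invented numbers. The vector `mhv.10` and the growth formula (change divided by the 2000 value) are assumptions made for illustration; they are not confirmed to match the source file.

```r
# hypothetical median home values (in dollars) for four tracts
mhv.00 <- c(150000, 80000, 5000, 120000)    # year 2000
mhv.10 <- c(180000, 100000, 60000, 130000)  # year 2010

# growth relative to the 2000 value, before dropping anything
growth.raw <- (mhv.10 - mhv.00) / mhv.00
growth.raw[3]  # (60000 - 5000) / 5000 = 11, i.e. 1100% growth

# same rule as in the source file: treat values under $10,000 as missing
mhv.00[ mhv.00 < 10000 ] <- NA

# the third tract now yields NA and no longer distorts summaries
growth.clean <- (mhv.10 - mhv.00) / mhv.00
median( growth.clean, na.rm = TRUE )
```

Setting the value to NA, rather than deleting the row, keeps the tract in the data frame while excluding it from any calculation that uses `na.rm = TRUE`.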

Niagara1000 commented 3 years ago

@cenuno , I was actually confused about the 2000 part of the sentence, but I understand that we are basically removing outliers.

cenuno commented 3 years ago

Ah, the 2000 is the year reference (not a median home value) which is implicitly defined within the vector name mhv.00 (where the last two digits reference 2000). This was a pattern used within last week's lab but I will mark it as a TODO to have a clearer description of 2000 referencing the year.

Niagara1000 commented 3 years ago

Hi @cenuno ,

Ok, thank you! Also, I have more questions.

1.

In the source file, I see

# ------ average growth in median home value for the city ------
cbsa_stats_df <- 
  d %>%
  dplyr::group_by( cbsaname ) %>% # for each CBSA name,
  dplyr::summarize( metro.mhv.change = median( mhv.change, na.rm=T ), # find the median `mhv.change` value
                    metro.mhv.growth = 100 * median( mhv.growth, na.rm=T ) ) %>% # and find the median `mhv.growth` value
  dplyr::ungroup() # ?

I ran the code with and without the ungroup() function, but don't see any difference in the output, so I'm not sure of what ungroup() is doing 🤔



2.

Do we need to store the lab_04_source.R file under labs, or can we keep it inside our individual folder within labs/wk04/ ?

# load necessary functions and objects ----
# note: all of these are R objects that will be used throughout this .rmd file
import::here("S_TYPE",
             "panel.cor",
             "panel.smooth",
             "jplot",
             "d",
             "df",
             "cbsa_stats_df",
             # notice the use of here::here() that points to the .R file
             # where all these R objects are created
             .from = here::here("labs/lab_04_source.R"),
             .character_only = TRUE)



3.

In the lab 4 tutorial, I see:

Median Home Value
hist( df$MedianHomeValue2000, breaks=200, xlim=c(0,500000), 
      col="gray20", border="white",
      axes=F, 
      xlab="MHV (median = $138k)",
      ylab="",
      main="Median Home Value in 2000 (2010 US dollars)" )  # <== this line

What does the main=".. (2010 US dollars)" mean?



Thank you! 😄

cenuno commented 3 years ago

Hi Archana,

  1. Check the documentation for the dplyr::ungroup() function and come back with questions if the documentation is unclear. "Grouped" data frames behave a little differently than regular data frames.
  2. You should store it in the week-specific directory, as you suggest. The course directory that stores the course material will not mimic what your group directory should look like.
  3. Check out the documentation for the hist() function, specifically for the main parameter. Its input is a string, so we know that its contents are only meant to be displayed back to the user.
    • At this point we're comfortable knowing that inflation needs to be accounted for when using "older" dollar amounts. This means values from the year 2000 need to be adjusted to 2010 dollars by "inflating" them over the course of the decade. The message inside main lets the reader know that the author has accounted for inflation (i.e., 2000 is the year, but the dollar values have been inflated to 2010 values so that we can truly compare 2010 prices to 2000 prices).
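On the first point, a small sketch may help show how grouped data frames differ. By default, summarize() drops the last grouping level, so after grouping by a single variable the result is already ungrouped and ungroup() changes nothing visible. The toy data and the `county` column below are invented for illustration; only `cbsaname` and `mhv.change` echo names from the source file.

```r
library(dplyr)

# toy data: made-up rows, not the lab's data set
toy <- data.frame(
  cbsaname   = c("Metro A", "Metro A", "Metro B", "Metro B"),
  county     = c("c1", "c2", "c3", "c3"),
  mhv.change = c(10000, 30000, 5000, 15000)
)

# one grouping variable: summarize() removes it, leaving no groups
one_group <- toy %>%
  group_by(cbsaname) %>%
  summarize(metro.mhv.change = median(mhv.change, na.rm = TRUE))
is_grouped_df(one_group)   # FALSE

# two grouping variables: only the last one (county) is dropped
two_groups <- toy %>%
  group_by(cbsaname, county) %>%
  summarize(mhv.change = median(mhv.change, na.rm = TRUE),
            .groups = "drop_last")
is_grouped_df(two_groups)  # TRUE
group_vars(two_groups)     # "cbsaname"

# ungroup() removes whatever grouping is left
is_grouped_df(ungroup(two_groups))  # FALSE
```

Calling ungroup() at the end of a pipeline is therefore a defensive habit: it guarantees that later calls such as mutate() operate on the whole data frame rather than silently within leftover groups.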

Overall, I do think it's a good thing you're going line by line to make sure you are comfortable with these .R files. When faced with specific questions like "how does this work?" or "what happens if I change this?", it is often best to try it first and then bring questions back to your team lead about why something didn't work as you expected.

Niagara1000 commented 3 years ago

Hi @cenuno , ok I will check the documentation for ungroup() and other questions first

For the 3rd question, I should have been more clear, sorry

I meant to ask what does '2010 US dollars' mean in the context of 'Median Home Value in 2000'? But I see that you have edited your comment and answered my question there, so thank you!
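The inflation adjustment discussed in this thread can be sketched in a few lines. Everything here is illustrative: the home values are invented, and `inflation.factor` is a hypothetical placeholder, not the actual 2000-to-2010 rate used in the lab.

```r
# hypothetical median home values as reported in 2000 (nominal dollars)
mhv.2000.nominal <- c(100000, 138000, 200000)

# placeholder factor: assume $1 in 2000 buys what $1.25 buys in 2010
inflation.factor <- 1.25

# 2000 values expressed in 2010 US dollars
mhv.2000.in.2010.dollars <- mhv.2000.nominal * inflation.factor

# hypothetical values reported in 2010 for the same tracts
mhv.2010 <- c(150000, 160000, 240000)

# change in value once both years are in the same dollars
mhv.change <- mhv.2010 - mhv.2000.in.2010.dollars
mhv.change
```

With both years in 2010 dollars, a tract whose nominal value rose can still show a negative change, which is why the plot title states the dollar year explicitly.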