Status: Open. Niagara1000 opened this issue 3 years ago.
This is a great catch. I should have used clearer language to denote inclusion or exclusion.
My code explicitly says to keep the urban tracts, while my comment implicitly says to exclude the rural tracts.
You need the urban tracts, so drop any non-urban tracts 👍🏽
— Cristian E. Nuno
Niagara1000 wrote on Thursday, April 1, 2021, 10:32:18 AM, in DS4PS/cpp-528-spr-2021, issue #28 "LAB 4 Source File questions":
Hi @cenuno,
In the lab 04 source file that you provided at https://github.com/DS4PS/cpp-528-spr-2021/blob/main/labs/lab_04_source.R, it says:
```r
d <- dplyr::filter( d, urban == "urban" )
```
Do we need to filter for rural or urban districts?
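To make the keep-vs-drop direction concrete: `dplyr::filter()` keeps the rows where its condition is `TRUE`, so the quoted line retains urban tracts and discards everything else. The sketch below uses a made-up toy table; the `tractid` column and all values are hypothetical, and the `urban` column is only assumed to hold labels like `"urban"` and `"rural"`.

```r
# Hypothetical toy tracts (made-up IDs and labels, not the lab data).
d <- data.frame(
  tractid = c("t1", "t2", "t3", "t4"),
  urban   = c("urban", "rural", "urban", "rural"),
  stringsAsFactors = FALSE
)

# filter() KEEPS rows where the condition is TRUE,
# so every non-urban tract is dropped.
d_urban <- dplyr::filter( d, urban == "urban" )
d_urban
```

Writing the condition as "keep what matches" is why the code reads as inclusion even though the comment in the source talks about removing rural tracts.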
Hi @cenuno,
In the same source file, I see:
"drop low 2000 median home values to avoid unrealistic growth rates."
Can you please elaborate on what that means?
Thank you!
```r
# drop low 2000 median home values
# to avoid unrealistic growth rates.
#
# tracts with homes that cost less than
# $10,000 are outliers
mhv.00[ mhv.00 < 10000 ] <- NA
```
This was meant to exclude tracts whose 2000 median home values were very low (under $10,000), too low to be realistic for our purposes. Setting those values to NA removes edge cases that our linear regression should not include.
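Why a tiny 2000 value produces an "unrealistic growth rate": growth is measured relative to the 2000 base, so a very small denominator inflates it. This sketch assumes growth is defined as `(mhv.10 - mhv.00) / mhv.00`, which may differ from the lab's exact formula, and all home values are made up.

```r
# Hypothetical 2000 and 2010 median home values for four tracts
# (illustrative numbers only, not from the lab data).
mhv.00 <- c(150000, 95000, 2500, 210000)
mhv.10 <- c(180000, 120000, 60000, 250000)

# Assumed definition for this sketch: growth relative to the 2000 value.
growth_raw <- (mhv.10 - mhv.00) / mhv.00
# tract 3 shows 2300% growth only because its 2000 base is tiny

# Same cleanup step as the lab source: treat sub-$10,000 values as missing.
mhv.00[ mhv.00 < 10000 ] <- NA
growth_clean <- (mhv.10 - mhv.00) / mhv.00

# Compare the average growth with and without the outlier.
mean(growth_raw)
mean(growth_clean, na.rm = TRUE)
```

One extreme tract is enough to drag the raw average far away from the typical tract, which is the distortion the cutoff guards against.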
@cenuno, I was actually confused about the "2000" part of the sentence, but I understand that we are basically removing outliers.
Ah, the 2000 is a year reference (not a median home value), which is implicitly defined within the vector name mhv.00 (the last two digits refer to 2000). This pattern was used in last week's lab, but I will mark it as a TODO to describe more clearly that 2000 refers to the year.
Hi @cenuno,
OK, thank you! I have a few more questions.
In the source file, I see:
```r
# ------ average growth in median home value for the city ------
cbsa_stats_df <-
  d %>%
  dplyr::group_by( cbsaname ) %>%  # for each CBSA name,
  dplyr::summarize( metro.mhv.change = median( mhv.change, na.rm=T ),  # find the median `mhv.change` value
                    metro.mhv.growth = 100 * median( mhv.growth, na.rm=T ) ) %>%  # and find the median `mhv.growth` value
  dplyr::ungroup()  # ?
```
I ran the code with and without the `ungroup()` function but don't see any difference in the output, so I'm not sure what `ungroup()` is doing 🤔
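A likely reason for seeing no difference: `dplyr::summarize()` drops the last grouping level by default, so grouping by a single column (here `cbsaname`) already returns an ungrouped result, and `ungroup()` changes nothing. The sketch below uses made-up toy data (the column names mirror the lab; the values and the extra `year` grouping are hypothetical) to show a case where `ungroup()` does matter.

```r
library(dplyr)

# Toy data with made-up values: two metros, two years each.
toy <- data.frame(
  cbsaname   = c("A", "A", "B", "B"),
  year       = c(2000, 2010, 2000, 2010),
  mhv.growth = c(0.10, 0.30, 0.20, 0.40)
)

# One grouping variable: summarize() peels off that only level,
# so the result is already ungrouped and ungroup() is a no-op.
one_group <- toy %>%
  group_by( cbsaname ) %>%
  summarize( metro.mhv.growth = 100 * median( mhv.growth, na.rm = TRUE ) )
group_vars( one_group )

# Two grouping variables: summarize() drops only the last one ("year"),
# so the result is still grouped by cbsaname.
two_groups <- toy %>%
  group_by( cbsaname, year ) %>%
  summarize( m = median( mhv.growth ), .groups = "drop_last" )
group_vars( two_groups )

# While grouped, sum(m) is computed per metro...
two_groups %>% mutate( share = m / sum( m ) )

# ...after ungroup(), sum(m) is computed over the whole table.
ungroup( two_groups ) %>% mutate( share = m / sum( m ) )
```

Ending a pipeline with `ungroup()` is a defensive habit: it guarantees later steps see an ordinary data frame even if the grouping above is changed.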
Do we need to store the `lab_04_source.R` file under `labs/`, or can we keep it inside our individual folder within `labs/wk04/`?
```r
# load necessary functions and objects ----
# note: all of these are R objects that will be used throughout this .rmd file
import::here("S_TYPE",
             "panel.cor",
             "panel.smooth",
             "jplot",
             "d",
             "df",
             "cbsa_stats_df",
             # notice the use of here::here() that points to the .R file
             # where all these R objects are created
             .from = here::here("labs/lab_04_source.R"),
             .character_only = TRUE)
```
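The call above finds the source file through `here::here()`, which builds paths from the project root (for example, the folder containing an `.Rproj` file or a `.git` directory) rather than from the folder the `.rmd` lives in. So, mechanically, the file's location only has to match the path given to `.from`. The sketch below contrasts the course path with a hypothetical personal-folder path; `your_folder` is a placeholder, not a real folder in the repo, and this does not settle where the course expects the file to be stored.

```r
# Path used in the lab template, resolved from the project root.
shared_path <- here::here("labs/lab_04_source.R")

# Hypothetical alternative location; "your_folder" is a placeholder.
personal_path <- here::here("labs", "wk04", "your_folder", "lab_04_source.R")

shared_path
personal_path

# import::here(.from = ...) only works if this returns TRUE
# for whichever path you pass to it.
file.exists(shared_path)
```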
In the lab 4 tutorial, I see:
```r
hist( df$MedianHomeValue2000, breaks=200, xlim=c(0,500000),
      col="gray20", border="white",
      axes=F,
      xlab="MHV (median = $138k)",
      ylab="",
      main="Median Home Value in 2000 (2010 US dollars)" )  # <== this line
```
What does the "(2010 US dollars)" part of `main=` mean?
Thank you! 😄
Hi Archana,
Please read the documentation for the `dplyr::ungroup()` function and come back with questions if the documentation is unclear. "Grouped" data frames behave a little differently than regular data frames.
For the third question, please read the documentation for the `hist()` function, specifically for the `main` parameter. Its input is a string, so we know that its contents are only meant to be displayed back to the user. The "(2010 US dollars)" in `main` lets the reader know that the author has accounted for inflation (i.e., 2000 is the year, but we've inflated those dollar values to 2010 values so that we can truly compare 2010 prices to 2000 prices).
Overall, I do think it's a good thing you're going line by line to make sure you are comfortable with these .R files. When faced with specific questions like "how does this work?" or "what happens if I change this?", it is often best to try it first and then bring questions back to your team lead about why something didn't work as you expected.
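"Inflated to 2010 dollars" usually means multiplying each 2000 value by a price-index ratio. This sketch assumes a CPI-based adjustment; the CPI numbers are illustrative approximations of US CPI-U annual averages (check official tables before reuse), and the lab's own conversion factor may differ. The home values are made up.

```r
# Illustrative CPI values (assumed, approximate US CPI-U annual averages).
cpi_2000 <- 172.2
cpi_2010 <- 218.056

# Hypothetical median home values recorded in 2000 dollars.
mhv_2000_nominal <- c(80000, 120000, 200000)

# Express the 2000 values in 2010 dollars by scaling with the CPI ratio.
inflation_factor <- cpi_2010 / cpi_2000
mhv_2000_in_2010_dollars <- mhv_2000_nominal * inflation_factor

round(inflation_factor, 4)
round(mhv_2000_in_2010_dollars)
```

With both years in the same dollars, a difference between 2000 and 2010 values reflects a change in real home value rather than general price inflation.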
Hi @cenuno, OK, I will check the documentation for `ungroup()` and for my other questions first.
For the third question, I should have been clearer, sorry.
I meant to ask what '2010 US dollars' means in the context of 'Median Home Value in 2000'. But I see that you have edited your comment and answered my question there, so thank you!