DS4PS / cpp-529-fall-2020

http://ds4ps.org/cpp-529-fall-2020/

Lab 5-Variables #17

Status: Open. malmufre opened this issue 3 years ago.

malmufre commented 3 years ago

Hello,

I am trying to check which variables I can choose from for Lab 5 by using names(d), but the output looks like the following and I can't tell what the names mean. How can I find out what each variable stands for?

[1] "tractid" "pop00.x" "nhwht00" "nhblk00" "ntv00" "asian00" "hisp00" "haw00" "india00"
[10] "china00" "filip00" "japan00" "korea00" "viet00" "mex00" "pr00" "cuban00" "hu00" etc....

Thanks

lecy commented 3 years ago

A data dictionary would be helpful!

The 00 suffix on a variable name (as in asian00) means the measurement is from 2000.
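As a rough illustration of reading the suffix, the sketch below pulls the year-2000 names out of a character vector and strips the suffix to get the base concept. The vector `vars` is a hand-copied subset of the names(d) output above, standing in for names(d); with the real data you would pass names(d) directly. The `"00$"` pattern is an assumption that only matches names ending exactly in 00.

```r
# Hand-copied subset of the names(d) output above (stand-in for names(d))
vars <- c("tractid", "pop00.x", "nhwht00", "nhblk00", "ntv00",
          "asian00", "hisp00", "haw00", "india00")

# Keep names that end in "00", i.e. measures from the year 2000.
# "pop00.x" ends in ".x", so this pattern does not match it,
# and "tractid" has no year suffix at all.
vars00 <- grep("00$", vars, value = TRUE)

# Drop the "00" year suffix to recover the base concept name
base_names <- sub("00$", "", vars00)
print(base_names)
```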

Those variables are primarily the race / ethnicity / national origin categories:

"asian00" ASIAN
"hisp00" HISPANIC
"haw00" HAWAIIAN
"india00" INDIAN
"china00" CHINESE
"filip00" FILIPINO
"japan00" JAPANESE
"korea00" KOREAN
"viet00" VIETNAMESE
"mex00" MEXICAN
"pr00" PUERTO RICAN
"cuban00" CUBAN

The full data dictionary is here:

https://ds4ps.org/cpp-528-fall-2020/data/LTDB-codebook.pdf
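While working through the codebook, one option is to keep a small lookup table in R. This is only a sketch: `dict` is a hypothetical, hand-built named vector whose labels come from the category list in this thread, not something shipped with the lab data.

```r
# Hand-built mini data dictionary (hypothetical): names are variable codes,
# values are the labels from the category list in this thread
dict <- c(asian00 = "ASIAN", hisp00 = "HISPANIC", haw00 = "HAWAIIAN",
          india00 = "INDIAN", china00 = "CHINESE", filip00 = "FILIPINO",
          japan00 = "JAPANESE", korea00 = "KOREAN", viet00 = "VIETNAMESE",
          mex00 = "MEXICAN", pr00 = "PUERTO RICAN", cuban00 = "CUBAN")

# Look up labels by code; codes not yet in the dictionary (e.g. hu00) return NA,
# which flags them as ones to look up in the codebook PDF
lookup <- dict[c("mex00", "pr00", "hu00")]
print(lookup)
```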

lecy commented 3 years ago

"nhwht00" NON-HISPANIC WHITE (number of persons)
"nhblk00" NON-HISPANIC BLACK (number of persons)

malmufre commented 3 years ago

Thank you, Dr. Lecy. Is it fine to use certain variables even if they are not available for 2010? For example, the data dictionary lists some variables that are available for every year except 2010.

lecy commented 3 years ago

Short answer is yes.

The longer answer is that we are using the change in home value from 2000 to 2010 as the dependent variable.
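A minimal sketch of building the dependent variables, assuming hypothetical columns `mhv.00` and `mhv.10` for median home value in 2000 and 2010. The real column names in the lab data may differ, so check the codebook; the three tracts and their values below are made up.

```r
# Toy tract data; column names mhv.00 / mhv.10 are hypothetical
d <- data.frame(
  tractid = c("A", "B", "C"),
  mhv.00  = c(150000, 80000, 0),
  mhv.10  = c(180000, 100000, 50000)
)

# Change in median home value, in dollars
d$mhv.change <- d$mhv.10 - d$mhv.00

# Growth as a proportion of the 2000 value (multiply by 100 for percent).
# It is undefined when the 2000 value is zero, so set those cases to NA.
d$mhv.growth <- ifelse(d$mhv.00 > 0, d$mhv.change / d$mhv.00, NA)
print(d)
```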

The independent variables, then, are predictors of that change.

Variables can be stocks or flows. For example, we could use rainfall in 2000 versus the total or average rainfall between 2000 and 2010.
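To make the distinction concrete, here is a sketch with made-up annual rainfall for a single tract (the numbers are hypothetical, purely for illustration). The first measure is a snapshot in the base year; the other two summarize the whole decade.

```r
# Hypothetical annual rainfall (inches) for one tract, 2000-2010
years <- 2000:2010
rain  <- c(30, 28, 35, 31, 29, 33, 27, 32, 30, 34, 31)

# Snapshot: the value in the base year only
rain_2000 <- rain[years == 2000]

# Accumulated or averaged over the whole 2000-2010 period
rain_total <- sum(rain)
rain_avg   <- mean(rain)

c(base_year = rain_2000, total = rain_total, average = rain_avg)
```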

Just make sure the predictor is capturing a stable characteristic of the tract.

malmufre commented 3 years ago

Got it, thanks. However, I did not understand what we are supposed to do for this question:

Run two models - one with change in median home value (dollar amount) and one with median home value growth (percent change) from 2000 to 2010 as the dependent variables, and include your three year-2000 tract descriptors as covariates.

What I have done so far is create correlation plots for the three variables I picked, which showed me how the coefficients and standard errors (SE) changed.

lecy commented 3 years ago

You are comparing two regression models. Both have the same covariates (predictors), but different dependent variables.

The task is to see which factors best predict change, and also to note whether they consistently predict both the change in price and the growth in price, or only one of the two.
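A rough sketch of the comparison, using simulated data so it runs on its own. The predictors `x1`, `x2`, `x3` are hypothetical placeholders for your three year-2000 tract descriptors, and the simulated relationship is made up. Both models share one right-hand side built with `reformulate()`; only the response changes.

```r
set.seed(529)
n <- 200

# Simulated tract data; x1, x2, x3 stand in for three year-2000 descriptors
d <- data.frame(
  x1     = rnorm(n),
  x2     = rnorm(n),
  x3     = rnorm(n),
  mhv.00 = runif(n, 50000, 500000)
)
d$mhv.change <- 20000 + 5000 * d$x1 + rnorm(n, sd = 10000)
d$mhv.growth <- d$mhv.change / d$mhv.00

# Same covariates in both models; only the dependent variable differs
rhs      <- c("x1", "x2", "x3")
m_change <- lm(reformulate(rhs, response = "mhv.change"), data = d)
m_growth <- lm(reformulate(rhs, response = "mhv.growth"), data = d)

# Estimate and p-value for each predictor, side by side
compare <- data.frame(
  est_change = coef(m_change)[rhs],
  p_change   = summary(m_change)$coefficients[rhs, "Pr(>|t|)"],
  est_growth = coef(m_growth)[rhs],
  p_growth   = summary(m_growth)$coefficients[rhs, "Pr(>|t|)"]
)
print(round(compare, 4))
```

With real data, the question is whether a predictor keeps the same sign and stays significant in both columns, or matters in only one model.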

If a $1 million home and a $30k home each increase by $20k, the changes are equal in the change-in-value model but very different in the growth model ($20k is 2% of $1 million but about 67% of $30k).
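Writing the two dependent variables out makes the contrast explicit. Here $V_{00}$ and $V_{10}$ denote a tract's median home value in 2000 and 2010 (notation introduced only for this illustration):

```latex
\begin{aligned}
\Delta V &= V_{10} - V_{00}, \qquad g = \frac{\Delta V}{V_{00}} \\
\text{\$1M home:}\quad \Delta V &= 20{,}000, \qquad g = \frac{20{,}000}{1{,}000{,}000} = 0.02 = 2\% \\
\text{\$30k home:}\quad \Delta V &= 20{,}000, \qquad g = \frac{20{,}000}{30{,}000} \approx 0.667 \approx 67\%
\end{aligned}
```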