Closed rpseely closed 9 months ago
We kept 65 variables from the dataset. These include the ones we need to test our hypothesis: the date the respondent was interviewed, the year the respondent was interviewed, mother's industry, father's industry, and respondent's industry. The rest are variables that could moderate the effects we see. For example, we kept wealth, income, hours worked per week, and opinions on class, in case we have time to see how some of them interact with the main variables in the hypothesis test.
We forgot to keep the gender of the respondent! We will have to update our cleaned dataset, currently named GSSclean.noexcel.dta, because gender will be an important moderator and is something that Professor Zhu wanted to see as well.
Something I just thought of: we need to check which years our income variable is there for, because I think it might only be available for certain years. We should browse the data.
The variables we had, e.g. yearly, were only for one survey year. The variable we have kept in our do-file and in our new dataset to reflect income is now rincome, and it spans all the years we are examining.
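The check described here, finding the survey years in which a variable actually has data, can be sketched roughly as follows. This is a minimal Python illustration rather than the team's do-file; the function name `years_with_data` and the toy rows are hypothetical and are not real GSS responses.

```python
def years_with_data(rows, var):
    """Return the sorted survey years in which `var` has a non-missing value."""
    years = set()
    for row in rows:
        # None stands in for a missing value in this sketch.
        if row.get(var) is not None:
            years.add(row["year"])
    return sorted(years)


# Hypothetical toy rows for illustration only (not real GSS data).
rows = [
    {"year": 1974, "yearly": None, "rincome": 9},
    {"year": 1998, "yearly": 3, "rincome": 12},
    {"year": 2022, "yearly": None, "rincome": 11},
]

print(years_with_data(rows, "yearly"))
print(years_with_data(rows, "rincome"))
```

A variable that shows up in only one year of the result is a sign that it cannot serve as the income measure for the pooled sample.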
- Replace GSSclean.noexcel.nogender.dta with an updated dataset, GSSclean_noRDs.dta.
- Upload GSSclean_noRDs.dta to GitHub.
- We are also working on creating a fulldate variable so that respondents can be labeled as having been interviewed during a recession or not.
As suggested by Dylan, we can go back to the original Excel sheet and manually enter the dates in a fulldate format, i.e. mmddyy, then convert that sheet into a .dta file and replace the current recessiondates.dta file.
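As a rough illustration of how an mmddyy date could be turned into a recession label, here is a small Python sketch. It is not the team's Stata code. The single recession window below is only an example (NBER dates that recession from a December 2007 peak to a June 2009 trough), treating the whole peak and trough months as part of the recession is an assumption, and the function names are hypothetical. The real intervals would come from recessiondates.dta.

```python
from datetime import date, datetime

# Illustrative window only; the actual intervals would be read from
# recessiondates.dta. Counting the full peak and trough months is an assumption.
RECESSIONS = [(date(2007, 12, 1), date(2009, 6, 30))]


def parse_mmddyy(text):
    """Parse an mmddyy string such as '041509' into a date.

    Python's %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068,
    which covers survey years 1972-2022.
    """
    return datetime.strptime(text, "%m%d%y").date()


def in_recession(fulldate, recessions=RECESSIONS):
    """True if fulldate falls inside any (start, end) window, inclusive."""
    return any(start <= fulldate <= end for start, end in recessions)


print(in_recession(parse_mmddyy("041509")))  # interviewed April 15, 2009
print(in_recession(parse_mmddyy("031512")))  # interviewed March 15, 2012
```

Storing each recession as a start and end date keeps the comparison to a simple range check once every interview has a full date.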
@ecn310/nepobabies When you get answers to these kinds of questions, where are you documenting those answers?
@kbuzard It was never written down, but the correct variable for income spanning all years, rincome, is in the do-file from when we replaced the old cleaned dataset with a new one that also contains the gender and age variables. That is why I checked it off, but obviously that does not reflect that the issue was solved.
I will go back and update that comment.
- [x] Check data for years in which income is available with the variables we kept
- [x] Update income variable
New issue with the income variable:
While rincome does span all of the survey years we plan to look at, we need to check exactly how it is coded. As it stands, the variable implies that the median income of all respondents surveyed between 1974 and 2022 is somewhere between $10,000 and $14,999. Obviously, there is something very wrong with this or with my interpretation. I will continue to search for a proper income variable that accurately measures income and spans our survey years.
I found two variables, realinc and coninc, and they are both described in Stata as "family income in constant dollars." The two variables have different values for each respondent, and coninc always appears to be higher (judging by eye, not by actually running code). Next week in class or during office hours, we can ask Professor Buzard or Dylan to help us understand the difference between the two variables and which would be better to use.
I am also having trouble finding the correct codebook for our pooled cross-section data (1972-2022) on the GSS documentation website, so that is something we also need help with.
yearsjob, that measure length of employment.
unemploymentrate.monthly.xslx
Prof. Buzard said the difference we see between coninc and realinc is likely because income was measured in real dollars and chained to different base years, creating those two variables. We should divide one variable by the other; if we get a constant ratio, we will know that this is likely the case.
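The constant-ratio check can be sketched as below. This is a minimal Python illustration, not the Stata code we would actually run; the function name `has_constant_ratio`, the tolerance, and the toy values are all hypothetical and are not real GSS incomes.

```python
def has_constant_ratio(pairs, rel_tol=1e-3):
    """True if a / b is nearly the same for every usable (a, b) pair.

    Pairs with a missing value (None) or a zero denominator are skipped.
    """
    ratios = [a / b for a, b in pairs if a is not None and b]
    if not ratios:
        return False
    lo, hi = min(ratios), max(ratios)
    # Compare the spread of the ratios to their size.
    return (hi - lo) <= rel_tol * abs(hi)


# Hypothetical (coninc, realinc) pairs for illustration only.
same_scale = [(20000.0, 10000.0), (50000.0, 25000.0), (None, 12000.0)]
different = [(20000.0, 10000.0), (30000.0, 10000.0)]

print(has_constant_ratio(same_scale))
print(has_constant_ratio(different))
```

If the ratio is the same for every respondent, the two variables differ only by a rescaling, which is what different base years for constant dollars would produce.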
Prof. Buzard was right! They are both variables for family income with different base years. I decided to use realinc with a base year of 1986.
November 6, 2023