Finalize Dataset

rpseely commented 1 year ago

November 6, 2023

[x] We must remove all of the unnecessary variables from our dataset.
[ ] We must merge the cleaned dataset with the Excel sheet of recession dates, creating a dummy variable for recession.

rpseely commented 1 year ago

November 10, 2023

We kept 65 variables from the dataset. We kept the important ones that will allow us to test our hypothesis, such as the date the respondent was interviewed, the year the respondent was interviewed, mother's industry, father's industry, and respondent's industry. The rest of the variables that we kept were ones that could be moderators of the effects seen. For example, we kept variables like wealth, income, hours worked per week, and opinions on class, in case we have time to see how some of these variables interact with the main ones that we will use for the hypothesis test.

rpseely commented 1 year ago

> The rest of the variables that we kept were ones that could be moderators of the effects seen.

We forgot to keep the respondent's gender! We will have to update our cleaned dataset, currently named GSSclean.noexcel.dta, because gender will be an important moderator and is something that Professor Zhu wanted to see as well.

[x] Go back to the original GSS dataset, keep all the other variables, and include the respondent's gender.
[x] Replace the old cleaned dataset with the new cleaned dataset.

rpseely commented 1 year ago

[ ] Create the dummy variable within the Stata dataset we created, recessiondates.dta, and...
[ ] Then merge it with the cleaned GSS data.
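The recession dummy in the tasks above comes down to checking whether each respondent's interview date falls inside any recession window. The project itself works in Stata; this is only a Python sketch of that logic, assuming interview dates are already full dates and recessions are stored as inclusive (start, end) pairs. The two windows are placeholders (roughly the 2007-2009 and 2020 recessions), not values read from recessiondates.dta, and `recession_dummy` is a hypothetical helper name.

```python
from datetime import date

# Placeholder recession windows (inclusive start, end). These roughly follow
# the 2007-2009 and 2020 recessions and are NOT read from recessiondates.dta.
RECESSIONS = [
    (date(2007, 12, 1), date(2009, 6, 30)),
    (date(2020, 2, 1), date(2020, 4, 30)),
]


def recession_dummy(interview_date, recessions=RECESSIONS):
    """Return 1 if interview_date falls inside any recession window, else 0."""
    return int(any(start <= interview_date <= end for start, end in recessions))


# Hypothetical respondents with an already-built full interview date.
respondents = [
    {"id": 1, "fulldate": date(2008, 5, 14)},
    {"id": 2, "fulldate": date(2010, 3, 2)},
    {"id": 3, "fulldate": date(2020, 3, 20)},
]
for r in respondents:
    r["recession"] = recession_dummy(r["fulldate"])

print([r["recession"] for r in respondents])  # [1, 0, 1]
```

Once the dates are stored as Stata daily dates, the same comparison could presumably be written with Stata's `inrange()` function.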

rpseely commented 1 year ago

Something I just thought of: we need to check which years our income variable is available for, because I think it might only cover certain years, so we should browse the data.

[x] Check data for years in which income is available with the variables we kept
[x] Update income variable

The variables we had, e.g., yearly, were only available for one survey year. The income variable we now keep in our do-file and in our new dataset is rincome, which spans all the years we are examining.
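One way to run the check above is to count, for each survey year, how many respondents have a non-missing value of a candidate income variable; a zero count means the variable was not available that year. A minimal Python sketch, assuming each row is a dict with a `year` key and missing values stored as `None`. The rows, and the year in which `yearly` is filled in, are made up for illustration.

```python
def nonmissing_by_year(rows, var):
    """Count non-missing values of `var` in each survey year."""
    counts = {}
    for row in rows:
        year = row["year"]
        counts.setdefault(year, 0)  # list every year, even if var is always missing
        if row.get(var) is not None:
            counts[year] += 1
    return dict(sorted(counts.items()))


# Made-up rows: `yearly` is filled in for one year only, `rincome` for all.
sample = [
    {"year": 1974, "yearly": None, "rincome": 9},
    {"year": 1974, "yearly": None, "rincome": 11},
    {"year": 2006, "yearly": 32000, "rincome": 12},
    {"year": 2022, "yearly": None, "rincome": 12},
]
print(nonmissing_by_year(sample, "yearly"))   # {1974: 0, 2006: 1, 2022: 0}
print(nonmissing_by_year(sample, "rincome"))  # {1974: 2, 2006: 1, 2022: 1}
```

In Stata, `tabulate year if !missing(rincome)` gives a similar table directly.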

rpseely commented 1 year ago

November 16, 2023

We updated the clean dataset to include respondent gender, respondent age, wage variables that span all the survey years we are going to analyze, and a few additional moderator variables.
We added the corresponding commands to our do-file.
We still need to merge the recession dates and cleaned GSS datasets, but we have not yet decided whether to create the dummy variable that labels dates as recessions before or after merging. We also need to run the new do-file to generate the new dataset and upload it to GitHub: we only wrote the do-file on Wednesday and have not yet used it to replace the current GSSclean.noexcel.nogender.dta with an updated dataset.

rpseely commented 1 year ago

November 17, 2023

We cleaned the dataset with all the variables we want and named it GSSclean_noRDs.dta.
This is now on OneDrive but should be added to GitHub.
[x] Add GSSclean_noRDs.dta to GitHub
We are also working on creating a fulldate variable so that respondents can be labeled as having been interviewed during a recession or not.

rpseely commented 12 months ago

November 21, 2023

> We are also working on creating a fulldate variable so that respondents can be labeled as having been interviewed during a recession or not.

As Dylan suggested, we can go back to the original Excel sheet, manually enter the dates in a full date format, convert that into a .dta file, and replace the current recessiondates.dta.

[ ] Manually enter dates in fulldate format, i.e., mmddyy.
[ ] Convert the new Excel file into a .dta file.
[ ] Upload the new .dta file to GitHub to replace the current recessiondates.dta file.
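Entering recession dates as mmddyy has two practical pitfalls: Excel may store a value like 010107 as the number 10107, dropping the leading zero, and a two-digit year has to be mapped to a century. The Python sketch below is an illustration, not the project's Stata code (`parse_mmddyy` is a hypothetical name). It pads the value back to six digits and relies on the `%y` rule of `strptime`, which maps 69-99 to 1969-1999 and 00-68 to 2000-2068, so every year in the 1974-2022 window lands in the right century.

```python
from datetime import datetime


def parse_mmddyy(value):
    """Convert an mmddyy entry (string or number) from the Excel sheet to a date."""
    # Excel can turn 010107 into the number 10107, so pad back to six digits.
    text = str(value).strip().zfill(6)
    # %y maps 69-99 to 1969-1999 and 00-68 to 2000-2068.
    return datetime.strptime(text, "%m%d%y").date()


print(parse_mmddyy("120107"))  # 2007-12-01
print(parse_mmddyy(10107))     # 2007-01-01
print(parse_mmddyy("110173"))  # 1973-11-01
```

Entering four-digit years (mmddyyyy) in the Excel sheet would remove the dependence on that century rule altogether.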

kbuzard commented 12 months ago

> Something I just thought of: we need to check which years our income variable is available for, because I think it might only cover certain years, so we should browse the data.

@ecn310/nepobabies When you get answers to these kinds of questions, where are you documenting those answers?

rpseely commented 12 months ago

> @ecn310/nepobabies When you get answers to these kinds of questions, where are you documenting those answers?

@kbuzard It was never written down, but the correct income variable spanning all years, rincome, is in the do-file from when we replaced the old cleaned dataset with a new one that also contains the gender and age variables. That is why I checked it off, but obviously the checkmark alone does not document how the issue was solved.

I will go back and update that comment.

rpseely commented 12 months ago

November 22, 2023

> [x] Check data for years in which income is available with the variables we kept
>
> [x] Update income variable
>
> The variables we had, e.g., yearly, were only available for one survey year. The income variable we now keep in our do-file and in our new dataset is rincome, which spans all the years we are examining.

New issue with the income variable: while rincome does span all of the survey years we plan to look at, we need to check exactly how it is coded. As it stands, the variable implies that the median income of all respondents surveyed between 1974 and 2022 falls somewhere between $10,000 and $14,999. Obviously, something is very wrong either with the variable or with my interpretation of it. I will continue to search for a proper income variable that accurately measures income and spans our survey years.
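One possible explanation, to be confirmed against the codebook, is that rincome stores income brackets in nominal dollars rather than an exact amount. A median of such a variable is a bracket, not a dollar figure, and pooling decades of unadjusted incomes pulls it toward the lower brackets. The Python sketch below shows only the mechanics; the bracket labels, codes, and `median_bracket` helper are invented and are not the official GSS coding.

```python
import statistics

# Invented bracket labels for an ordinal income code (1 = lowest).
# These are NOT the official GSS rincome categories.
BRACKETS = {
    1: "under $5,000",
    2: "$5,000 to $9,999",
    3: "$10,000 to $14,999",
    4: "$15,000 to $24,999",
    5: "$25,000 or more",
}


def median_bracket(codes):
    """Return the label of the median bracket of ordinal income codes."""
    # median_low always returns an observed code, never the mean of two codes.
    return BRACKETS[statistics.median_low(codes)]


# Made-up codes pooled across many survey years in nominal dollars.
codes = [1, 2, 2, 3, 3, 3, 4, 5, 5]
print(median_bracket(codes))  # $10,000 to $14,999
```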

[x] Update income variable (again)
[x] Update the cleaned dataset with the correct income variable.

rpseely commented 12 months ago

November 22, 2023 (continued)

> I will continue to search for a proper income variable that accurately measures income and spans our survey years.

I found two variables, realinc and coninc, which Stata labels as "family income in constant dollars." The two variables have different values for each respondent, and coninc always appears to be higher (judging by eye, not by actually running code). Next week in class or during office hours, we can ask Professor Buzard or Dylan to help us understand the difference between the two variables and which would be better to use.

I am also having trouble finding the correct codebook for our pooled cross-section data (1972-2022) on the GSS documentation website, so that is something we also need help with.

rpseely commented 12 months ago

November 23, 2023

Following our discussion with Professor Buzard about the GSS employment length data, we will need to go back to the original GSS dataset provided by Professor Zhu and see whether any variables measure length of employment more clearly or usefully than yearsjob.

rpseely commented 11 months ago

November 27, 2023

I started working on a monthly unemployment rate dataset, as suggested by Prof. Buzard. It is in the OneDrive folder, titled unemploymentrate.monthly.xslx.
I am not going to continue the data entry until Prof. Buzard checks whether it is correctly formatted. Also, can we find this dataset somewhere so that we do not have to do so much data entry?

rpseely commented 11 months ago

November 27, 2023

Notes from Office Hours with Professor Buzard

Prof. Buzard said that the difference we see between coninc and realinc is likely because income was expressed in real dollars with different base years, creating those two variables. We should divide one variable by the other: if the ratio is constant, this is likely the case.
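The check suggested above can be sketched in Python as follows (the project itself uses Stata, where generating the ratio and running `summarize` on it would do the same job). The incomes are made up, with coninc set to exactly twice realinc; the factor of 2 is not the real ratio between the two base years, and `ratio_summary` is a hypothetical name. A small relative tolerance allows for rounding in the stored values.

```python
def ratio_summary(coninc, realinc, rel_tol=1e-6):
    """Return (min ratio, max ratio, is_constant) for coninc / realinc."""
    ratios = [
        c / r
        for c, r in zip(coninc, realinc)
        if c is not None and r  # skip missing values and zero realinc
    ]
    lo, hi = min(ratios), max(ratios)
    # Treat the ratio as constant if its spread is tiny relative to its size.
    return lo, hi, (hi - lo) <= rel_tol * hi


# Made-up incomes in which coninc is exactly twice realinc.
realinc = [10000.0, 25500.0, 61200.0]
coninc = [20000.0, 51000.0, 122400.0]
print(ratio_summary(coninc, realinc))  # (2.0, 2.0, True)
```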

rpseely commented 11 months ago

December 3, 2023

We need unemployment rates for every month from 1986 to 2022, at least if we plan to use data only for young adults.
If we define young adults as respondents younger than 30, that leaves us with 2,021 observations and 142 nepobabies.
So far, I have been doing this manually, using the Excel sheet from the BLS linked in the [OneDrive](https://sumailsyr-my.sharepoint.com/personal/rhrabino_syr_edu/_layouts/15/onedrive.aspx?ct=1696616219053&or=OWA-NT&cid=87642c6f-751b-bfc3-8c20-489c586e0930&fromShare=true&ga=1&id=%2Fpersonal%2Frhrabino_syr_edu%2FDocuments%2FECN 310 Project) and the replace command to fill in values for unemployrate.
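Instead of typing one replace per month, the BLS values could be attached with a lookup keyed on interview year and month. A minimal Python sketch, assuming the BLS sheet has been reshaped to one rate per (year, month) and each respondent carries interview `year` and `month` fields (for example, derived from a fulldate variable). The rates are placeholders, not BLS figures, and `attach_unemployment` is a hypothetical name. It also returns the respondents with no matching month, which flags gaps in the data entry.

```python
def attach_unemployment(respondents, rates):
    """Set r["unemployrate"] from a (year, month) lookup; return ids with no match."""
    unmatched = []
    for r in respondents:
        rate = rates.get((r["year"], r["month"]))  # None if the month is missing
        r["unemployrate"] = rate
        if rate is None:
            unmatched.append(r["id"])
    return unmatched


# Placeholder rates, NOT actual BLS figures.
rates = {(1986, 3): 4.0, (1986, 4): 4.1, (2022, 5): 3.9}
respondents = [
    {"id": 1, "year": 1986, "month": 3},
    {"id": 2, "year": 2022, "month": 5},
    {"id": 3, "year": 1990, "month": 7},
]
print(attach_unemployment(respondents, rates))  # [3]
```

In Stata, the analogous step would be a many-to-one merge on year and month (`merge m:1 year month using` followed by a .dta file built from the BLS sheet).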

rpseely commented 11 months ago

December 7, 2023

> Prof. Buzard said that the difference we see between coninc and realinc is likely because income was expressed in real dollars with different base years, creating those two variables. We should divide one variable by the other: if the ratio is constant, this is likely the case.

Prof. Buzard was right! They are both variables for family income with different base years. I decided to use realinc with a base year of 1986.

ecn310 / course-project-nepobabies

Finalize Dataset #13

November 6, 2023

November 10, 2023

November 16, 2023

November 17, 2023

November 21, 2023

November 22, 2023

November 22, 2023 (continued)

November 23, 2023

November 27, 2023

November 27, 2023

Notes from Office Hours with Professor Buzard

December 3, 2023

December 7, 2023