ecn310 / course-project-nepobabies

course-project-nepobabies created by GitHub Classroom

Corrections to final project for use as sample #18

Open rpseely opened 4 months ago

rpseely commented 4 months ago

Tracking Corrections to Final Project

As instructed by Professor Buzard, I will be making some edits to the final report write-up and the do-file for potential use as a sample project in future sections of ECN 310. I will be using the critique from the grading rubric used by Prof. Buzard to guide my editing of the project.

rpseely commented 4 months ago

Saturday, February 10, 2024

rpseely commented 4 months ago

Sunday, February 11, 2024

Note: These changes and notes on errors are all on my iPad and have not yet been edited into the .tex document.

rpseely commented 4 months ago

Monday, February 19, 2024

Hire month estimation explanation: Using the job length variable and the GSS interview date variable, we estimated the month and year in which a respondent was hired. The estimated month of hire is the interview date minus 12 times the number of years the respondent had been employed at their current job. This means that the estimated month of hire is always the same calendar month as the interview month (unless the respondent had been employed for less than one year). This is because responses to the job length question were recorded only as one quarter of a year, three quarters of a year, or a whole number of years starting from one.
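The arithmetic above can be sketched in a few lines. This is a minimal Python illustration, not the project's Stata do-file; the function name `estimated_hire_month` and the separate year/month arguments are assumptions made for the example.

```python
def estimated_hire_month(interview_year, interview_month, years_on_job):
    """Return (year, month) of the estimated hire date.

    years_on_job follows the coding described above: 0.25, 0.75,
    or a whole number of years starting from one.
    """
    # Count months on a single axis so subtraction handles year rollover.
    interview_index = interview_year * 12 + (interview_month - 1)
    months_on_job = round(years_on_job * 12)
    hire_index = interview_index - months_on_job
    return hire_index // 12, hire_index % 12 + 1


# Whole years keep the calendar month; fractional years shift it.
print(estimated_hire_month(2018, 6, 3))
print(estimated_hire_month(2018, 6, 0.25))
```

Because whole years subtract a multiple of 12 months, the calendar month only changes for the two fractional responses.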

rpseely commented 4 months ago

Monday, February 19, 2024 (continued)

Update: all minor changes, i.e. spelling, conventions, word choice (non-substantive), have been updated from iPad PDF to .tex file.

kbuzard commented 4 months ago

> Should I explain that in the data section, and then maybe in the discussion explain that 30 years old is not a perfect cutoff?

I think this is a good plan!

On the explanation of the "hire month" variable: what you have is a good start, but I agree it could be clearer. It doesn't necessarily have to be a lot shorter. In particular, you can integrate a discussion here of the windows of time you used to figure out when the hire month was during a bad labor market, and any robustness checks you ran to deal with this (or robustness checks you think should be run even if you're not going to include them in this report).

> I also want to go back and edit the bibliography. I wonder if there is a way to just format the references correctly (hanging indents, double spaced?) without having to make them like Overleaf/LaTeX sources.

You can do this for sure; Google/ChatGPT can help you out. I'm never sure whether it is more frustrating to make and implement a .bib file or to try to get LaTeX to format things the way I want them. It's probably six of one, half a dozen of the other. It will look more impressive if done with a .bib file, but I don't think it's necessary.

> it will be easier/quicker to discuss in person, so maybe we can set up some time this week/next

Sure! Let me know what will work for you. I'm going to be on campus for at least part of each of the next three days, and I'm also happy to Zoom (we could even just stay on after the meeting tomorrow night).

rpseely commented 4 months ago

Wednesday, February 21, 2024

@kbuzard and I discussed edits to the analysis section after the team meeting today. Here is what we came up with:

1. Definition of month hire conversation:

2. Quartiles categorized by unemployment level and t-tests

3. Chi-square vs. t-test

4. Graphs/Tables

5. Gender dynamics analysis

rpseely commented 4 months ago

Monday, February 26, 2024

After speaking with Professor Buzard, we made the determination that the updates to the project should be done right around spring break!

rpseely commented 4 months ago

Tuesday, February 27, 2024

rpseely commented 3 months ago

Tuesday, March 5, 2024

Editing frequency/proportion table

[Screenshot: 2024-03-05, 7:30:09 PM]

Nepobaby sex vs. Nepoparent sex Figure

Sensitivity analysis

rpseely commented 3 months ago

Tasks as of March 5, 2024

I want to organize what I still have to do at this point...

Sensitivity analysis

Gender dynamics

T-Test vs. Chi-Square

Fix up code

Data Section

References

Final Product

kbuzard commented 3 months ago

> I started by changing the frequency table to include the proportions, as discussed. I would appreciate your feedback, @kbuzard, on how you think it looks as is.

@rpseely I think it looks good! My only suggestions are to

  1. replace "Sample" with something like "# Observations". This is just the more common usage.
  2. think about changing the term "hire group". It's just not obvious what this means if someone glances at the chart. Because many people skim papers by looking at the figures, it's always a good idea to explain everything in the chart, even if it requires notes beneath or something similar.

> I believe the best course of action would be to make different copies of the FRED unemployment rate data. In each FRED dataset I would create a variable called ymhiredate_3m (for minus three months) or ymhiredate_6p (plus six months, and so on), except I would create a corresponding variable before merging with the same name, as I did for the original analysis. @kbuzard Let me know how you feel about this! This is something I could do over the weekend.

Is what you're thinking about a single dataset with multiple variables, where each column is "offset" by 3 or 6 months? If so, I think that makes sense.

rpseely commented 3 months ago
> 1. think about changing the term "hire group". It's just not obvious what this means if someone glances at the chart. Because many people skim papers by looking at the figures, it's always a good idea to explain everything in the chart, even if it requires notes beneath or something similar.

Would putting "Unemployment Level at Time of Hiring" be good for the title? And then in the chart I am not sure what else would be short enough and offer enough explanation, so I think adding in a note beneath would be best.

> Is what you're thinking about a single dataset with multiple variables, where each column is "offset" by 3 or 6 months? If so, I think that makes sense.

Hmmm. That could work, too. My thought was to essentially create multiple copies of the FRED unemployment rate dataset that we used to merge the unemployment rates in, and then each copy of the data would have the month offset and the corresponding ymhiredate_xx variable. I will see if I can whip one of these up before our meeting to show it.

I actually just understood your idea with the other approach! I think I could just go back to the original unemployment rate dataset and create multiple columns with the correct offset. I will try that, first.
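The single-dataset approach described above (one unemployment-rate column per offset) might look roughly like the sketch below. It is written in Python for illustration rather than in the project's Stata; the helper names, the month-index keys, and the sample rates are hypothetical, and only the `_m3`/`_p6`-style suffixes echo the naming used in this thread.

```python
OFFSETS = (-6, -3, 0, 3, 6)  # months relative to the estimated hire month


def column_name(offset):
    """Name a rate column by its offset: m = minus, p = plus."""
    if offset == 0:
        return "unemployrate"
    suffix = f"m{-offset}" if offset < 0 else f"p{offset}"
    return f"unemployrate_{suffix}"


def add_offset_columns(rates_by_month, offsets=OFFSETS):
    """Build one row per month holding the rate at each offset.

    rates_by_month maps a running month index to an unemployment rate.
    Offsets that fall outside the data are recorded as None.
    """
    table = {}
    for month in rates_by_month:
        table[month] = {
            column_name(o): rates_by_month.get(month + o) for o in offsets
        }
    return table


# Made-up rates for 13 consecutive months (indices 600..612).
rates = {600 + i: float(i) for i in range(13)}
table = add_offset_columns(rates)
print(table[606])
```

With every offset in one table, a single merge on the hire month brings in all of the columns needed for the sensitivity analysis.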

kbuzard commented 3 months ago

> Would putting "Unemployment Level at Time of Hiring" be good for the title? And then in the chart I am not sure what else would be short enough and offer enough explanation, so I think adding in a note beneath would be best.

@rpseely This is a great title! You could potentially just not use the short descriptor if it's possible to leave that upper left box blank. If you have a good title and everything is otherwise well described, I think it will be clear that this is the only thing you're analyzing in the table.

rpseely commented 3 months ago

Saturday, March 9

Sensitivity Analysis

Tried manipulating the Excel sheet but hit a roadblock

Explanation of why I don't think it can work using one Excel sheet like this:

[Screenshot: 2024-03-09, 11:35:47 PM]

gen ymhiredate = ymintdate - (yearsjob*12)

merge m:m ymhiredate using "C:\Users\rpseely\OneDrive - Syracuse University\Documents\GitHub\exercises\course-project nepobabies\FRED_unrate_60to22_robust.dta"

Trying to code it out

gen unemployrate_m3 = unemployrate of ymhiredate - 3

bysort ymhiredate: egen unemployrate_m3 = mean(unemployrate[_n-3]) if _n > 3

bysort ymhiredate: gen unemployrate_m3 = unemployrate[_n-3]

Tentative success?

kbuzard commented 3 months ago

> I gave it one last go using an Excel sheet where I manually made the ymhiredate 3 months prior to the current one, as shown above. The difference is that it only had the ymhiredate_m3 and the unemployrate variable (labeled as unemployrate_m3 for clarity). At first it did not work because I had a line of code that negated this (gen ymhiredate_m3 = ymhiredate - 3). Then I removed this line of code and simply set ymhiredate = ymhiredate_m3. Then I re-ran the rest of the code, and I believe the code ran the robustness check I intended! Unless Professor Buzard has any qualms with this method, I will use the same method for the sensitivity analysis for minus 6 months, plus 3 months, and plus 6 months. The p-value was 0.089, so the result is not significant at the 0.05 level (95% confidence), but there may still be something going on there. Definitely something to make note of, especially in light of the results of the next few parts of the robustness check.

This sounds like what I was envisioning!

rpseely commented 3 months ago

Monday, March 11

Sensitivity Analysis

References

\newpage
\section*{Bibliography}
\singlespacing
\setlength\bibsep{1pt}

\bibliographystyle{plain}
\bibliography{nepobabiesreferences}

\end{document}

@kbuzard Do you have any advice on this? Or any resources you have found helpful?

kbuzard commented 3 months ago

> Do you have any advice on this? Or any resources you have found helpful?

Some questions to start:

kbuzard commented 3 months ago

@rpseely I forgot to tag you in above post. Not sure you'd get a notification, so here's one for sure!

rpseely commented 3 months ago

Tuesday, March 12

Editing Final Report

References

@kbuzard Thank you! I tried all of those suggestions (and combinations of them) and I have not been able to get it to work.

> • Do you get any error message?

Yes, I get one error message related to the bibliography. It states:

Package natbib Warning: Empty `thebibliography' environment on input line 3.

I asked ChatGPT what this means, but none of the possible causes it suggested seems to apply here.

kbuzard commented 3 months ago

@rpseely Okay, given that error, I think I might know what's going on. Check out this answer to this question for details.

The \bibliography command ONLY prints the references for papers that are cited in the paper. And by "cited," I mean programmatically with the \cite{} command or similar. My best guess is that you've hard-coded the references in the body, but natbib doesn't see them as references.

rpseely commented 3 months ago

> The \bibliography command ONLY prints the references for papers that are cited in the paper. And by "cited," I mean programmatically with the \cite{} command or similar. My best guess is that you've hard-coded the references in the body, but natbib doesn't see them as references.

This was exactly right! I must go back into the Overleaf file and put the \cite{} commands into the literature review, and then it should compile the bibliography correctly. I also want to change the citation style.

rpseely commented 3 months ago

Wednesday, March 12

Results Error

Going forward

kbuzard commented 3 months ago

@rpseely Well, now you've truly had the research experience! I don't mean to sound flip...this is just the kind of thing that happens all the time. It's really frustrating, but it's just life in this business.

The first thing I would suggest is kicking the tires a little bit. With no context, it would be a little surprising that dropping the 2022 data would overturn your results if that was just one fifth of the sample. I guess this could be because u-rates were really low in 2022, and so your data said that all these people weren't nepobabies. But I think it's worth digging into this a little bit to make sure you really believe the new result.

The next thing I'd try after that is to make a scatter plot with unemployment rates on the x-axis and the percent of nepobabies in each of those unemployment-rate groups on the y-axis (so what % of people are nepobabies when the unemployment rate is 5.0? What about 5.1? What about 5.2? And so on).
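The points for such a scatter plot could be prepared along the lines of the sketch below. This is an illustrative Python version, not the project's Stata code; the function name, the `(rate, is_nepobaby)` record layout, and the toy records are assumptions.

```python
from collections import defaultdict


def nepobaby_share_by_rate(records):
    """Return sorted (unemployment_rate, percent_nepobabies) pairs.

    records is an iterable of (rate, is_nepobaby) tuples, one per
    respondent, where rate is the unemployment rate at the hire month.
    """
    totals = defaultdict(int)
    nepobabies = defaultdict(int)
    for rate, is_nepobaby in records:
        totals[rate] += 1
        if is_nepobaby:
            nepobabies[rate] += 1
    return [
        (rate, 100.0 * nepobabies[rate] / totals[rate])
        for rate in sorted(totals)
    ]


# Toy records only; these are not GSS observations.
records = [
    (5.0, True), (5.0, False), (5.0, False), (5.0, False),
    (5.1, False), (5.1, True),
]
points = nepobaby_share_by_rate(records)
print(points)
```

Each pair is one point on the plot: the rate on the x-axis and the percent of nepobabies on the y-axis. Rates with very few respondents will produce noisy percentages, so the group sizes are worth checking alongside the plot.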

rpseely commented 3 months ago

Tuesday, March 19

Results Error Exploration

rpseely commented 3 months ago

Tasks as of March 24, 2024

I want to organize what I still have to do at this point...

Sensitivity analysis

Gender dynamics

T-Test vs. Chi-Square

Fix up code

Data Section

References

Final Product

Correcting results error

I think going through the literature again is the most important part of telling the narrative, because then we could rely on some prior sources that make claims similar to ours (kids always rely on parents). And then of course making the results match up with the new analysis.

rpseely commented 3 months ago

Sunday, March 24, 2024

rpseely commented 3 months ago

Monday & Tuesday, March 25 & 26, 2024

rpseely commented 3 months ago

Wednesday, March 27, 2024

Introduction Language

@kbuzard We make this statement in our introduction:

"This struggle for young professionals will incentivize them to use all of their resources to find a job. Therefore, we predict that there will be more graduates joining their parent’s occupations under these conditions."

My question is: is it appropriate to make this statement, given that it conflicts with our results? I.e., does it just add confusion, or is it notable that the results don't match our prediction?

Additionally, I updated the language in the introduction and abstract section to reflect that we can find no significant association.

Updating Data Section

Results section

kbuzard commented 3 months ago

> My question is: is it appropriate to make this statement, given that it conflicts with our results? I.e., does it just add confusion, or is it notable that the results don't match our prediction?

I think it's more honest to stick with your original hypothesis, and frankly, more interesting a story. I'd keep it!

kbuzard commented 3 months ago

> @kbuzard Also, I assume I am good to cite nepotism literature from other fields, specifically social psychology? I believe I have a good article from a social psychologist that I would use to argue that desires to create nepotistic relationships (from the parent and from the child) are strong, such that differences in labor market competition are not strong enough to override that use of resources. I won't state it so strongly in the final draft, but that is the idea.

This is fine. These types of papers often have a few cites from related fields, just as many political economy papers have a few cites from political science journals.

rpseely commented 3 months ago

Sunday, March 31, 2024

Replicating beta regression

I fixed the way the beta regression was being performed and had some issues. I realized that the way I defined the nepobaby ratio was not really doing what I wanted it to, so I redefined it correctly (to be the ratio of nepobabies to non-nepobabies for each unemployment rate/hire month). I definitely want to come back to this, but for now I am just going to focus on rewriting the results section. I uploaded the code I used into the nepobabies.do file. The issue with replicating the beta regression, more specifically, was that I would get an error message saying there were no observations. This is obviously not true, as I checked it a bunch of times. I then ran a regular regression, but the sum of squares for the residual was much higher than that for the model (0.265 for the model, 61.307 for the residual). I am sure part of that was due to using the regular regression and not the beta regression, and part of it is due to the fact that there is no significant association. Once again, I want to come back to this at some point.

I decided to come back to it and try again! I think it definitely worked, but with the newly (and, I believe, correctly) defined nepobaby_ratio variable, I got a significant result. However, the significant result is the opposite of our prediction: lower unemployment rates are associated with higher nepobaby ratios. Hmmm, definitely going to have to think about this more. Not sure what to do with this for now.

Here is the output:

[Screenshot: 2024-04-01, 12:26:35 AM]

@kbuzard I think that the beta regression is a cool idea, but I don't want to keep going with it if it's not really supporting me in my end goal. But I also don't want to ignore it if we think it's notable. I am also happy to meet if we want to go in depth. I really want to create a valid, replicable, accurate piece of research, but I don't want to spend time that might be more fruitful doing data entry. I'm also a little out of my depth with the regression analysis, but I do think that might give a more accurate result than a chi-square analysis that cuts the unemployment rates into only four groups (just from my understanding of regression being a more powerful tool for data analysis). Let me know what you think when you get the chance. I will keep on with the data entry until we can come up with a plan on how to get this project done well.

kbuzard commented 3 months ago

> I think that the beta regression is a cool idea, but I don't want to keep going with it if it's not really supporting me in my end goal. But I also don't want to ignore it if we think it's notable. I am also happy to meet if we want to go in depth. I really want to create a valid, replicable, accurate piece of research, but I don't want to spend time that might be more fruitful doing data entry. I'm also a little out of my depth with the regression analysis, but I do think that might give a more accurate result than a chi-square analysis that cuts the unemployment rates into only four groups (just from my understanding of regression being a more powerful tool for data analysis). Let me know what you think when you get the chance. I will keep on with the data entry until we can come up with a plan on how to get this project done well.

My problem is that I don't know anything about beta regression. Maybe there's a simpler way: if you do a one-independent-variable regression (with no constant), the coefficient you get should be the correlation coefficient. So maybe just a pwcorr (with ,sig option) would do what you're hoping for?

I also find it a little hard to think about $\frac{\text{no. nepobabies}}{\text{no. non-nepobabies}}$; $\frac{\text{no. nepobabies}}{\text{total workers}}$ is more the way I'm used to seeing such ratios. It shouldn't affect the significance, but it has a more natural interpretation.
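As a rough illustration of the two points above, the following Python sketch (not the project's Stata code; the function names and toy numbers are made up for the example) computes a Pearson correlation, checks that the slope from a one-variable regression equals that correlation once both variables are standardized, and converts a nepobaby-to-non-nepobaby ratio into a share of all workers.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def ols_slope(x, y):
    """Slope from regressing y on a single variable x (with intercept)."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def standardize(values):
    """Rescale to mean 0 and sample standard deviation 1."""
    m = _mean(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))
    return [(v - m) / sd for v in values]


def ratio_to_share(ratio):
    """Convert nepobabies/non-nepobabies into nepobabies/total workers."""
    return ratio / (1.0 + ratio)


# Toy numbers only: unemployment rates and nepobaby shares.
urate = [4.0, 5.0, 6.0, 7.0, 8.0]
share = [0.12, 0.10, 0.11, 0.09, 0.08]
r = pearson_r(urate, share)
slope_std = ols_slope(standardize(urate), standardize(share))
print(r, slope_std)
```

On unstandardized variables the slope is the correlation scaled by the ratio of the two standard deviations, so the two numbers generally differ even though they share a sign.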

rpseely commented 3 months ago

Tuesday, April 2, 2024

Data Analysis (Issues Resolved!)

rpseely commented 2 months ago

Monday, April 8, 2024

Analysis Final Update (fingers crossed)

rpseely commented 2 months ago

Tuesday, April 16, 2024

Do-File

rpseely commented 2 months ago

Wednesday, April 17, 2024

Results Additions

rpseely commented 2 months ago

Tuesday, April 24, 2024

Writing Updates

rpseely commented 1 month ago

Saturday-Monday, May 11-13

Writing Updates

rpseely commented 1 month ago

Sunday May 19

Writing Updates

rpseely commented 1 month ago

Monday, May 20

Last substantive issue (hopefully) resolved!

kbuzard commented 1 month ago

@rpseely I think you can report this null result. I wouldn't make a big deal of it, but a very short paragraph that says something like, "One might expect that the reliance on ..... would differ across ....."

rpseely commented 1 month ago

Friday, May 24

Update and Recompile Overleaf .tex File

rpseely commented 1 month ago

Sunday, May 26

Final Edits (up until results)

rpseely commented 1 month ago

Tuesday, May 28

Uploaded Final Draft for Review

@kbuzard I have uploaded the final draft! I finished going through the draft to make edits and I have uploaded both a .tex file and compiled the report into a PDF as well and uploaded that.

kbuzard commented 1 month ago

@rpseely Great! I'll see if I can carve out time to read it over tomorrow! Thanks so much!

kbuzard commented 1 month ago

@rpseely I've just read through the report, and I think it's very, very close to the finish line. I've made some minor comments throughout the PDF (I just uploaded a copy with my initials attached). It really needs a pointer to your reproducibility package, either in a data appendix or in the data or results section. And there are some inconsistencies in the story (in various places you say very different things about how much support you find for/against the null hypothesis). I think my comments on the draft can be addressed in an hour or less. Another hour or two shaping up the reproducibility package would also be great (see notes on draft).

rpseely commented 1 month ago

@kbuzard I am not sure what happened but for some reason I uploaded the updated .tex file and a .pdf file that was out of date, even though I thought I got them both from overleaf together? In the updated file, I have those inconsistencies addressed, as well as other minor changes but nothing else that is substantively different, so the rest of the comments are certainly applicable.

Also, I ran some quick tests on whether the low unemployment rate group was significantly different from the three higher groups and whether it was significantly different from just the highest group, and there was not enough evidence to reject H0 for either test (see no.diff_highlow.urate.log).

kbuzard commented 4 weeks ago

> @kbuzard I am not sure what happened but for some reason I uploaded the updated .tex file and a .pdf file that was out of date, even though I thought I got them both from overleaf together? In the updated file, I have those inconsistencies addressed, as well as other minor changes but nothing else that is substantively different, so the rest of the comments are certainly applicable.

@rpseely Maybe we could schedule a Zoom once the project is wrapped and you could help me brainstorm about ways to make the workflow easier for this coming fall's version of 310? I'm considering paying for a premium Overleaf account so that people can use the integration with Github, but I've never tried it so I don't know if it will be easier or harder than what we did this fall.

> Also, I ran some quick tests on whether the low unemployment rate group was significantly different from the three higher groups and whether it was significantly different from just the highest group, and there was not enough evidence to reject H0 for either test (see no.diff_highlow.urate.log).

I suggest adding a sentence or two about these new tests, giving the p-values and saying that they are not significant at conventional levels but are close.
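For readers unfamiliar with this kind of comparison, a 2x2 chi-square test (for example, nepobaby status in the lowest-unemployment quartile versus the other three quartiles) can be sketched as follows. This is a Python illustration rather than the Stata code behind the log file; the function name and the counts are hypothetical, and the statistic is the plain Pearson chi-square with no continuity correction.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 count table.

    table is [[a, b], [c, d]], e.g. rows = unemployment group and
    columns = (nepobaby, not nepobaby). With 1 degree of freedom the
    p-value can be computed from the complementary error function.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not the project's data.
chi2, p = chi_square_2x2([[20, 80], [10, 90]])
print(round(chi2, 3), round(p, 3))
```

A p-value just above 0.05 from a test like this is the "not significant at conventional levels, but close" case mentioned above.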

rpseely commented 4 weeks ago

Monday, June 3, 2024

Changes made after comments

  1. I changed the way that I wrote about the nepobaby and parent gender relationships and created a new bar graph to reflect this language and more accurately reflect the content of the data I have.
  2. I added in language about the two new chi-square tests, which show no significant association (at conventional levels) between nepotism status and unemployment level when comparing the bottom 25% with the top 75% or the bottom 25% with the top 25%.
  3. I corrected minor errors and made small changes in language in various places. Importantly, I corrected the correlation coefficient from 0.979 to 0.0979.
  4. Confusing phrasing in lit review...

From Annotated PDF:

Women, who are described as risk-averse by Hellerstein and Morrill (2011) may follow into their parents professions, specifically their father’s occupation to prevent the risk of being jobless after their graduation. While we do not present evidence on this trend, our finding that approximately ten percent of all young adult workers surveyed are defined as nepobabies demonstrates that a large portion of the workforce utilizes their familial relationships to establish careers and does provide evidence of this potentially risk-adverse behavior.

I have deleted the bolded and italicized portion. I was intending to reference the growing trend of daughters working in their fathers' professions (from Hellerstein and Morrill), but a) this does not document a theory and b) we do not analyze the changes in parent-child nepotism over time.

Changes made before comments (not part of annotated pdf)

  1. The abstract was largely rewritten after the second sentence.
  2. Paragraph 4 of results: There were a number of errors and confusingly phrased passages that were commented on but that I had already altered. This is also where I put the writing on the new chi-square tests.

Question

@kbuzard There is a portion of writing in the data section that addresses how we determined to subset the data in terms of age, specifically it includes notes on a logistic regression and chi-square analysis. My thought in putting that in the data section was to explain why I chose to only focus on the data from observations in which the respondent was younger than 30. Here is my idea: copy and paste the section you highlighted somewhere into the beginning of the results section. Then, add a sentence in the data section where that text was and say something like "we choose to focus on adults younger than 30 and you can read all about why in \ref{sec:result}" so that one can jump to the explanation of why we made that cutoff. Would that be more appropriate?

Next Steps

I believe all of the comments have been addressed so now I will focus on the reproducibility package.

rpseely commented 4 weeks ago

> @rpseely Maybe we could schedule a Zoom once the project is wrapped and you could help me brainstorm about ways to make the workflow easier for this coming fall's version of 310? I'm considering paying for a premium Overleaf account so that people can use the integration with Github, but I've never tried it so I don't know if it will be easier or harder than what we did this fall.

@kbuzard I would be happy to!