Summary – be more precise with the first two sentences.
Remove noble.
Split up the third sentence of the overview.
Add contact details.
Sketches not done.
Folders/files that are not being used should be removed.
Simulation not yet done.
Add more comments, especially around “magic numbers” such as lines 21 and 22 of 03-clean_data.R.
Don’t forget to update date in the scripts, if necessary.
Clean up the libraries that are being loaded in the Quarto doc.
Move cleaning that is happening in the Quarto doc into a cleaning script. Consider using parquet for this analysis dataset so that it maintains class. Then read it into the Quarto doc.
Title – I’d like to see the years and the country.
Sub-title: Add main finding.
General – try to be more concise with writing.
Figure 2 – remove “Sex” as title.
Figure 2 – consider also plotting the difference.
Figure 3 – reduce the number of categories.
In general, try to never say what the “thing” is in text. E.g. “Figure 5, the Histogram demonstrates…” should just be “Figure 5 demonstrates…”.
“We leverage the” change to “We use a”
Brilleman et al (2018) should be (Brilleman et al 2018).
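
On the parquet suggestion above: a minimal sketch of why parquet "maintains class" where CSV does not. It assumes the arrow package is installed. The column names, values, and file paths are made up for illustration and are not taken from our data.

```r
# Sketch only: columns, values, and paths are hypothetical.
library(arrow)

# Stand-in for the output of the cleaning script.
analysis_data <- data.frame(
  sex = factor(c("Female", "Male", "Female")),
  age = c(34L, 51L, 27L),
  survey_date = as.Date(c("2018-03-01", "2021-03-01", "2021-03-01"))
)

# CSV round trip: factor and Date columns come back as character.
csv_path <- tempfile(fileext = ".csv")
write.csv(analysis_data, csv_path, row.names = FALSE)
from_csv <- read.csv(csv_path)

# Parquet round trip: column classes are stored in the file and restored.
parquet_path <- tempfile(fileext = ".parquet")
write_parquet(analysis_data, parquet_path)
from_parquet <- read_parquet(parquet_path)

# Compare the classes of each column after the two round trips.
sapply(from_csv, class)
sapply(from_parquet, class)
```

In the repo, the write_parquet() call would sit at the end of the cleaning script and the read_parquet() call at the top of the Quarto doc, so the Quarto doc needs no re-typing of columns.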
[x] Correct the "README" to the repo by adding when and where to the title, to clarify what population we are looking at.
[ ] Correct the "README" to the repo by being more precise in the introduction. Be more specific about the context of our study and how it may fit into wider work.
[x] Our commit notes look okay.
[x] Correct the title to our repo to more accurately reflect our research. (Sex, education, mental health).
[x] Add sketches to folder.
[ ] Remove folders and files that we are not using, to tidy up the repo.
[ ] We don't have simulated data; we should create it. So far we only have the cleaning script (03-clean_data.R). Our cleaning makes sense to Rohan now, but in six months we won't remember what our numbers and variables mean, so comment the code thoroughly so we know what we did. At a meta level, comments make our code understandable and shareable, so others can see what we did and why without needing us there to explain it.
[ ] Don't forget to update the date in the scripts, if necessary.
[ ] Clean up the libraries that are loaded in our Quarto doc, for transparency. For example, we don't need library(gutenbergr) - in future, if we try to run our paper again, and library(gutenbergr) is broken, we don't want our paper to break just because gutenbergr is broken/not supported.
[ ] We did some data cleaning inside the Quarto doc, which we can move into a separate data cleaning script so as not to clutter the Quarto doc. E.g. lines 38 through the 70s can live outside Quarto. This also allows us to test the data.
[ ] When we have around 10 or more "educ="501~"Master's"" lines (where we re-map the names in the table), it gets tedious quickly, so create a lookup table (CSV) that we can reference instead.
[ ] In the title of the paper (qmd), like we saw in the repo's README, Rohan would like to see the years and the country again, because America tends to assume it is universal but it is not. Also, the title is good, but to make it better use a subtitle and specify your main finding. E.g. "Gender and blahblah: discovery of relationships between blahblah from blahblah variables."
[ ] In text-heavy sections of the qmd, make it more concise. Cut out unnecessary words like "we hope to..."
Rohan asks, on the bottom of page 2 and top of page 3, why do we have these tables? (We explained it to Rohan, but writing the explanation into the paper would make it clearer for the reader.)
[ ] Graph at bottom of page 5 has the word 'sex' on top (looks sort of like the title of the graph). Remove it as it is confusing/not helpful.
[ ] For that same graph (Figure 2), add another graph to include difference/changes between 2018 and 2021.
Figure 3: smash some more categories together, e.g. highschool + GED, master's + doctoral (postgraduate)... reduce it down, suggests Rohan.
[ ] The description of our Figure 5 can be simplified to "Figure 5 demonstrates the overview of the age data distribution..." (cut out extra words "Figure 5 The histogram demonstrates the overview of the age data distribution..."). In general never say what type of "thing" is in the figure (histogram).
[ ] In section "3 model", change "we leverage" to "we use".
[ ] Clean up the citations in section "3 model" - the Quarto document didn't render the citations correctly.
[ ] Expand out section "5.3 weaknesses and next steps." E.g. include more details about connections to other research in the field and what we can contribute to it...
[ ] Expand appendix.
[ ] The bibtex citation for the APA paper currently has no author listed, even though we know it is the APA - add APA as the author please.
[ ] Per our lecture today, start adding tests for validity, internal consistency, and external consistency. Include that as a script in your project repos. See our lecture notes and the textbook for the packages to use (there are three possible options).
[ ] Good job testing our plan on a small sample of 5000 before doing a bigger scale.
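
Two of the items above (the lookup table for re-mapping educ, and the data tests) can be sketched together in base R. This is only a rough illustration. Apart from "501" mapping to "Master's", every code, label, column, and age bound below is hypothetical. Base stopifnot() stands in for the testing packages from the lecture, which are not named here.

```r
# Sketch only: all codes and labels except "501" -> "Master's" are hypothetical.
# In the repo this table would live in a CSV and be loaded with read.csv().
educ_lookup <- data.frame(
  code = c("301", "401", "501", "601"),
  label = c("High school", "Bachelor's", "Master's", "Doctorate"),
  stringsAsFactors = FALSE
)

# Stand-in for the raw survey data.
raw_data <- data.frame(
  educ = c("501", "301", "601", "401", "501"),
  age = c(29, 45, 62, 38, 51),
  stringsAsFactors = FALSE
)

# One match() against the lookup table replaces a long list of re-mapping lines.
cleaned_data <- raw_data
cleaned_data$educ <- educ_lookup$label[match(raw_data$educ, educ_lookup$code)]

# Simple validity checks; the script stops with an error if any check fails.
stopifnot(
  !anyNA(cleaned_data$educ),                      # every code was found in the lookup
  all(cleaned_data$educ %in% educ_lookup$label),  # only known labels appear
  is.numeric(cleaned_data$age),                   # age kept its class
  all(cleaned_data$age >= 18 & cleaned_data$age <= 120)  # hypothetical plausible range
)
```

Keeping the mapping in one table also makes it easy to merge categories later (e.g. for Figure 3) by editing labels in the CSV instead of the code.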