Group 1 peer review - GitHub Issues

Summary: This paper explores the use of polling data to forecast the 2024 US Presidential election. The analysis leverages publicly available polling data from FiveThirtyEight and employs regression models to predict the outcomes for the major candidates, Kamala Harris and Donald Trump. The paper effectively introduces its dataset and presents visualizations of polling trends, but there are major areas for improvement: a more detailed discussion section, a better explanation of the model, and filling gaps such as the missing abstract and inconsistent in-text citations. Overall, the paper shows promise but is incomplete and needs significant revision.

Strong Positive Points:

Use of Credible Dataset: The paper effectively utilizes a credible dataset from FiveThirtyEight, providing a reliable foundation for the analysis. The choice of dataset is appropriate given the task of forecasting the US election, and its broad coverage makes it well-suited for this kind of analysis.
Modeling Approach: The authors make a solid attempt at employing regression analysis to predict election outcomes, with a clear understanding of the importance of pollster reliability and other variables. The inclusion of Bayesian models is an excellent addition that demonstrates an advanced grasp of modelling.
Organized Structure: The paper is structured logically, with clear sections on data, variables, and results. This makes it easier for readers to follow the progression and flow of the analysis. Additionally, the authors have made a good attempt at using tables and figures to present their data.
Graphical Representation: The graphs and charts included in the paper are clear and easy to interpret.

Critical Improvements Needed:

Abstract: The abstract is missing from the paper; the draft currently contains only filler phrases in its place. This is a critical section that provides the reader with a summary of what the paper is about, the methodology used, and the key findings. Understandably, this is only a draft, but without a proper and effective abstract it is difficult to understand the scope of the paper from the outset.
Discussion Section: The discussion is underdeveloped and lacks depth. It must reflect on the implications of the findings, discuss and expand the limitations in the dataset or methodology, and suggest avenues for future research. Expanding this section would add significant value to the paper by providing a more critical analysis of the results.
Model Explanation: While the Bayesian model is mentioned, the explanation of the model and its components is not detailed enough. Clear mathematical notation and a more thorough explanation of each component would make this section considerably stronger.
README File: The README file in the GitHub repository is functional but could be improved to better guide users through the analysis process. It should provide a more detailed overview of the project’s structure and explain how to reproduce the analysis step-by-step.
Grammatical and Prose Improvements: There are some grammatical errors, awkward phrasings, and filler words that detract from the professionalism of the paper. Although this is just a draft, proofreading the final paper and improving the clarity and flow of the prose will help create a more polished and coherent final product.
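
To illustrate the level of notation I have in mind for the Model Explanation point, here is a generic Bayesian linear regression written out in full. This is a hypothetical specification, not a description of the paper's actual model: the outcome $y_i$, the two predictors, and all priors are assumptions chosen only for illustration.

```latex
% Hypothetical example only: y_i is a candidate's percent support in poll i,
% x_{1i} is an assumed pollster rating, x_{2i} is assumed days until the election.
\begin{align}
y_i \mid \mu_i, \sigma &\sim \text{Normal}(\mu_i, \sigma) \\
\mu_i &= \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} \\
\beta_0, \beta_1, \beta_2 &\sim \text{Normal}(0, 2.5) \\
\sigma &\sim \text{Exponential}(1)
\end{align}
```

Writing the model this way, followed by a sentence defining each symbol and justifying each prior, would let readers see exactly what is being estimated.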

Suggestions for Improvement:

In-Text Citations: The paper has citations, but they are not consistently linked to figures or specific claims in the text. Improving the consistency of in-text citations and ensuring that they are tied to the relevant data or figures will strengthen the academic rigor of the paper.
Expand on Data Limitations: The paper briefly mentions data limitations, but there is room to expand on this discussion. It is quite raw and should be more informative. Specifically, it might be helpful to discuss potential biases in the polling data (e.g., under-sampling or over-sampling certain demographics) and how these biases might affect the election forecast.
Include More Variables or Alternative Models: To strengthen the analysis, I suggest incorporating more variables related to voter behavior, such as regional differences or demographic factors. Additionally, comparing the results of different models could provide a more robust conclusion.
Real-World Implications: The real-world implications of the analysis could also be better addressed. For example, discussing how the findings could be used by political campaigns or how they might influence public policy would enhance the paper’s relevance.
Revise and Expand the Results Section: While the results are presented clearly, they need more interpretation. Discussing what the numbers mean in the context of the US election and how reliable the predictions are will add depth to this section.

Evaluation:
● Go/no-go 1: R is cited: (1 pt)
● Go/no-go 2: LLM usage is documented: (0/1 pt)
● Data are appropriately cited: (0/1 pt)
● Class Paper: (1/1 pt)
● Title: The title is functional but could be more descriptive by adding a subtitle like “A Poll-Based Analysis.” (1/2 pt)
● Author, Date, and Repo: Properly included. (2/2 pt)
● Abstract: No abstract present. (0/4 pt)
● Introduction: The introduction gives good context but lacks clear objectives and the estimand. (2/4 pt)
● Estimand: Not clearly stated in the introduction. (0/1 pt)
● Data: The data section is detailed, but there is limited discussion on cleaning or transformations. (6/10 pt)
● Measurement: Lacks detailed discussion on data measurement and how raw data translates to useful variables. (2/4 pt)
● Model: (3/10 pt)
● Results: (2/10 pt)
● Discussion: (4/10 pt)
● Prose: Grammatical issues and filler phrases, but overall good structure. Needs proofreading. (2/6 pt)
● Cross-references: (1 pt)
● Captions: Captions are present but could be more detailed. (1/2 pt)
● Graphs/tables/etc: Good visualizations but captions need improvement for better interpretation. (3/4 pt)
● Idealized Methodology: The proposed polling methodology is not detailed enough. The paper lacks a well-thought-out explanation of how the idealized survey would be implemented. (3/10 pt)
● Idealized Survey: The idealized survey is missing; no link or survey content provided. (0/4 pt)
● Pollster Methodology Overview and Evaluation: A pollster is chosen but not deeply analyzed in terms of methodology, sampling, or reliability. (4/10 pt)
● Referencing: References are present but need to be fully formatted and consistently applied. (3/4 pt)
● Commits: Few commits, and the messages could be more detailed. (1/2 pt)
● Sketches: (2 pt)
● Simulation: Simulation is mentioned but not detailed enough. (2/4 pt)
● Tests - Simulation: No tests for the simulated dataset. (0/4 pt)
● Tests - Actual: No tests for the actual dataset. (0/4 pt)
● Parquet: The dataset is not saved as a parquet file. (0/1 pt)
● Reproducible Workflow: (2/4 pt)
● Miscellaneous: (1/3 pt)
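
Since both Tests items scored zero, here is a minimal sketch of the kind of checks I mean. It is written in Python purely for illustration (the paper itself uses R, where the same checks could be expressed with a testing package such as testthat). The column names `candidate`, `pct`, and `sample_size`, the value ranges, and the helper functions `simulate_polls` and `validate_polls` are all hypothetical and not taken from the repository.

```python
import random

# Assumed set of valid candidate names for this illustration.
CANDIDATES = {"Kamala Harris", "Donald Trump"}


def simulate_polls(n, seed=0):
    """Simulate n poll rows with hypothetical columns and plausible ranges."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    polls = []
    for _ in range(n):
        polls.append({
            "candidate": rng.choice(sorted(CANDIDATES)),
            "pct": round(rng.uniform(35.0, 55.0), 1),
            "sample_size": rng.randint(400, 3000),
        })
    return polls


def validate_polls(polls):
    """Return a list of human-readable problems; an empty list means all checks pass."""
    problems = []
    for i, row in enumerate(polls):
        if row["candidate"] not in CANDIDATES:
            problems.append(f"row {i}: unexpected candidate {row['candidate']!r}")
        if not 0 <= row["pct"] <= 100:
            problems.append(f"row {i}: pct {row['pct']} outside [0, 100]")
        if row["sample_size"] <= 0:
            problems.append(f"row {i}: non-positive sample size {row['sample_size']}")
    return problems


# The same validator can be run on the simulated data and on the cleaned real data.
simulated = simulate_polls(100)
assert validate_polls(simulated) == []
```

Running one shared set of checks against both the simulated and the actual dataset would address the two Tests items together and would also document what the cleaned data are expected to look like.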

Estimated Overall Mark: 0/126

Final Thoughts: The paper presents a promising foundation for analyzing the 2024 US election using polling data, but, as a draft, it is incomplete in several areas, including the abstract, discussion, and model explanation. By addressing these gaps and providing a more thorough analysis of the results and data limitations, this paper could achieve a much higher score.

aj3616 / Forecasting-the-US-Election-2024

Group 1 peer review #5