Peer Review #2 by Group 7

1. Summary

The paper by Mingrui Li uses a linear regression model with key predictors to predict the share of the vote that Donald Trump is likely to win relative to Harris.

2. Strong positive points:

The strengths and weaknesses of the model are considered. The pollster's methodology is discussed, and the idealized survey is thoroughly designed.

3. Critical improvements needed:

Beyond the paper:

The scripts lack comments explaining the code.
The analysis dataset is not in parquet format.
The repository lacks multiple commits with meaningful messages.

For the paper:

Tests and simulations do not yet appear to be included in any script.
The title may not be informative enough; it could be modified to convey the main result. The subtitle also seems to fail to convey the main finding.
The abstract is missing. The link to the GitHub repository is not included.

Section 1:

The introduction is well structured, but it could be longer and cover more content.
Where the introduction outlines the overall structure of the paper, the cross-references do not yet appear to be completed.
An unnecessary “#Data” appears at the end of the introduction.

Section 1.1:

The section lacks a general overview of the variables and the overall features of the data. Graphs, tables and summary statistics could be added to the section.

Section 1.2:

The author could discuss in more detail how we go from some phenomenon in the world to an entry in the dataset.

Section 2:

The reasons for including numeric grade, sample size and pollscore as model variables do not appear to be given in the section.
The underlying assumptions of the model are not discussed.
Model validation and checking procedures seem to be missing.
The linear model seems somewhat too simple. If applicable, more complex models could additionally be used to produce the prediction and strengthen the result.
The necessary “Results” section is missing; it should be placed between the model section and the discussion section.
Though the content of the appendix is essentially complete, its format appears to be incorrect.

4. Suggestions for improvement:

Add meaningful comments to the scripts to explain the code.
Save the analysis dataset in parquet format.
Make more commits, each with a meaningful message.
Tests and simulations should be included in the scripts.
The title could be modified to convey the main result. The subtitle could also be changed to convey the main finding.
Add an abstract and the GitHub repository link to the paper.
The introduction could be extended to 3-4 paragraphs to cover more content.
Cross-references to each section and sub-section should be added where the “Introduction” section outlines the structure of the paper.
Remove the unnecessary “#Data” at the end of the introduction section, and add “Data” as a new section title.
Graphs, tables and summary statistics could be added to the data “Overview” section to provide a general understanding of the dataset and its observations.
A discussion of how we go from some phenomenon in the world to an entry in the dataset should be included in the “Measurement” section.
The reasons for including numeric grade, sample size and pollscore as model variables could be added to the “Model” section. If applicable, a more complex model could also be applied to the dataset.
A discussion of the model assumptions and the validation process should be added to the “Model” section.
A “Results” section should be added to the paper, just before the “Discussion” section, which covers the strengths and weaknesses of the model.
Format the “Appendix” section appropriately. Its section titles should be prefixed with “A”, “A.1”, “B”, “B.1” rather than with numbers.
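
As a rough illustration of the simulation and test scripts suggested above, the following sketch simulates a small polling dataset and runs basic checks on it. It is an assumption-laden example, not the author's code: the column names (`numeric_grade`, `sample_size`, `pollscore`, `pct`), their value ranges, and the function names `simulate_polls` and `check_polls` are all hypothetical.

```python
import random


def simulate_polls(n=100, seed=853):
    """Simulate n hypothetical poll records as a list of dicts."""
    rng = random.Random(seed)  # local generator so results are reproducible
    polls = []
    for _ in range(n):
        polls.append({
            "numeric_grade": rng.choice([1.0, 1.5, 2.0, 2.5, 3.0]),  # assumed scale
            "sample_size": rng.randint(400, 3000),                   # assumed range
            "pollscore": round(rng.uniform(-1.5, 1.5), 2),           # assumed range
            # Support in percent, clamped to the valid 0-100 interval.
            "pct": round(min(100.0, max(0.0, rng.gauss(46, 3))), 1),
        })
    return polls


def check_polls(polls):
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    for i, row in enumerate(polls):
        # Check for missing values first so later comparisons cannot fail.
        if any(value is None for value in row.values()):
            problems.append(f"row {i}: missing value")
            continue
        if not 0 <= row["pct"] <= 100:
            problems.append(f"row {i}: pct out of range")
        if row["sample_size"] <= 0:
            problems.append(f"row {i}: non-positive sample_size")
    return problems


simulated = simulate_polls()
assert check_polls(simulated) == []
```

The same `check_polls` function could then be run on the cleaned analysis data, so that the simulated and actual datasets are tested against the same expectations.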

5. Evaluation:

【R appropriately cited: 1/1】 The paper uses Python and cites it in the references.
【Data appropriately cited: 1/1】 The paper cites the data properly.
【Class paper: 1/1】 There’s no sign that it is a class project.
【LLM usage documented: 1/1】 LLM usage is documented in README and llm.usage.txt.
【Title: 1/2】 The subtitle does not convey the main finding.
【Author, data and repo: 1/2】 The repository link is not included.
【Abstract: 0/4】 The abstract section is missing.
【Introduction: 2/4】 The introduction seems too short and contains too little content.
【Estimand: 1/1】 The estimand of percentage vote is clearly stated.
【Data: 4/10】 Graphs, tables and summary statistics are lacking, so there is no general overview of the variables and observations.
【Measurement: 2/4】 Does not explain how we go from some phenomenon in the world to an entry in the dataset.
【Model: 4/10】 No reasons are given for including the specific features; the underlying assumptions and the validation process are not discussed.
【Results: 0/10】 The results section is missing.
【Discussion: 4/10】 The discussion of the model's strengths and weaknesses is complete, but seems too short.
【Prose: 4/6】 Generally great writing skills.
【Cross-references: 0/1】 No cross-references are made when introducing the paper structure in the “Introduction” section.
【Captions: 0/2】 Since there are no graphs, there are no captions.
【Graphs/tables/etc: 3/4】 Graphs, tables and plots are missing in the paper.
【Idealized methodology: 8/10】 The proposed methodology is generally well thought through.
【Idealized survey: 3/4】 The overall construction and discussion of the survey is complete.
【Pollster methodology overview and evaluation: 10/10】 Provides an in-depth, thorough overview of the pollster’s methodology and discusses its strengths and weaknesses.
【Referencing: 4/4】 For the limited content included in the paper for now, the references are essentially done.
【Commits: 0/2】 There is a lack of distinct commits, and meaningful messages are missing.
【Sketches: 2/2】 Sketches are not included in the “other” folder.
【Simulation: 0/4】【Tests-simulation: 0/4】【Tests-actual: 0/4】 Tests and simulations do not yet appear to be included in any script.
【Parquet: 0/1】 The analysis dataset is not saved as a parquet file.
【Reproducibility: 2/4】 The README is well constructed. However, the code lacks comments.
【Miscellaneous: 0/3】

6. Estimated overall mark:

59/126

7. Any other comments:

Though the paper is currently incomplete, it has already established a basic structure. I believe that once the author completes the paper, the project will be a strong piece of work.

aabbmddcc / US_election_prediction