XuperX opened 1 year ago
This paper presents a language-model framework for evaluating and correcting code-comment inconsistencies. It applies current NLP techniques and provides a working implementation for code improvement, which makes it a good fit for the EMNLP system demonstration track.
I am still concerned about whether the model's resolution is sufficient to detect code-comment mismatches that arise across different iterations of the code. For instance, when I tested it on functions related to JSON file validation, the results largely depended on whether the word "validation" appeared in the comment. When the basic structure of the code changes little between versions, can the model still handle such differences?
About the methods:
I tested its performance on Python. If a comment is written with # markers and spans multiple lines, the model recognizes only the first line as a valid comment. This seems to be a small bug.
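To make the bug report concrete, here is a minimal sketch of the behavior the reviewer expects: consecutive # lines should be grouped into one logical comment block rather than truncated to the first line. The function and variable names below are illustrative, not taken from the paper's implementation.

```python
def extract_comment_blocks(source: str) -> list[str]:
    """Group runs of consecutive '#' lines into single comment blocks."""
    blocks, current = [], []
    for line in source.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            # Accumulate this line into the current comment block.
            current.append(stripped.lstrip("#").strip())
        else:
            # A non-comment line ends the current block, if any.
            if current:
                blocks.append(" ".join(current))
                current = []
    if current:
        blocks.append(" ".join(current))
    return blocks

code = """\
# Validate the JSON payload
# against the schema before saving.
def save(payload):
    pass
"""
# Both '#' lines end up in one comment block, not just the first line.
print(extract_comment_blocks(code))
```

A tool that instead treats each # line independently (or keeps only the first) would miss the second half of the comment, which is exactly the failure mode described above.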
Rule of Thumb
If I were the author, would I find this review helpful?
My template
From sharky6000
For the love of everything that is good in this world, read the rebuttals! Participate in the discussion. Update your reviews after the discussion. This varies from conference to conference, but in my experience probably 35% of reviewers do not react at all after submitting their review, and I wonder how they keep getting asked to review.
From Mateusz Buda
Brief summary
Structure
Novelty
Criticism of methodology
Criticism of results
Strong points
Conclusion
Examples from ACL