[Re] Faster Teaching via POMDP Planning (partial replication)

luksurious commented 4 years ago

Original article: Faster Teaching via POMDP Planning by Rafferty et al. (2016), https://www.onlinelibrary.wiley.com/doi/full/10.1111/cogs.12290

PDF URL: https://github.com/luksurious/faster-teaching/blob/master/replication-paper.pdf
Metadata URL: https://github.com/luksurious/faster-teaching/blob/master/metadata.yaml
Code URL: https://github.com/luksurious/faster-teaching

Scientific domain: Cognitive Science
Programming language: Python
Suggested editor: -

(Sorry, I first posted the wrong link to the paper; it works now.)

rougier commented 4 years ago

Thanks for your submission. We'll assign an editor soon!

rougier commented 4 years ago

@gdetor @koustuvsinha Can you edit this (regular) submission (Faster Teaching via POMDP Planning)?

rougier commented 3 years ago

@koustuvsinha Can you edit this (regular) submission (Faster Teaching via POMDP Planning)?

rougier commented 3 years ago

@koustuvsinha Gentle reminder

koustuvsinha commented 3 years ago

Hi @rougier, sorry, my notifications were going to the wrong email address, so I missed this. I'll review it this week!

rougier commented 3 years ago

@koustuvsinha No problem. Actually the request is for editing, meaning you just need to assign 2 reviewers (from the board or from elsewhere). Or you can edit and review and find another reviewer.

koustuvsinha commented 3 years ago

@rougier got it! I'll ask for reviewers!

koustuvsinha commented 3 years ago

Hi @benureau 👋 would you be interested to review this article?

koustuvsinha commented 3 years ago

Hi @xuedong 👋 would you also be interested to review this article?

benureau commented 3 years ago

> Hi @benureau 👋 would you be interested to review this article?

Dear @koustuvsinha, I have personally worked with Aurélien Nioche, the second author, in the same lab, during my last position. I don't think I can review this paper.

koustuvsinha commented 3 years ago

Thanks for notifying about the conflict @benureau :)

koustuvsinha commented 3 years ago

Hi @bengioe, as we discussed in email, would you be interested to review this article? You can find the reviewer guidelines here. Many thanks!

bengioe commented 3 years ago

Yes, I can review this paper.

koustuvsinha commented 3 years ago

Thanks so much @bengioe! Additionally, it would be great if you can comment in this thread with your credentials, so that we can onboard you as a reviewer. @rougier, can you confirm that is the process to onboard external reviewers?

rougier commented 3 years ago

Yes, it's not mandatory at all but it's better. I'll add @bengioe to the board.

xuedong commented 3 years ago

Hi @koustuvsinha, I think POMDPs are somewhat outside my expertise, and I may not be able to offer a proper review. I'm sorry.

bengioe commented 3 years ago

Hello, here is my review. As this is my first review here, I'm happy to extend it if more details are necessary.

Original Paper

The original paper being reproduced is a 2016 paper using the Partially Observable Markov Decision Process framework in conjunction with heuristic student models to learn teaching policies. The original paper simulates students through their heuristic models, and also performs human trials to validate their approach.

Specifically, three student models are proposed: a memoryless model, a short-term memory model, and a continuous belief model (estimated with particle filters). In such a model, a weighting over the enumeration of all possible concepts constitutes the state space, while the action space consists of the three possible teacher actions (show an example, quiz, quiz with feedback).
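To make the state and action description concrete, here is a minimal sketch of a weighting (belief) over an enumerated concept space, updated after the teacher shows one example. Everything in it is an illustrative assumption: the toy concept space (three letters mapped to the digits 0 to 2), the names `CONCEPTS`, `is_consistent` and `update_belief`, and the noise-free likelihood. It is a plain Bayesian filter, which is simpler than the heuristic student models of the paper, and it is not taken from the replication code.

```python
from itertools import permutations

import numpy as np

# Hypothetical toy concept space: every assignment of the digits 0..2
# to the letters A, B, C (6 concepts in total).
LETTERS = ("A", "B", "C")
CONCEPTS = [dict(zip(LETTERS, digits)) for digits in permutations(range(3))]

# The three teacher actions named in the review (labels are made up here).
ACTIONS = ("example", "quiz", "quiz_with_feedback")


def is_consistent(concept, left, right, total):
    """Return True if the concept satisfies the equation left + right = total."""
    return concept[left] + concept[right] == total


def update_belief(belief, left, right, total):
    """Bayes update of a belief over CONCEPTS after a noise-free example."""
    likelihood = np.array(
        [1.0 if is_consistent(c, left, right, total) else 0.0 for c in CONCEPTS]
    )
    posterior = belief * likelihood
    norm = posterior.sum()
    if norm == 0.0:
        raise ValueError("example is inconsistent with every concept")
    return posterior / norm


# Start from a uniform weighting and show the example "A + B = 1".
uniform = np.full(len(CONCEPTS), 1.0 / len(CONCEPTS))
posterior = update_belief(uniform, "A", "B", 1)
```

After the update, only the concepts that satisfy the shown equation keep non-zero weight. A noisy student model would replace the 0/1 likelihood with probabilities strictly between 0 and 1.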

The original paper finds that using appropriate student models to estimate the optimal policy can improve teaching.

Reproduction

This study reproduces the simulated experiments of the original paper. While most findings are not significantly different, those that do differ seem more consistent with expectations about the algorithms used. In addition, this study highlights some errors in the original paper and explicitly provides many details useful for reproduction.

audience - I estimate that an audience with a generic CS background should be able to understand this study. The details of the POMDP framework as well as those of the individual learned models are explained clearly.
level of detail - all details needed to reproduce the simulations and reimplement the proposed algorithms are present. I appreciate in particular Table 5, which for me clarified the scope of the Number Game of the original paper.
discussion - the study contains an interesting discussion of the reproduction of the results, the differences found, as well as the choices made to create this experimental setup (such as the relative cost of actions). In particular it weakens some of the conclusions of the original paper, leaving room for improvement and additional research on this setting.
writing - The writing was clear throughout; I have made a few suggestions below.

Code comments:

- I was able to run the code successfully. The code is fairly clean and self-explanatory.
- I would suggest adding a mkdir -p data command in the .sh files to avoid failure from a fresh clone of the repository.
- Some code seems to use random instead of numpy.random. While I see that you seed both modules, it would be preferable to only use one. (For example, prior to NumPy 1.17 both random and numpy.random use the same PRNG algorithm, MT19937, so seeding both with the same seed should produce the exact same sequence of numbers, which isn't desirable.)
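The two suggestions above can be sketched in Python as follows. This is a minimal illustration under assumptions, not the repository's actual code: the helper names `ensure_output_dir` and `make_rng` are hypothetical, and the `data` default simply mirrors the directory name mentioned above. It relies on `numpy.random.default_rng`, which is available from NumPy 1.17 onward.

```python
from pathlib import Path

import numpy as np


def ensure_output_dir(path="data"):
    """Create the output directory if it is missing (like `mkdir -p`)."""
    out = Path(path)
    out.mkdir(parents=True, exist_ok=True)
    return out


def make_rng(seed):
    """Return a single NumPy Generator to share across the whole program."""
    return np.random.default_rng(seed)


# One generator, created once and passed to every component that needs
# randomness, instead of seeding `random` and `numpy.random` separately.
rng = make_rng(123)
noise = rng.random(3)         # three uniform floats in [0, 1)
choice = rng.integers(0, 10)  # one integer in [0, 10)
```

Passing the generator around explicitly keeps every random draw tied to one seed, which makes a run easier to reproduce than relying on two separately seeded global states.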

Additional comments:

- Section 2.1, you introduce Q(s,a) in (2) and only define it later with (3). It may be worth very briefly introducing Q right after (2), or even explaining more generally what action-value functions are.
- Sec 3.2, 3.3, The noise/cost parameters (Table 1-4) appear quite specific. It may be worth mentioning that the noise values come from Corbett & Anderson, 1995 (I think? This is what I understood from the original paper, which you cite as their source), and that the costs come from human control experiments and represent seconds.
- Sec 3.6, "Similar to the particle filter in the continuous model where, the belief [are reset]", did you mean "Similarly, for the particle filter in the continuous model, the beliefs [are reset]"?
- You end the paper with this: "Through this replication, we hope to facilitate research in this direction." I think it would be nice to argue why this is valuable; e.g. is it easy to write down analytical learned models? Do modern computational capacities allow the full potential of POMDPs to be used?
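As a pointer for the first comment above, here is a minimal sketch of what an action-value function is, using value iteration on a tiny fully observable MDP. The two-state transition table `P`, the rewards `R`, the discount `GAMMA` and the function names are all made-up assumptions. It uses standard reward-maximizing notation, which is not necessarily the notation of equations (2) and (3) in the paper, where actions carry costs.

```python
import numpy as np

GAMMA = 0.9  # discount factor (illustrative value)

# Hypothetical two-state, two-action MDP.
# P[a, s, t] is the probability of moving from state s to state t under action a.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in place
    [[0.2, 0.8], [0.0, 1.0]],  # action 1: try to reach state 1
])
# R[s, a] is the immediate reward for taking action a in state s.
R = np.array([
    [0.0, -1.0],
    [1.0, 1.0],
])


def q_values(V):
    """Q(s, a) = R(s, a) + gamma * sum_t P(t | s, a) * V(t)."""
    return R + GAMMA * np.einsum("ast,t->sa", P, V)


def value_iteration(tol=1e-10, max_iter=10_000):
    """Iterate V(s) = max_a Q(s, a) until the values stop changing."""
    V = np.zeros(R.shape[0])
    for _ in range(max_iter):
        V_new = q_values(V).max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new
    return V


V = value_iteration()
Q = q_values(V)
greedy_policy = Q.argmax(axis=1)  # best action in each state
```

Q(s, a) scores taking action a in state s and acting optimally afterwards, so the state value is the maximum of Q over actions and the greedy policy is its argmax.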

Note on style:

In the original paper, both students and teachers are referred to as "she" and "her", whereas in this paper there is a mix of "he/his" and "their". Note that in English, "they" is a valid gender-neutral singular pronoun. If you wish to, I'd suggest either using "she/her" in honor of the original paper or "they/their" for consistency.

koustuvsinha commented 3 years ago

Many thanks, @bengioe for the comprehensive review! I'm still looking for an additional reviewer, hopefully can assign this soon. @luksurious can you go through the review and address the comments?

koustuvsinha commented 3 years ago

@amyzhang has accepted to review this paper! (as discussed in email). Thanks a lot! :) You can find the reviewer guidelines here.

luksurious commented 3 years ago

@bengioe Thank you for your review.

We have addressed your comments.

The code now uses only np.random (with the upgraded RNG system in 1.17+), and the setup is self-contained (I also removed some test output I had left there).
The paper is updated to address your points.
- The Q function is correctly introduced, belief reset is better explained, and the conclusion is extended.
- Re Sec 3.2, 3.3, noise parameters: They were also fitted from the control experiments with humans in the original study. This is explained in detail in the supplementary material of the original paper (for reference: Supplementary material). We added a note to clarify this.
- We now use "they" as a gender-neutral pronoun.

Let me know if there are additional points to address.

CC @AurelienNioche

amyzhang commented 3 years ago

Replication. The original paper uses POMDPs to formulate a teacher-student setting. The selection of the next teaching activity is a planning problem, where the teacher maintains a belief state of the student. Three models are proposed and evaluated on two tasks. The three models are a memoryless model; a discrete model with memory, which explicitly keeps a history of the past m actions; and a continuous model with implicit information about the entire history. The two tasks are a simple letter arithmetic task, with the goal of finding the correct mapping between a set of letters and numbers, and a number game, where the students learn a target number concept for general numbers, such as odd numbers or numbers within some range.

Reproducibility of the replication. The authors were able to reproduce the results for the first task. However, in the second task their results differ. Specifically, the authors found that they could train policies with the three methods that perform better than random, but not necessarily better than baselines. Further, there were failure modes for certain policies paired with certain learner models.

Clarity of code and instructions. The instructions were clear, and there were separate scripts to run for each task, which made running the experiments very simple. I was able to set up my environment and run the scripts on the first try using the README in the code.

Clarity and completeness of the accompanying article. The article is clear and well written, opening with a general description of the contributions and evaluations in the original paper and of the replication's findings. I have one high-level suggestion: make clear which components and details in the methods section are taken from the original paper, and which were unclear in the original paper and therefore required design decisions on the part of the authors. When reading the paper, it is unclear whether any liberties had to be taken with the original method. I appreciated the analysis in the Experiments section for Task 2, where the authors lay out the differences between versions of the task and their design choice. The discussion sections analyzing the potential failure modes in the method and experiments were also very useful and answered some of my earlier questions about what was the original method and what required changes on the authors' part in order to replicate the results. Some of these components should go in the methods section to highlight the contributions and possible deviations from the original method, rather than getting buried at the end. Several of the conjectures for improvement in the discussion section are also very interesting, and it would be nice to see whether they are backed up empirically, but perhaps that is outside the scope of this work.

Other than the high-level reshuffling of some paragraphs in the Discussion section into the Methods section and added clarity into what was explicitly described in the original paper and what was not, I don't have any suggestions for improvements. I found the explanation of the method and tasks clear, and the results, analysis, and discussion insightful. Great job!

luksurious commented 3 years ago

@amyzhang Thank you for your review.

We verified all questions and unclear elements with the original authors. I have added a subsection at the end of Methods to make this more explicit.

Except for the remark about the belief update, which is mentioned now at the end of the methods section, I did not find any other remarks in the discussion that refer to changes compared to the original description. Please let me know if there are particular segments remaining that should be addressed earlier in the methods section.

Thanks!

koustuvsinha commented 3 years ago

Hi @luksurious, thanks for your remarks, and @amyzhang and @bengioe, thanks for your valuable reviews. I believe the submission has addressed the proposed changes. Unless the reviewers feel strongly about the rebuttal, I vote for acceptance of the paper.

amyzhang commented 3 years ago

I also vote for acceptance.

bengioe commented 3 years ago

I vote for acceptance as well.

koustuvsinha commented 3 years ago

Great!! Congrats @luksurious, your paper is now accepted to ReScience journal! 🎉 I'll follow up soon with a PR to your repository with the correct article numbers.

koustuvsinha commented 3 years ago

@luksurious I couldn't find the ReScience template tex sources in your repository. Can you add them so that I can compile on my end?

luksurious commented 3 years ago

@koustuvsinha Ah sorry, I was working with it in Overleaf, so I synced it now to a new repository. You can find it here https://github.com/luksurious/faster-teaching-paper

koustuvsinha commented 3 years ago

Article published on Zenodo and will be available shortly on ReScience website!

Article PR: https://github.com/ReScience/articles/pull/20
Website PR: https://github.com/ReScience/rescience.github.io/pull/96

This concludes the reviewing and editing process of this paper.

[Re] Faster Teaching via POMDP Planning (partial replication) #44