Knewton / edm2016

Code for replicating results in our EDM2016 paper
Apache License 2.0

Online Prediction Accuracy #3

josephtey opened 7 years ago

josephtey commented 7 years ago

Regarding the paper titled “Back to the basics: Bayesian extensions of IRT outperform neural networks for proficiency estimation”, I am interested in the online prediction accuracy metric of evaluation.

A couple of questions (in relation to the 1PL IRT model):

  1. In this metric, students are split into training and testing populations. In a real-life scenario, the initial training population used to determine item-level parameters would not always be available, especially in a flashcard application, where predictions are required immediately without any prior item-level parameter estimation.

In such a situation, is an IRT model unsuitable? Must the IRT model have initial data to work with before making predictions, or can it be continuously trained from the start? If so, what would be the default parameters to start with? (See the sketch after this list for what I have in mind.)

  2. When you say the students are split into training and testing populations, what is the ratio between the populations? 70/30? 60/40?
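For concreteness, here is a rough sketch (entirely my own, not from this repo) of what I mean in question 1 by training a 1PL model continuously from the start: every new student ability and item difficulty begins at the prior mean of 0, and both are nudged by a small gradient step as each response arrives. The learning rate and prior weight below are placeholder values, not anything from the paper.

```python
# Rough sketch of "continuously trained from the start" 1PL IRT (my own toy
# code, not from this repo). Abilities and difficulties default to the prior
# mean of 0.0 and are updated online by SGD on the Bernoulli log-likelihood
# with a weak pull back toward the N(0, 1) prior.
import math
from collections import defaultdict

LEARNING_RATE = 0.1   # assumed step size
PRIOR_WEIGHT = 0.01   # assumed L2 pull toward the prior mean

theta = defaultdict(float)  # student abilities, default 0.0
beta = defaultdict(float)   # item difficulties, default 0.0

def predict(student, item):
    """P(correct) under the current 1PL estimates."""
    return 1.0 / (1.0 + math.exp(-(theta[student] - beta[item])))

def update(student, item, correct):
    """One online gradient step after observing a response (correct in {0, 1})."""
    error = correct - predict(student, item)   # d log-lik / d (theta - beta)
    theta[student] += LEARNING_RATE * (error - PRIOR_WEIGHT * theta[student])
    beta[item] -= LEARNING_RATE * (error + PRIOR_WEIGHT * beta[item])

# Example: with no data at all, the prediction is the prior 0.5, and it
# moves as soon as the first response is observed.
print(predict("s1", "q1"))   # 0.5
update("s1", "q1", 1)
print(predict("s1", "q1"))   # slightly above 0.5
```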

Thanks so much for your time, looking forward to your response.

khwilson commented 7 years ago

To your first question, having no data about a situation is always tricky. How you solve this depends greatly on your application and on how you perceive the costs of being wrong. Concepts related to this problem that you might find interesting to study are the exploration-exploitation tradeoff and zero-shot learning. There is also specific literature on IRT in flashcard applications, as well as large data sets from flashcard applications that you could use to try out any ideas retrospectively.

To your second question, the parameters of the splitting scheme used in the paper appear in the repo's README. We used a five-fold scheme and reported the average of our metrics across the folds.
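For concreteness, a minimal sketch of that kind of student-level five-fold split, assuming pandas and scikit-learn; the column name and the `fit`/`evaluate` helpers are placeholders, not this repo's actual code:

```python
# Minimal sketch of a student-level k-fold split: no student appears in both
# the training and testing populations of a fold. Column names and the
# fit/evaluate helpers are placeholders.
import numpy as np
import pandas as pd
from sklearn.model_selection import KFold

def student_level_folds(interactions: pd.DataFrame, num_folds: int = 5, seed: int = 0):
    """Yield (train_df, test_df) pairs split by student, not by interaction."""
    students = interactions["student_id"].unique()
    kf = KFold(n_splits=num_folds, shuffle=True, random_state=seed)
    for train_idx, test_idx in kf.split(students):
        train_students = set(students[train_idx])
        test_students = set(students[test_idx])
        yield (interactions[interactions["student_id"].isin(train_students)],
               interactions[interactions["student_id"].isin(test_students)])

# Fit on each training population, evaluate on the held-out population,
# and report the mean of the per-fold metrics:
# scores = [evaluate(fit(train), test) for train, test in student_level_folds(df)]
# print(np.mean(scores))
```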

josephtey commented 7 years ago

Thanks for the clarification; just a few follow-up questions:

1) What are your thoughts on training a model with pre-existing data? For instance, DKT/logistic models do not require any student- or item-specific parameters; hence, the model could be trained on data collected elsewhere and then deployed in an application to make immediate, accurate predictions.

For a model that does not require item or student parameters, would this be appropriate? What are the benefits of using data from the same students/items to train general weights? A model trained on dataset A could then be tested on datasets B and C, just like a real flashcard scenario; what are your thoughts on this?

2) When evaluating an IRT model via online prediction accuracy, after all item parameters have been determined, is the ability parameter updated by retraining the model on ALL of the data collected thus far (all students plus the training data), or only on each student's INDIVIDUAL data? In other words, what data is used to train the student-level parameters?
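To make the second question concrete, here is one interpretation as a sketch (my own guess, not code from this repo): the item difficulties are frozen after training, and each test student's ability is re-fit by MAP using only that student's responses observed so far, with a standard normal prior.

```python
# One possible interpretation (my own sketch): item difficulties are fixed,
# and a single student's ability is re-estimated by Newton-Raphson MAP from
# only that student's responses, under a N(0, 1) prior on ability.
import math

def map_ability(responses, difficulties, n_steps=20):
    """MAP estimate of one student's 1PL ability with fixed item difficulties.

    responses:    list of 0/1 outcomes observed so far for this student
    difficulties: matching list of (fixed, pre-trained) item difficulties
    """
    theta = 0.0  # start at the prior mean
    for _ in range(n_steps):
        grad, hess = -theta, -1.0  # gradient/Hessian contributions of the prior
        for y, b in zip(responses, difficulties):
            p = 1.0 / (1.0 + math.exp(-(theta - b)))
            grad += y - p
            hess -= p * (1.0 - p)
        theta -= grad / hess  # Newton step toward the posterior mode
    return theta

# E.g. after two correct answers on items of difficulty 0.0 and 0.5:
print(map_ability([1, 1], [0.0, 0.5]))
```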

Thanks again, looking forward to your response.