alisonrclarke opened 2 years ago
We agreed to converge teachers iteratively once per month. Crucially, we must ensure that the convergence does not lead to teachers becoming identical. That is, the convergence must stop if teachers reach a given similarity threshold.
Shouldn't the convergence occur towards the most dominant teacher (and how would we define dominance)? For now, let's try iterating towards the mean value (which is apparently already implemented in its initial form) and towards the best one (see the formula below). The best value for a teacher variable can be defined as the one for which the biggest increase in maths score is observed.
```
next_value = mean + convergence_factor * (old_value - best_value[school_id])
```

where `next_value` is one of {`teacher_control`, `teacher_quality`}.
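As an illustration only, here is a minimal Python sketch of a single monthly update for one school, combining the formula above with the similarity-threshold stop mentioned earlier; the names and default values (`convergence_factor`, `similarity_threshold`) are placeholders, not the actual parameters in the code:

```python
def converge_teacher_values(values, best_value, convergence_factor=0.5,
                            similarity_threshold=0.05):
    """One monthly convergence step for the teacher values of a single school.

    Applies next_value = mean + convergence_factor * (old_value - best_value)
    to each teacher, but skips the update once all values are already within
    the similarity threshold, so teachers never become identical.
    """
    if max(values) - min(values) <= similarity_threshold:
        return list(values)  # already similar enough: stop converging
    mean_value = sum(values) / len(values)
    return [mean_value + convergence_factor * (old - best_value) for old in values]

# e.g. two teachers in one school, with a best value of 0.9:
# converge_teacher_values([0.4, 0.8], best_value=0.9)  -> roughly [0.35, 0.55]
```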
With many schools, we need to estimate the MSE for each school separately, which I believe (based on what I saw last week) requires modifying the R code embedded in the Python code (the "_multilevelanalysis" subfolder). Since the MSE is currently calculated for the full simulated data (all schools), the easiest implementation appears to be passing the simulated data for individual schools. The initial suggestion was:
`teacher_quality_best[school_id]` and `teacher_control_best[school_id]` (or `teacher_control_mean[school_id]`, depending on what works better for the control value). The first version has been implemented, but a different approach was used for generating individual simulated data for every school (very likely better than the one initially suggested):
To briefly explain a technicality of the simulation implementation: `SimModel`, which runs the whole simulation for every class (each class always belongs to a particular school when there is a `school_id` column in the input pupil file), generates the teacher variables Teacher Quality and Teacher Control.

The much longer simulations observed appear to be due to the volume of pupil data now supplied with schools, compared with the datasets used previously. A second reason for the longer runs is the large number of debugging outputs currently produced; I'm planning to remove them, which should help speed things up.
Another aspect of the simulation is that the parameterisation testing itself might be much slower with many schools, because it tests every school individually. My suspicion is that the overall MSE might not improve as fast as before in this case, so more iterations would be needed to achieve the same quality of results.
e.g. if 2 classes belong to the same school, generate control/quality values for both and move them towards the mean/best one throughout the year (see the sketch below).
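As a rough sketch only (hypothetical names, not the actual `SimModel` code), the within-year loop for the classes of one school might look like this, with the monthly step and the similarity-threshold stop:

```python
def simulate_school_year(class_values, best_value, convergence_factor=0.5,
                         similarity_threshold=0.05, months=12):
    """Converge the per-class teacher values of one school once per month.

    e.g. two classes in the same school start with different control/quality
    values and drift towards the mean/best value over the year, stopping as
    soon as they are within the similarity threshold.
    """
    values = list(class_values)
    for _ in range(months):
        if max(values) - min(values) <= similarity_threshold:
            break  # teachers are similar enough; leave them distinct
        mean_value = sum(values) / len(values)
        values = [mean_value + convergence_factor * (old - best_value)
                  for old in values]
    return values
```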