alisonrclarke opened 2 years ago
We agreed to converge teachers iteratively once per month. Crucially, we must ensure that the convergence does not lead to teachers becoming identical. That is, the convergence must stop if teachers reach a given similarity threshold.
Shouldn't the convergence occur towards the most dominant teacher (and how would we define dominance)? For now, let's try iterating towards the mean value (which is apparently already implemented in its initial form) and towards the best one (see the formula below). The best value for a teacher variable can be defined as the one for which the biggest increase in maths score is observed.
```
next_value = mean + convergence_factor * (old_value - best_value[school_id])
```

where `next_value` is one of {`teacher_control`, `teacher_quality`}.
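As an illustration only, here is a minimal Python sketch of a single monthly update for one school, combining the formula above with the similarity-threshold stop mentioned earlier; the names and default values (`convergence_factor`, `similarity_threshold`) are placeholders, not the actual parameters in the code:

```python
def converge_teacher_values(values, best_value, convergence_factor=0.5,
                            similarity_threshold=0.05):
    """One monthly convergence step for the teacher values of a single school.

    Applies next_value = mean + convergence_factor * (old_value - best_value)
    to each teacher, but skips the update once all values are already within
    the similarity threshold, so teachers never become identical.
    """
    if max(values) - min(values) <= similarity_threshold:
        return list(values)  # already similar enough: stop converging
    mean_value = sum(values) / len(values)
    return [mean_value + convergence_factor * (old - best_value) for old in values]

# e.g. two teachers in one school, with a best value of 0.9:
# converge_teacher_values([0.4, 0.8], best_value=0.9)  -> roughly [0.35, 0.55]
```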
With many schools, we need to estimate the MSE for each school separately, which I believe (based on what I saw last week) requires modifying the R code embedded in the Python code (the "_multilevelanalysis" subfolder). Since the MSE is currently calculated for the full simulated data (all schools), the easiest implementation appears to be passing the simulated data for individual schools. The initial suggestion was:
`teacher_quality_best[school_id]` and `teacher_control_best[school_id]` (or `teacher_control_mean[school_id]`, depending on what works better for the control value). The first version has been implemented, but a different approach was used for generating individual simulated data for every school (very likely better than the one initially suggested):
To briefly explain a technicality of the simulation implementation: `SimModel`, which runs the whole simulation for every class (each class always belongs to a particular school when there is a `school_id` column in the input pupil file), generates the teacher variables Teacher Quality and Teacher Control.

The much longer simulations observed appear to be due to the volume of pupil data now supplied with schools, compared with the datasets used previously. A second reason for the longer runs is the large number of debugging outputs currently produced; I'm planning to remove them, which should help speed things up.
Another aspect of the simulation is that the parameterisation testing itself might be much slower with many schools, because it tests every school individually. My suspicion is that the overall MSE might not improve as fast as before in this case, so more iterations would be needed to achieve the same quality of results.
e.g. if 2 classes belong to the same school, generate control/quality values for both and move them towards the mean/best one throughout the year (see the sketch below).
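As a rough sketch only (hypothetical names, not the actual `SimModel` code), the within-year loop for the classes of one school might look like this, with the monthly step and the similarity-threshold stop:

```python
def simulate_school_year(class_values, best_value, convergence_factor=0.5,
                         similarity_threshold=0.05, months=12):
    """Converge the per-class teacher values of one school once per month.

    e.g. two classes in the same school start with different control/quality
    values and drift towards the mean/best value over the year, stopping as
    soon as they are within the similarity threshold.
    """
    values = list(class_values)
    for _ in range(months):
        if max(values) - min(values) <= similarity_threshold:
            break  # teachers are similar enough; leave them distinct
        mean_value = sum(values) / len(values)
        values = [mean_value + convergence_factor * (old - best_value)
                  for old in values]
    return values
```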