Closed. gbencci closed this issue 1 year ago.
Tearing up the Certificates: The Measure Is Not The Target
How we work to liberate our learners from collecting endless pieces of paper and instead focus them on building their own, individual, marketable skills:
We cancelled grading... and our outcomes improved.
https://docs.google.com/document/d/1_IlDvxC4dV2L_Ip2cjh82J-V3xQ6GGv93dSMCgosD8I/edit#heading=h.qx7eu3edqv55
https://docs.codeyourfuture.io/leaders/running-the-course/assessment/milestones
We give our learners the solutions and teach them how to review themselves.
https://github.com/CodeYourFuture/Module-HTML-CSS/tree/solutions/Form-Controls
Our graduation criteria: You have to really build it, and you have to show us.
https://docs.google.com/document/d/1jMxqI0L7IKFENCQ8Lw-D1lhnj3c3RZI_WK808wG4YJM/edit?usp=sharing
How this fails and what to do next.
Flipped Classroom https://docs.google.com/document/d/1KtI_jRUNnvTC-ecLN1ilbkcgq3ZA_IUbbBdieKupPXw/edit?usp=sharing
@SallyMcGrath could you share a few sentences about what we were doing before, that we stopped doing when we say 'we cancelled grading'?
Oh, I have examples. We made these massive spreadsheets (they contain sensitive data, so I will send them to you privately) and manually gave a score between 1 and 10 every week... except:
The marking was done by many different people, so it was totally inconsistent, even within the same cohort. The data became more and more sketchy as time went on; and there's nothing worse than relying on unreliable information.
The final nail in the coffin for grading was when I tracked outcomes: even in regions where marking was much more consistent, there wasn't a strong correlation between grades and outcomes (e.g. jobs). This is similar to the interview analysis I did last year for Rainbird. I just couldn't find evidence that these processes worked. So I looked instead for things that might work - I looked into the data for signals, instead of trying to create them with processes like grading.
The thing to understand about these signals is that we have to change them all the time. Goodhart's Law teaches us that any measure that becomes a target, ceases to be a good measure. So I try to make all the measures things that are useful to do anyway - coming to class, solving problems in Codewars, committing code on Github, etc.
Of course people can game this - and do. Some people will go to almost any lengths to sabotage their own lives: copy-paste code into tests, use bot commits, whatever. But those people would fail anyway, because they don't yet understand that they are the ones losing out by doing this. These processes can't help those people.
There's another reason, though, that signals have to change over time, which is that they become less useful. When only a few top performers did Codewars, you could easily sort class performance by Codewars score. Now that everyone does CW, the score is a weaker signal. (But at least they are all now drastically better at tech tests.)
Moving them from Google Classroom, which has a horrible API with just a few coarse signals, to GitHub Projects is part of this thinking. Now that I've moved them all onto GitHub boards, I should be able to start harvesting activity data from those APIs and look for patterns we can use. At the same time, it's much more useful for trainees to spend their time interacting with GitHub, which they will use at work, than with Google Classroom. So it's always about finding ways to make the things they have to do genuinely useful, and designing those things so we can harvest the activity and interpret it programmatically, instead of manually/capriciously.
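As a rough sketch of what "interpret it programmatically" could look like once activity events are harvested from the GitHub API (the function name, event shape, and bucketing are my hypothetical illustration, not CYF's actual tooling):

```python
from collections import defaultdict
from datetime import datetime

def weekly_activity(events):
    """Bucket raw activity events into per-trainee, per-ISO-week counts.

    `events` is a list of (username, ISO-8601 timestamp) pairs, e.g.
    harvested from GitHub board/commit activity. This is a hypothetical
    sketch, not a real CYF pipeline.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, ts in events:
        # isocalendar() gives (year, week, weekday); key on (year, week)
        week = tuple(datetime.fromisoformat(ts).isocalendar()[:2])
        counts[user][week] += 1
    return {user: dict(weeks) for user, weeks in counts.items()}

events = [
    ("amal", "2023-03-06T10:00:00"),  # ISO week (2023, 10)
    ("amal", "2023-03-07T18:30:00"),  # same week
    ("amal", "2023-03-14T09:00:00"),  # ISO week (2023, 11)
    ("bea",  "2023-03-06T11:00:00"),
]
print(weekly_activity(events)["amal"][(2023, 10)])  # 2
```

Once activity is in this shape, looking for patterns (a trainee going quiet for two weeks, a sudden spike before a deadline) becomes a query over the data rather than a manual judgement.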
German, this is a public board, so it's better to not even share links to files possibly containing PII. I know the file is also locked, but for security, it's better to only share private files in private.
Thanks @SallyMcGrath Sally, this is great input.
When you wrote: "This may include matters to address a shift in their value of testing, issues relating to digital poverty, equity in assessment, and access."
What do you mean by these things?
Hmm, I don't know because I don't remember writing it. 😂 However, I can do a reading of this sentence now and apply it to our context?
We do not ignore circumstances entirely at CYF - instead we address them in practical ways. Instead of giving someone extra marks because they are hungry, we give them food.
The problem with encoding opinions/bias (even if it's meant to be positive) into assessment is that bias... exists. We should never give people the power to make predictions about, or put limits on, what people can do based on who they are (or are thought to be). Humans are just demonstrably terrible at this. And we do not need to use bias to make predictions anyway. As Russell says: "If the matter is one that can be settled by observation, make the observation yourself."
Thank you for sharing your thoughts in detail. It helped me better understand your concerns about tracking personal circumstances and linking them in any way to performance, even when it's done to assist people.
In what ways are you seeing Goodhart's Law creeping into the program? Are any tiny or heavy nails appearing?
Example of trainees mistaking the measure for a target
Because trainees know PRs are tracked automatically, they sometimes open PRs with no content, or they copy-paste the answers from another PR. Of course, it's obvious how counterproductive this is: it means not only that they don't understand the work, but also that they have concealed this fact, so they don't get the help they need to actually understand it.
This happens mainly with online classes: trainees sign in to Zoom and leave it running without actually doing the class. The measure - attendance in the Zoom call - is met, but the goal is missed.
Most Codewars solutions are available on GitHub, so you can easily "game" the system by copy-pasting them in. There's actually a timestamped challenge completion API I can access where I see trainees doing this (solving 10 kata in 60 seconds) sometimes. I don't habitually check it. But... well, it's the same issue. You pair with a trainee who has done this and ask them to explain their own work and they completely crumble. They've got no idea. Instead they've got a useless Codewars badge and no skills at all.
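The "solving 10 kata in 60 seconds" signal mentioned above is easy to detect mechanically once you have timestamped completion data. A minimal sketch (the function, thresholds, and data shape are my hypothetical illustration, not the actual Codewars API or CYF's check):

```python
from datetime import datetime, timedelta

def suspicious_burst(timestamps, n=10, window_seconds=60):
    """Return True if any `n` kata completions fall within a span of
    `window_seconds`. A hypothetical copy-paste-spree heuristic:
    `timestamps` is a list of ISO-8601 completion times.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    window = timedelta(seconds=window_seconds)
    # Slide a window of n consecutive completions over the sorted times
    for i in range(len(times) - n + 1):
        if times[i + n - 1] - times[i] <= window:
            return True
    return False

# Ten "solutions" five seconds apart: a 45-second spree.
spree = [f"2023-05-01T12:00:{5 * i:02d}" for i in range(10)]
print(suspicious_burst(spree))  # True
```

The point isn't to police people automatically - as noted above, a pairing session exposes this instantly anyway - but it shows how a gamed measure leaves an obvious trace in the data.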
Definitely in London Final Projects recently (for some reason London really struggled with this): trainees got obsessed with the PR distribution and got into massive arguments about it. A moment's thought would have revealed to them that just working together would take care of that by itself, and that they didn't need to worry about it AT ALL unless it was actually showing a team problem (in which case, focus on solving the problem, not the PRs). Take care of the team, and the PR distribution will take care of itself, naturally.
It's always possible to juke the stats. It's always possible to follow the letter without the spirit. I try to lift their eyes to the horizon as much as possible. (Often I fail!)
Is it possible for us to lift their eyes to the horizon at all? Isn't that something that only they can do by themselves?
We can provide the right environment, the path and the community, but can we change them from the inside?
What do I need? The EATP conference asked me to be the speaker on the topic of assessment. I'll do a mix of CYF experience and then add your key elements. Could you expand on what you would like me to say on the areas below?
What we sent: "Topic: Candidate: Understanding the new expectations of candidates towards assessment. This may include matters to address a shift in their value of testing, issues relating to digital poverty, equity in assessment, and access.
Tearing up the Certificates: The Measure Is Not The Target
How we work to liberate our learners from collecting endless pieces of paper and instead focus them on building their own, individual, marketable skills:
We cancelled grading... and our outcomes improved. We give our learners the solutions and teach them how to review themselves. Our graduation criteria: You have to really build it, and you have to show us. How this fails and what to do next."
Conference is in September, so we have plenty of time. Please write as much as you can in