Closed. gbencci closed this issue 1 year ago.
Tearing up the Certificates: The Measure Is Not The Target
How we work to liberate our learners from collecting endless pieces of paper and instead focus them on building their own, individual, marketable skills:
We cancelled grading... and our outcomes improved.
https://docs.google.com/document/d/1_IlDvxC4dV2L_Ip2cjh82J-V3xQ6GGv93dSMCgosD8I/edit#heading=h.qx7eu3edqv55
https://docs.codeyourfuture.io/leaders/running-the-course/assessment/milestones
We give our learners the solutions and teach them how to review themselves.
https://github.com/CodeYourFuture/Module-HTML-CSS/tree/solutions/Form-Controls
Our graduation criteria: You have to really build it, and you have to show us.
https://docs.google.com/document/d/1jMxqI0L7IKFENCQ8Lw-D1lhnj3c3RZI_WK808wG4YJM/edit?usp=sharing
How this fails and what to do next.
Flipped Classroom https://docs.google.com/document/d/1KtI_jRUNnvTC-ecLN1ilbkcgq3ZA_IUbbBdieKupPXw/edit?usp=sharing
@SallyMcGrath could you share a few sentences about what we were doing before, that we stopped doing when we say 'we cancelled grading'?
Oh, I have examples. We made these massive spreadsheets (they contain sensitive data, so I will send them to you privately) and manually gave a score between 1 and 10 every week... except:
The marking was done by many different people, so it was totally inconsistent, even within the same cohort. The data became more and more sketchy as time went on; and there's nothing worse than relying on unreliable information.
The final nail in the coffin for grading was when I tracked outcomes: even in regions where marking was much more consistent, there wasn't a strong correlation between grades and outcomes (e.g. jobs). This is similar to the interview analysis I did last year for Rainbird. I just couldn't find evidence that these processes worked. So I looked instead for things that might work - I looked into the data for signals, instead of trying to create them with processes like grading.
The thing to understand about these signals is that we have to change them all the time. Goodhart's Law teaches us that any measure that becomes a target, ceases to be a good measure. So I try to make all the measures things that are useful to do anyway - coming to class, solving problems in Codewars, committing code on Github, etc.
Of course people can game this - and do. Some people will go to almost any lengths to sabotage their own lives: copy-paste code into tests, use bot commits, whatever. But those people would fail anyway, because they don't yet understand that they are the ones losing out by doing this. These processes can't help those people.
There's another reason, though, that signals have to change over time, which is that they become less useful. When only a few top performers did Codewars, you could easily sort class performance by Codewars score. Now that everyone does CW, the score is a weaker signal. (But at least they are all now drastically better at tech tests.)
Moving them from Google Classroom, which has a horrible API with just a few coarse signals, to GitHub Projects is part of this thinking. Now that I've moved them all onto GitHub boards, I should be able to start harvesting activity data from those APIs and look for patterns we can use. At the same time, it's much more useful for trainees to spend their time interacting with GitHub, which they will use at work, than with Google Classroom. So it's always about finding ways to make the things they have to do genuinely useful, and designing those things so we can harvest the activity and interpret it programmatically, instead of manually/capriciously.
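As a rough sketch of what "interpret it programmatically" could look like once activity events are harvested from the GitHub API (the function name, event shape, and bucketing are my hypothetical illustration, not CYF's actual tooling):

```python
from collections import defaultdict
from datetime import datetime

def weekly_activity(events):
    """Bucket raw activity events into per-trainee, per-ISO-week counts.

    `events` is a list of (username, ISO-8601 timestamp) pairs, e.g.
    harvested from GitHub board/commit activity. This is a hypothetical
    sketch, not a real CYF pipeline.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for user, ts in events:
        # isocalendar() gives (year, week, weekday); key on (year, week)
        week = tuple(datetime.fromisoformat(ts).isocalendar()[:2])
        counts[user][week] += 1
    return {user: dict(weeks) for user, weeks in counts.items()}

events = [
    ("amal", "2023-03-06T10:00:00"),  # ISO week (2023, 10)
    ("amal", "2023-03-07T18:30:00"),  # same week
    ("amal", "2023-03-14T09:00:00"),  # ISO week (2023, 11)
    ("bea",  "2023-03-06T11:00:00"),
]
print(weekly_activity(events)["amal"][(2023, 10)])  # 2
```

Once activity is in this shape, looking for patterns (a trainee going quiet for two weeks, a sudden spike before a deadline) becomes a query over the data rather than a manual judgement.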
German, this is a public board, so it's better to not even share links to files possibly containing PII. I know the file is also locked, but for security, it's better to only share private files in private.
Thanks @SallyMcGrath Sally, this is great input.
When you wrote: "This may include matters to address a shift in their value of testing, issues relating to digital poverty, equity in assessment, and access."
What do you mean by these things?
Hmm, I don't know because I don't remember writing it. 😂 However, I can do a reading of this sentence now and apply it to our context?
We do not ignore circumstances entirely at CYF - instead we address them in practical ways. Instead of giving someone extra marks because they are hungry, we give them food.
The problem with encoding opinions/bias (even if it's meant to be positive) into assessment is that bias... exists. We should never give people the power to make predictions about, or put limits on, what people can do based on who they are (or are thought to be). Humans are just demonstrably terrible at this. And we do not need to use bias to make predictions anyway. As Russell says: "If the matter is one that can be settled by observation, make the observation yourself."
Thank you for sharing your thoughts in detail. It helped me better understand your concerns about tracking personal circumstances and linking them in any way to performance, even when it's done to assist people.
In what ways are you seeing Goodhart's Law creeping into the program? Are any tiny or heavy nails appearing?
Example of trainees mistaking the measure for a target
Because trainees know PRs are tracked automatically, they sometimes open PRs with no content, or they copy-paste the answers from another PR. Of course, it's obvious how counterproductive this is: it means not only that they don't understand the work, but also that they have concealed this fact, so they don't get the help they need to actually understand it.
This happens mainly with online classes: trainees sign in to Zoom and leave it running without actually doing the class. The measure - attendance in the Zoom call - is met, but the goal is missed.
Most Codewars solutions are available on GitHub, so you can easily "game" the system by copy-pasting them in. There's actually a timestamped challenge completion API I can access where I see trainees doing this (solving 10 kata in 60 seconds) sometimes. I don't habitually check it. But... well, it's the same issue. You pair with a trainee who has done this and ask them to explain their own work and they completely crumble. They've got no idea. Instead they've got a useless Codewars badge and no skills at all.
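The "solving 10 kata in 60 seconds" signal mentioned above is easy to detect mechanically once you have timestamped completion data. A minimal sketch (the function, thresholds, and data shape are my hypothetical illustration, not the actual Codewars API or CYF's check):

```python
from datetime import datetime, timedelta

def suspicious_burst(timestamps, n=10, window_seconds=60):
    """Return True if any `n` kata completions fall within a span of
    `window_seconds`. A hypothetical copy-paste-spree heuristic:
    `timestamps` is a list of ISO-8601 completion times.
    """
    times = sorted(datetime.fromisoformat(t) for t in timestamps)
    window = timedelta(seconds=window_seconds)
    # Slide a window of n consecutive completions over the sorted times
    for i in range(len(times) - n + 1):
        if times[i + n - 1] - times[i] <= window:
            return True
    return False

# Ten "solutions" five seconds apart: a 45-second spree.
spree = [f"2023-05-01T12:00:{5 * i:02d}" for i in range(10)]
print(suspicious_burst(spree))  # True
```

The point isn't to police people automatically - as noted above, a pairing session exposes this instantly anyway - but it shows how a gamed measure leaves an obvious trace in the data.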
Definitely in London Final Projects recently (for some reason London really struggled with this): trainees got obsessed with the PR distribution and got into massive arguments about it. A moment's thought would have revealed to them that just working together would take care of that by itself, and that they didn't need to worry about it AT ALL unless it was actually showing a team problem (in which case, focus on solving the problem, not the PRs). Take care of the team, and the PR distribution will take care of itself, naturally.
It's always possible to juke the stats. It's always possible to follow the letter without the spirit. I try to lift their eyes to the horizon as much as possible. (Often I fail!)
Is it possible for us to lift their eyes to the horizon at all? Isn't that something that only they can do by themselves?
We can provide the right environment, the path and the community, but can we change them from the inside?
What do I need? The EATP conference asked me to be the speaker on the topic of assessment. I'll do a mix of CYF experience and then add your key elements. Could you expand on what you would like me to say on the areas below?
What we sent: "Topic: Candidate: Understanding the new expectations of candidates towards assessment. This may include matters to address a shift in their value of testing, issues relating to digital poverty, equity in assessment, and access.
Tearing up the Certificates: The Measure Is Not The Target
How we work to liberate our learners from collecting endless pieces of paper and instead focus them on building their own, individual, marketable skills:
We cancelled grading... and our outcomes improved. We give our learners the solutions and teach them how to review themselves. Our graduation criteria: You have to really build it, and you have to show us. How this fails and what to do next."
Conference is in September, so we have plenty of time. Please write as much as you can in