crowd-crunch / corridorcrunch-ng

next generation of the corridorcrunch tool, work in progress

Confidence logic discussion #3

Open thorsteneb opened 4 years ago

thorsteneb commented 4 years ago

Started with: 5/70, 8/80 if rotated.

Changed on streamer request to: 10/80, 15/80 if rotated.

I feel strongly that the "min" number is too high here.

Suggested post-mortem: 3/90 - with 3 matching transcriptions we accept it; if one is off, we need 10 with 9 correct. 8/90 if rotated?

We should think about what our near-optimal confidence algorithm looks like, and we should have hooks to change key parameters without pushing code.
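A minimal sketch of what such a hook could look like, assuming thresholds are loaded from a JSON file at runtime instead of being hard-coded. The file name, field names, and defaults here are illustrative (mirroring the post-mortem numbers above), not anything in the current code:

```python
import json
from dataclasses import dataclass

# Hypothetical thresholds: "min transcriptions / % agreement", with defaults
# taken from the 3/90 and 8/90-if-rotated suggestion above.
@dataclass
class ConfidenceConfig:
    min_transcriptions: int = 3
    min_agreement_pct: int = 90
    min_transcriptions_rotated: int = 8
    min_agreement_pct_rotated: int = 90

def load_confidence_config(path="confidence.json"):
    """Read thresholds from a JSON file so they can be tuned without pushing code."""
    try:
        with open(path) as f:
            return ConfidenceConfig(**json.load(f))
    except FileNotFoundError:
        return ConfidenceConfig()  # fall back to the baked-in defaults
```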

@Excors has thoughts that I put below. I haven't looked yet at what we can take from them. If nothing else they'll give us insight into perception and user desires, and we can take that into account in messaging.

Confidence ideas

My thoughts (with hindsight) on how the confidence stuff should have worked (probably nobody cares, but I feel like I need to write this down somewhere):

1) Transcribe every image once. (By the end there were 25K images and 50K transcriptions, so that's easily achievable.)

2) Identify clusters of similar transcriptions. E.g. pick one transcription, then find all others that match it on 5 out of 6 sides, and keep expanding the cluster until there are no similar ones left.

3) If a cluster has >=5 transcriptions and they are all identical, or it has >=10 and >=50% are identical (or whatever numbers make sense), accept the matching ones as a confirmed result. Images from confirmed results will not be transcribed again. Mismatched ones from the same cluster are almost certainly transcription errors; don't put them back in the transcription queue unless we run out of other things to transcribe.

4) If a cluster has <5 transcriptions, or >=5 but not enough identical, put its images back into the transcription queue to get more results for them.

5) Repeat until complete.

With 25K images of 5K pieces, the average piece will be in 5 images. If the transcription error rate is low, that means we should be able to confirm around 2.5K pieces after step 3. Then there should be a cluster of results for most of the 2.5K remaining pieces, and each of those needs more transcriptions - maybe 1 more (if it was already on 4 matching), maybe 4 more (if it's the only image of that piece), maybe a few more to correct for transcription errors. With the 50K total transcriptions, it should have been possible to confidently identify pretty much every single piece that was in any of the images.

If I understand it correctly, the original CC design (without bugs) would have tried to transcribe the first 10K images 5x each, and the other 15K would never even be looked at, so we'd never know if they contained anything unique and useful (like one of the bounty pieces).
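For what it's worth, here is a rough sketch of how steps 2-4 could be implemented as a standalone pass over the whole set of transcriptions. The data shape (a transcription represented as a tuple of 6 sides) and the thresholds are assumptions taken from the description above, not the actual CC code:

```python
from collections import Counter

def similar(a, b, min_matching_sides=5):
    """Step 2: two transcriptions match if they agree on at least 5 of 6 sides."""
    return sum(x == y for x, y in zip(a, b)) >= min_matching_sides

def build_cluster(seed, transcriptions):
    """Grow a cluster from one transcription by repeatedly pulling in similar ones."""
    cluster, frontier = [seed], [seed]
    remaining = [t for t in transcriptions if t is not seed]
    while frontier:
        current = frontier.pop()
        still_remaining = []
        for t in remaining:
            if similar(current, t):
                cluster.append(t)
                frontier.append(t)
            else:
                still_remaining.append(t)
        remaining = still_remaining
    return cluster, remaining

def classify_cluster(cluster):
    """Steps 3-4: confirm the cluster, or send its images back to the queue."""
    counts = Counter(cluster)
    best, best_count = counts.most_common(1)[0]
    if len(cluster) >= 5 and best_count == len(cluster):
        return "confirmed", best
    if len(cluster) >= 10 and best_count >= len(cluster) / 2:
        return "confirmed", best
    return "needs_more_transcriptions", best
```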

As to why I'm only suggesting this when it's too late to matter: I didn't really think about it before, and didn't recognise the problems with the existing design. Partly that's because I failed to focus on any one thing - I played a bit with CC, OpenCV, mapping, manually transcribing, etc, and never settled on one thing enough to contribute meaningfully. Partly it's because when I heard about CC, it already had what sounded like a reasonable confidence metric, so it felt both unnecessary and impolite to come and suggest redesigning it.

(Also the code didn't seem to really encourage experimenting with confidence algorithms - confidence was (I think) computed incrementally as results were submitted, so it looked difficult to migrate to a different algorithm. It might have been better to keep the Django code purely for raw data collection, with a standalone tool that would analyse the whole database at once, rather than integrating the confidence logic with the data submission where it's harder to change.)

Anyway, I think the main lesson for me is that if I want to attempt to contribute to something like this, I need to focus more on one area and try to understand it fully, and then perhaps I'll be able to be a little more useful.