Auto-moderator thresholds should be increased

Bossett commented 1 year ago

Describe the bug

The auto-moderator is overly aggressive in classifying 'suggestive' images. With the introduction of self-flagging, misclassifying images and taking the 'higher' rating is leading to more and more false positives, especially for users who generally try to do the right thing (i.e. they will be 'punished' on visibility both when they self-flag AND when they try to post an unflagged image).
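To make the 'higher rating' behaviour concrete, here is a minimal sketch of how I understand it. The severity ordering, the 'none' label and the function name are my own assumptions for illustration, not the actual moderation code.

```python
# Hypothetical severity ordering; only 'suggestive' comes from this issue.
SEVERITY = {"none": 0, "suggestive": 1}


def effective_rating(self_flag: str, auto_label: str) -> str:
    """Return whichever of the two ratings is more severe."""
    return max(self_flag, auto_label, key=SEVERITY.__getitem__)


# A user posts an unflagged image, but the classifier misfires:
# the post still ends up with the stricter rating.
print(effective_rating("none", "suggestive"))
```

Under this assumption, a false positive from the classifier always wins over an honest 'none' from the user, which is why the misclassifications hurt careful users the most.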

To Reproduce

Steps to reproduce the behavior:

Compare:

https://bsky.app/profile/bossett.bsky.social/post/3kbjaidth322l (not flagged)
https://bsky.app/profile/bossett.bsky.social/post/3kbj5f3xngj2q (wasn't flagged for >45m - manual review?)
https://bsky.app/profile/frecksandframes.fans/post/3kbgr4r2iwm2q (flagged)

All these images are essentially of the same subject, with minor feature tweaks. The AI is not consistent or reasonable here. These images do not suggest anything.

Expected behavior

None of those images should be flagged.

Long term, please consider this proposal: https://github.com/bluesky-social/proposals/issues/37

A quick fix would be to increase the threshold used for Hive matching from 0.9 to 0.95. Actual suggestive images tend to get 0.99+ ratings, so this would reduce the false positive rate without missing the clear cases.
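A rough sketch of what the threshold change would do, assuming the label is applied when the classifier score meets or exceeds the threshold. The function name, the `>=` comparison and the example scores are illustrative assumptions, not taken from the real implementation or from real classifier output.

```python
CURRENT_THRESHOLD = 0.90
PROPOSED_THRESHOLD = 0.95


def is_flagged(score: float, threshold: float) -> bool:
    """Return True when the score meets or exceeds the threshold (assumed rule)."""
    return score >= threshold


# Hypothetical scores for illustration only.
borderline_score = 0.93  # the kind of score a false positive might get
clear_score = 0.995      # the range actual suggestive images tend to land in

for name, score in [("borderline", borderline_score), ("clear", clear_score)]:
    print(
        name,
        "current:", is_flagged(score, CURRENT_THRESHOLD),
        "proposed:", is_flagged(score, PROPOSED_THRESHOLD),
    )
```

With these assumed numbers, the borderline image stops being flagged under the proposed threshold while the clear case is still caught.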

sweetbeex commented 1 year ago

Arguably a more objective example - I have this post saved because it was automatically labeled by AI and the label had to be manually removed. It scores .995 suggestive because of the resemblance of their tops to a bra. This is just one example of the many ways in which femme-presenting people especially face unduly strict limits.

https://bsky.app/profile/tranniehall.bsky.social/post/3jz5u6woay42p

sweetbeex commented 1 year ago

I would say that raising the limit to .95 is still not high enough for femme-presenting people.

Bossett commented 5 months ago

Extending this - for 'female', just adding a little bit of bias away from labelling (i.e. lower that threshold artificially) will probably go a long way. We have a year of data showing the bias - and even if it's just vibes, there's an argument that a tweak like that will better meet user expectations.

bluesky-social / atproto

Auto-moderator thresholds should be increased #1745