Moderation - Githubissues

I wanted to express my view of how moderation should work, based on how we have it in Drupal at DGU, although I've not done much moderation myself. (I'm aware that we've not established this as a minimum necessary thing, but want to chart a direction if it does go into scope.)

We have:

table 'spam_reports' to store 'user X said issue/comment Y is spam'.
issue/comment.visible=True/False
issue/comment.is_spam which is tri-state - unmoderated (null?), moderator says it is spam (True), moderator says it is not spam (False).
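
To make the three pieces above concrete, here is a rough sketch of the data model in plain Python. The class names (`ModeratedItem`, `SpamReport`) and the id fields are my assumptions for illustration only, not what ckanext-issues or the Drupal site actually use; only `visible`, `is_spam` and the idea of a 'spam_reports' row come from the description.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModeratedItem:
    """Fields shared by an issue or a comment (hypothetical class name)."""
    id: int
    visible: bool = True            # shown to ordinary users?
    is_spam: Optional[bool] = None  # None = unmoderated, True/False = moderator verdict


@dataclass(frozen=True)
class SpamReport:
    """One row of 'spam_reports': user X said issue/comment Y is spam."""
    user_id: int
    item_id: int


# A freshly submitted item starts out visible and unmoderated.
new_item = ModeratedItem(id=1)
report = SpamReport(user_id=42, item_id=new_item.id)
```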

So when a new issue or comment comes in, our 'spam logic' decides it is:

visible=True
is_spam=null

And in the future we could add more tests, like for strong language, or a Bayesian filter (e.g. Mollom), which could add more info like issue.mollom_verdict=True/False/not_sure. Our 'spam logic' might then make a different decision and set visible=False, so a moderator would have to make it visible again if it was genuine.

At this point a moderator could come along, browse the latest issues/comments and mark any as spam or not spam, setting the is_spam field so that .visible reflects the verdict.

Another user could say it looks like spam, and we add a row to spam_reports. If a second user does this and the moderator has not yet given a verdict, then our 'spam_logic' would set visible=False.
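
As a minimal sketch of that rule, assuming the moderator's verdict always wins and that two reports is the cut-off: the function name `decide_visible` and the constant `REPORT_THRESHOLD` are made up for illustration.

```python
from typing import Optional

# Assumption: the second spam report is what hides an unmoderated item.
REPORT_THRESHOLD = 2


def decide_visible(is_spam: Optional[bool], report_count: int) -> bool:
    """Return whether an issue/comment should be visible.

    is_spam is the moderator's tri-state verdict (None = unmoderated);
    report_count is the number of rows in spam_reports for this item.
    """
    if is_spam is not None:
        # A moderator has decided, so user reports no longer matter.
        return not is_spam
    # Unmoderated: hide once enough users have reported it.
    return report_count < REPORT_THRESHOLD
```

So a single report leaves the item visible, a second one hides it, and a moderator marking it is_spam=False brings it back regardless of how many reports there are.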

Again the moderator comes along to the "moderate issues/comments" page. This would be a faceted view or maybe a few tabs. Perhaps the highest priority for them to look at are those in the 'grey area': had one spam report, or Mollom said it's not sure, or there are other indicators we can add later, like strong language. Next they might check for false positives, since there might only be a few of those, e.g. a flurry of incorrect spam_reports from people ganging up on someone. And then they might want to scan through the dozens of remaining comments in case they can spot any spam. Ideally by the end they'd have marked everything as is_spam=True/False, so those wouldn't be shown the next time they do a batch of moderation. However, there is a chance that they'll want to re-review stuff from the past, so I think it is useful to keep it available if necessary. Hence me suggesting facets / tabs.
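
Roughly, the facets and their priority could be derived like this. It is only a sketch: the facet names, the `Item` class and the `mollom_not_sure` flag are my own placeholders, and I'm assuming 'hidden but unmoderated' is where the false positives would show up.

```python
from dataclasses import dataclass
from typing import List, Optional

# Order in which a moderator would work through the tabs (assumed).
FACET_ORDER = ["grey_area", "hidden", "unreviewed", "reviewed"]


@dataclass
class Item:
    id: int
    visible: bool = True
    is_spam: Optional[bool] = None  # None = unmoderated
    report_count: int = 0
    mollom_not_sure: bool = False   # hypothetical flag from a content filter


def facet(item: Item) -> str:
    """Pick the tab/facet an issue or comment belongs to."""
    if item.is_spam is not None:
        return "reviewed"    # already has a verdict; kept for re-review
    if not item.visible:
        return "hidden"      # auto-hidden; check for false positives
    if item.report_count == 1 or item.mollom_not_sure:
        return "grey_area"   # some suspicion, but still visible
    return "unreviewed"      # nothing flagged; scan when time allows


def moderation_queue(items: List[Item]) -> List[Item]:
    """Sort items by facet priority, keeping the original order within a facet."""
    return sorted(items, key=lambda i: FACET_ORDER.index(facet(i)))
```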

The 'spam_logic' should be replaceable, so it can be more or less defensive and rely on more clues as they become available, e.g. a Bayesian filter if one is available, or taking a dim view of particular email providers, or the location of the IP address, etc.
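
One way this could look, purely as a sketch: treat the 'spam_logic' as a plain function from the available signals to a visible/hidden decision, so a site can swap one policy for another. The names `Signals`, `lenient` and `defensive`, and the thresholds, are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Signals:
    """Clues available when deciding visibility (hypothetical container)."""
    is_spam: Optional[bool] = None         # moderator verdict, None = unmoderated
    report_count: int = 0                  # rows in spam_reports
    mollom_verdict: Optional[bool] = None  # True = spam, False = not spam, None = not sure / not run


# A spam logic is any callable that turns signals into "visible?".
SpamLogic = Callable[[Signals], bool]


def lenient(s: Signals) -> bool:
    """Hide only on a moderator verdict or a second user report."""
    if s.is_spam is not None:
        return not s.is_spam
    return s.report_count < 2


def defensive(s: Signals) -> bool:
    """Also hide on a single report or a positive filter verdict."""
    if s.is_spam is not None:
        return not s.is_spam
    if s.mollom_verdict is True:
        return False
    return s.report_count < 1


# The active policy would come from site configuration; here it is just a variable.
spam_logic: SpamLogic = lenient
```

In both policies the moderator's verdict takes precedence, so switching to a more defensive policy never overrides a decision a moderator has already made.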

ckan / ckanext-issues

Moderation #27