If the approach is to be federated moderation services, there's an opportunity to enforce some rules that improve the user experience when posting content that automated tools will moderate.
New required endpoints
Labelling services should be required to implement a moderation check endpoint, which returns either a label or a non-label indicating that the content will be human-reviewed (or cannot be reviewed in real time). The endpoint should take a complete post object, but not a post that has already been committed to a repo.
The purpose of this endpoint is to allow an end user, through their app, to 'pre-check' a post before it gets posted.
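As a rough sketch, the request/response could look something like the following (TypeScript; the endpoint path `moderation.precheck` and all type names are invented for illustration, not part of any existing lexicon):

```typescript
// Hypothetical shapes for the pre-check endpoint. None of these names
// exist in atproto today; they only illustrate the proposal.
interface PostDraft {
  text: string;
  langs?: string[];
  embed?: unknown; // images, external links, etc.
}

type PrecheckResult =
  | { kind: "label"; label: string } // the service's suggested label
  | { kind: "deferred" };            // human review / can't label in real time

// The app POSTs the uncommitted draft to the labelling service.
async function precheckPost(serviceUrl: string, draft: PostDraft): Promise<PrecheckResult> {
  const res = await fetch(`${serviceUrl}/xrpc/moderation.precheck`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(draft),
  });
  return (await res.json()) as PrecheckResult;
}
```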
New UI functions
On posting, similar to language selection, users should be prompted to assign the label they think is appropriate. This should follow the model set by the language dialog, though it may be better presented as a menu of content types.
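For the 'level' comparison described under submission logic below to work, the labels need a defined ordering. One possible scale, purely as a placeholder:

```typescript
// One possible ordered scale, least to most restrictive. The specific
// label names and their ordering are assumptions, not a proposed standard.
const LABEL_LEVELS = ["none", "suggestive", "nudity", "graphic"] as const;
type Label = (typeof LABEL_LEVELS)[number];

// Numeric 'level' so a user's label can be compared to the service's.
function levelOf(label: Label): number {
  return LABEL_LEVELS.indexOf(label);
}
```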
Submission logic
During the post process, the app should check the endpoint, and if the user's chosen label is of a lower 'level' than the value returned by the endpoint, the user should be warned and allowed to 'upgrade' the rating. The user can also choose to 'post anyway'.
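A sketch of that flow, building on the types above (`promptUpgrade` and `commitToRepo` are hypothetical stand-ins for the app's warning dialog and commit steps):

```typescript
// Hypothetical app hooks: promptUpgrade resolves to the accepted label,
// or null for "post anyway"; commitToRepo writes the post to the repo.
declare function promptUpgrade(current: Label, suggested: Label): Promise<Label | null>;
declare function commitToRepo(draft: PostDraft, label: Label): Promise<void>;

async function submitPost(serviceUrl: string, draft: PostDraft, userLabel: Label) {
  const result = await precheckPost(serviceUrl, draft);

  let finalLabel = userLabel;
  if (result.kind === "label" && levelOf(userLabel) < levelOf(result.label as Label)) {
    // Warn the user: they may accept the stricter label, or post anyway.
    const accepted = await promptUpgrade(userLabel, result.label as Label);
    finalLabel = accepted ?? userLabel; // null means "post anyway"
  }
  await commitToRepo(draft, finalLabel); // the commit happens only after the check
}
```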
The app and services should treat the user-provided label as correct. Human moderators (or reporting feedback, etc.) can then focus on downgraded posts to identify problem users, and where moderators agree with a user's rating, that agreement can be used to train moderation models.
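For example, a moderation backend could surface the 'post anyway' cases as a review queue (again a sketch; `LabelledPost` is an assumed shape, not an existing record type):

```typescript
// "Downgraded" posts: the user kept a label below the service's suggestion.
interface LabelledPost {
  uri: string;
  userLabel: Label;
  serviceLabel: Label;
}

function reviewQueue(posts: LabelledPost[]): LabelledPost[] {
  return posts
    .filter((p) => levelOf(p.userLabel) < levelOf(p.serviceLabel))
    .sort((a, b) => levelOf(b.serviceLabel) - levelOf(a.serviceLabel)); // worst first
}
```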
New Policy
The AI/etc. does not need to be 100% accurate in this model. It just needs to be good enough that it doesn't overwhelm moderators with bad decisions.
Implementation will require specific new policy describing the different ratings. Because labels are user-provided, there should be an easy basis for appeals, etc., provided the rules for each rating are objective enough.
Future application & interaction with multiple labelling services
When federating labelling services, there will be conflicts between results from different labelling tools. When a user subscribes to a specific labelling service, it should be possible to place it in a ranked list (possibly per category, though that may be a UI nightmare). The highest rank wins and gets to label the post. "User Provided" should be in this ranked list and default to the top.
Labelling services should be allowed to return "I won't label this" for some material (hence the ranks: a service can be specifically for hate speech without being required to know about impersonation, and the decision can 'fall through').
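A sketch of how that fall-through resolution could work, with "user" as just another ranked source (all names here are illustrative):

```typescript
// A source either returns a label or declines ("I won't label this").
type Verdict = { label: Label } | { declined: true };

interface RankedLabeller {
  name: string; // e.g. "user", "hate-speech-labeller"
  judge(draft: PostDraft): Promise<Verdict>;
}

async function resolveLabel(draft: PostDraft, ranked: RankedLabeller[]): Promise<Label | null> {
  for (const labeller of ranked) {                // ranked: highest priority first
    const verdict = await labeller.judge(draft);
    if ("label" in verdict) return verdict.label; // highest rank that labels, wins
    // declined: fall through to the next-ranked source
  }
  return null; // no source labelled the post
}
```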