nafo-forum-moderation

Various moderation tools in support of the work of NAFO Forum

Supports Bluesky NAFO Forum custom labeler on domain nafo-moderation.org
Set up using Installation instructions for Bluesky's Ozone
Suggestions welcome via any of the following:

Moderation Policy

The target community is NAFO and allies. Broadly, this covers anybody involved in the fight for democracy
against rising authoritarianism. State-sponsored disinformation is a cancer. Viral social media is one of the
primary vehicles by which it metastasizes.
Content moderation is not limited to support of Ukraine. The scope of the service aligns with NAFO Forum
rather than with Official NAFO, which is exclusively focused on support for Ukraine.
All moderation requires human review before a label is applied to content. In the future, conservative automated
labeling may be implemented.
This service labels asymmetrically:

Offending accounts outside the community of NAFO and allies are usually labeled at the account level so abuse is highlighted universally. Experience on X shows that one-time abusers are typically repeat offenders. Post-by-post moderation does not scale.
For offenders within the community, granular content labeling is preferred. Account-level labeling is reserved for the worst offenders and requires a two-thirds super-majority of the team to approve. The moderation service must not become a disruptor.
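The asymmetric rule above can be sketched as a small decision function. This is only an illustration under stated assumptions: the names `supermajority_reached` and `label_scope` are hypothetical, "usually" is simplified to "always" for accounts outside the community, and the two-thirds threshold is assumed to be counted against the whole team rather than against votes cast. Deciding who counts as a worst offender remains a human judgment.

```python
def supermajority_reached(votes_for: int, team_size: int) -> bool:
    """True when at least two thirds of the whole team approve (assumed basis)."""
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return team_size > 0 and 3 * votes_for >= 2 * team_size


def label_scope(in_community: bool, votes_for: int = 0, team_size: int = 0) -> str:
    """Return "account" or "content" for an offender a human has already reviewed."""
    if not in_community:
        # Outside the community: label the account so abuse is visible everywhere.
        return "account"
    if supermajority_reached(votes_for, team_size):
        # Inside the community: account-level only after a team vote passes.
        return "account"
    # Default inside the community: granular, per-post labeling.
    return "content"
```

For example, with a team of six, four approving votes meet the threshold and three do not.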

This approach helps manage the load as the platform grows, and ensures users are aware of content
violations before they engage with an account. Specifically, 60/40 propagandists mix fact with disinformation
to sow disruption in democracies. A user innocently engaging with the account based on the factual content needs
to know the context.
There is no plan to act as a verifier of friendly accounts.
Send moderation Appeals and other inquiries to here. Appeals of Label actions that
are not justified in the service's immutable history will be automatically approved. Denied appeals will be supported by
the relevant moderation history, redacted to remove private information for the protection of moderators.
We do not normally use platform labels such as !warn or !hide. The exception is the very worst content, such as CSAM or
human or animal torture, which the team has observed ad hoc and which will presumably be removed once the platform
catches up. Target SLA for report moderation and appeals is 24 hours.
As the team grows, the goal is to do better. This target may be revised based on real-world constraints and experience.
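A minimal sketch of how the 24-hour target could be checked against a report's arrival time. The name `is_overdue` and the idea of comparing timestamps this way are assumptions for illustration, not part of Ozone or of the current setup.

```python
from datetime import datetime, timedelta, timezone

# The 24-hour target stated in this policy.
TARGET_SLA = timedelta(hours=24)


def is_overdue(reported_at: datetime, now: datetime) -> bool:
    """True when a report or appeal has waited longer than the target."""
    return now - reported_at > TARGET_SLA


# Usage with fixed, timezone-aware timestamps.
reported = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(is_overdue(reported, reported + timedelta(hours=25)))
```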
Costs and funding are public domain information available on request from admin.

Reporting Guidelines

For people who are familiar with social media reporting guidelines, the rules for this service are different. The
goal is to disarm accounts posting viral disinformation and other content violations as quickly and broadly as possible.
Typically a single report is sufficient to be dispositive. No more mass reporting or arduous parsing of the Terms of Service.
Be as clear as possible. Report one abusive post, or the account (with a post link if you wish), and include a comment.
Reports without comments make work harder and may be wrongly denied.
Repeat reports of labeled content add no value. Please ensure you are running with the labels against which you are
reporting to avoid duplicate effort. Please report illegal or especially egregious content to the platform as well as this service.

Moderator Guidelines

All moderators agree to the following:

Moderators accept the risks inherent in performing this work.
The Bluesky Community Guidelines are foundational.
Moderation philosophy seminal post
Personal bias, as distinct from this service's inherent editorial bias, must be left at the door. Emotive issues must be handled impartially. Be conservative. Solicit team discussion in controversial or grey areas. When team size reaches six, a two-thirds supermajority is the tiebreaker.
Must be a proactive volunteer. Solicitation of candidates is prohibited. New moderators attest that they were not solicited to participate.
Must speak English fluently enough to participate in team discussions.
Abuse of the moderation tools to bully users inside or outside the community or pursue personal grievances is prohibited.
Reportable behaviour that would result in a content label on any social media or other communication channel is prohibited.
Provided it is not egregiously in violation of any service label, an amnesty for prior content is granted at the time access is granted to new moderators.
Late detection of reportable behaviour can be remediated by removal insofar as it is possible, and a team-public commitment to not repeat.
Important revisions to moderation policy will be published on the service account.
Element chatroom is used for all subject-relevant moderation team discussions. A commitment to engage there is required, as team votes may be required to tie-break.
It is permitted to discuss ground rules with candidates before access is granted.
Sharing outside the moderation team of internal discussions and communications in any form is prohibited. This includes any disruptive behaviour, such as rumour-mongering and screengrabs.
Once team size reaches ten, a new service admin may be nominated by a unanimous vote of the team.
No rubber stamping of reports. All reported content must be manually reviewed, and a comment affirming the reasons included on all Label actions.
Respectful evangelism of the service is encouraged. Disruptive harassment of potential users is prohibited.
Intentional or inadvertent public identification of any moderator other than yourself is prohibited.
This is unpaid volunteer work. Acceptance or solicitation of any consideration is prohibited.
No fixed time commitment is required. Moderators may opt out of the team at any time on request to admin. Inactive moderators may be asked whether they wish to remain involved, and removed if not.
There is zero tolerance for prohibited behaviour deemed by current service admin to be intentional.

New moderators will be provided access to the web UI on written agreement to these guidelines.
Registration at NAFO Forum to track ongoing efforts to fight disinformation is strongly suggested
but not required.

Moderator Safety

The work is satisfying but monotonous and demands constant focus and critical thinking. Prolonged exposure to toxic
internet content is well-known to damage mental health.
Self-care is more important than this work. Take breaks often and for as long as needed.
If a team member sees content they cannot or don't want to moderate for any reason, they should Escalate with a
comment for the team. More volunteers can be found. Your mental health is precious, and needed so you can help.
Moderators need a moderation account on Bluesky separate from their personal account to avoid conflicts of interest
and possible harassment on the platform.
Sharing of your personal identification information is not a requirement.
Public acknowledgement by a moderator that they are active on this service is at the moderator's sole option. Consider the
risks carefully before going public.
Moderation policy discussions should not be held on your public TL or any other public medium. Service admin account on
Bluesky is the sole exception.
Admin has amnesty for prior violations of this on his personal account prior to the publication of this document.
A record of active community participation and reliable reporting before access is provided is a safeguard that helps protect against
infiltrators. Moderators assume the risk of infiltration.
Once the team grows to six members, approval by a two-thirds supermajority is required to onboard a new moderator.
Moderation decisions are recorded in the system as public domain information intermingled with moderator identification.
When public domain information is published e.g. during an Appeal, all embedded private information must be redacted.
Redaction may include visual obfuscation or paraphrasing.
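For text records, the redaction step could look something like the sketch below. The field names (`moderator`, `comment`, `label`), the `PRIVATE_FIELDS` set and the `redact` helper are all hypothetical and do not reflect Ozone's actual data model. Screenshots and paraphrasing still need manual care.

```python
from typing import Iterable

# Hypothetical field that directly identifies a moderator.
PRIVATE_FIELDS = {"moderator"}


def redact(record: dict, moderator_ids: Iterable[str],
           placeholder: str = "[redacted]") -> dict:
    """Return a copy of the record that is safe to publish."""
    # Replace longer identifiers first so a shorter one cannot leave fragments.
    ids = sorted(moderator_ids, key=len, reverse=True)
    public = {}
    for key, value in record.items():
        if key in PRIVATE_FIELDS:
            public[key] = placeholder
        elif isinstance(value, str):
            # Scrub identifiers that leaked into free-text fields.
            for mod_id in ids:
                value = value.replace(mod_id, placeholder)
            public[key] = value
        else:
            public[key] = value
    return public
```

The original record is left untouched, so the internal history stays intact while only the copy is shared.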

Moderation Workflow

Currently simple: reports arrive in the Ozone queue and are actioned ad hoc via Label or Acknowledge.
As the team grows it is likely this will become:

initial review of queued report
either resolve quickly in queue, or Escalate
actioning moderator actions the report or delegates to a better choice (e.g. based on language or topic) by updating the Tag
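The three steps above could be modelled as a small state machine. This is a sketch only: the state names and the `advance` helper are hypothetical, delegation is modelled as an escalated report staying escalated while its Tag changes, and no Tag values are assumed because the schema is not yet defined.

```python
from enum import Enum


class State(Enum):
    QUEUED = "queued"        # awaiting initial review
    ESCALATED = "escalated"  # handed on for a closer look
    RESOLVED = "resolved"    # actioned via Label or Acknowledge


# Allowed moves between states, following the planned workflow.
ALLOWED = {
    State.QUEUED: {State.RESOLVED, State.ESCALATED},
    State.ESCALATED: {State.ESCALATED, State.RESOLVED},  # delegate or action
    State.RESOLVED: set(),
}


def advance(current: State, target: State) -> State:
    """Move a report to the target state, rejecting moves the workflow forbids."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target
```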

Tag schema tbd

Future plans

Automation:

query-based for historic abusive content
real-time

Running costs are currently covered by startup admin (this poster). If costs increase significantly it may be necessary to
find outside support for running costs.
Metrics:

SQL server reports are needed: reports, labeled/not labeled, moderator activity
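As a rough idea of the queries involved, here is a sketch of the three metrics against an in-memory SQLite database. The `reports` table, its columns and the sample rows are invented for illustration and are not the Ozone schema. SQLite stands in for whatever SQL server the service actually runs.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create a throwaway database with a hypothetical reports table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE reports (id INTEGER PRIMARY KEY, moderator TEXT, labeled INTEGER)"
    )
    conn.executemany(
        "INSERT INTO reports (moderator, labeled) VALUES (?, ?)",
        [("mod-a", 1), ("mod-a", 0), ("mod-b", 1)],
    )
    return conn


def summary(conn: sqlite3.Connection) -> dict:
    """Total reports, labeled vs not labeled, and reports handled per moderator."""
    total, labeled = conn.execute(
        "SELECT COUNT(*), COALESCE(SUM(labeled), 0) FROM reports"
    ).fetchone()
    by_moderator = dict(
        conn.execute("SELECT moderator, COUNT(*) FROM reports GROUP BY moderator")
    )
    return {
        "reports": total,
        "labeled": labeled,
        "not_labeled": total - labeled,
        "by_moderator": by_moderator,
    }
```

Any per-moderator figures would be private information under the redaction rules in this document, so a published version would keep only the aggregate counts.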

Installation Notes

Server is a VPS hosted by Digital Ocean, to the specs suggested, with backups at a small extra cost.
Four domains: nafo-moderation.org/com/net/info set up at Squarespace. They are just the registrar,
all DNS setup is done in the Digital Ocean web UI.
Web server Installation via the Console on the Digital Ocean "Droplet", which is what they call a VPS.
Reports arrive once it's all working properly, and can be managed using a serviceable, but not perfect, web UI.
Service account is nafo-moderation.bsky.social.
The endpoint targeted by the reporting API is ozone.moderation.org. I had to add a CNAME record to
make that work by redirecting it to nafo-moderation.org. I got confused with domain naming during
installation.
Set up nafo-moderation.org as a supported domain in Proton email to support appeals and other
stuff. Possibly for other users to help out down the line, too.

SteveTownsend / nafo-forum-moderation
