eigenmagic / fediblockhole

A tool for automatically syncing Mastodon admin domain blocks.
GNU Affero General Public License v3.0

delete_block is not referenced anywhere - and there seems to be no way for obsolete blocks to be removed #39

Open cunningpike opened 1 year ago

cunningpike commented 1 year ago

There is a delete_block function in __init__.py, but it has no references, and I notice that if you change the source list you are using to a shorter/less strict one containing fewer blocks, the old blocks don't get removed.

jpwarren commented 1 year ago

delete_block() is a carryover from the script that predated fediblockhole. I kept it because it might have been useful, but there isn't really a good way to auto-remove a block completely with the way domain_blocks in Mastodon currently work.

When you say "obsolete block", what do you mean? How do you know it's obsolete? And how does fediblockhole, a simple piece of software running on angry sand, know that deleting the block is the right thing to do? What if it's wrong?

I generally want to have a record of why a block was removed, and the current way to do that is effectively to mark a block as noop severity, so you keep the comments but can unblock the instance. A full delete will erase all record of why a domain is or isn't blocked.
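A minimal sketch of the "mark as noop, keep the comments" idea described above. The function name `retire_block` and the wording of the appended note are hypothetical; the dict keys (`severity`, `private_comment`) are assumed to mirror the fields of a Mastodon domain block. This is an illustration, not fediblockhole's code.

```python
def retire_block(block, reason):
    """Return a copy of `block` downgraded to 'noop', keeping its comments.

    `block` is assumed to be a dict with Mastodon-style domain block fields.
    The original dict is left untouched.
    """
    updated = dict(block)
    updated["severity"] = "noop"  # unblocks the domain without deleting the record
    existing = block.get("private_comment") or ""
    note = f"Unblocked: {reason}"  # hypothetical note format
    updated["private_comment"] = f"{existing}\n{note}".strip()
    return updated


block = {
    "domain": "example.org",
    "severity": "suspend",
    "private_comment": "spam wave",
}
retired = retire_block(block, "admins added moderation staff")
print(retired["severity"])
print(retired["private_comment"])
```

The point of the design is that the history of the decision survives in the comment, whereas a full delete would leave nothing behind.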

It's a challenge to define what "no record exists" means. Does it mean you've never seen this domain before and no block has been needed? Or was it blocked until last week but now all records of what happened are gone?

For the scenario you describe, why should moving to a shorter external list affect blocks in your instance that are not referred to in that external list? Why would "these blocks are not present in anyone else's list" necessarily mean deleting those blocks? There are a bunch of reasons why you might want your instance to be informed by other people's blocklists but not necessarily copy them verbatim.

It's tricky.

It might be worth adding a utility script to remove blocks en masse, but I'd want to keep it separate from the main functionality of fediblockhole unless/until there was more energy+thought put into how deleting blocks, as distinct from marking them as noop, will work in practice, particularly at any kind of scale.

I want fediblockhole to be careful about what it does because it's an automated system. If it goes wrong, it'll go wrong a lot and maybe break a bunch of stuff. There is a lot of potential for instance admins to blow off their own foot by accident. They already have enough to do without having to go fix a big problem because they made a simple error that anyone could have made. Several of which I have already made myself in developing fediblockhole.

Given all of the above, what do you think would be a good change to make to the code? What outcome are you looking to achieve?

cunningpike commented 1 year ago

All great questions - the use case I had was that an instance (mstdn.ca) appeared on a very large blocklist that I unwittingly used the first time I ran fediblockhole. I switched to a shorter list and was thinking (without mentally going through all the excellent points you made above) that, if a domain is no longer in the list(s) you are pushing to an instance, it would be removed.

I solved that particular issue by running another .py script I found that cleared all my blocks and allowed me to start over with a better list. Understand that I was not operational yet, so I didn't have to think about any of the issues you raised above, which are all valid.

Perhaps we could write an enhancement that could take a minimum severity level value for "obsolete" blocks (i.e. ones that are no longer in the feed being pushed), allowing individual admins to decide what to do. We could even default that to "keep" so that it would have no effect unless an admin changed it?

There's also the issue of processing cost - we would have to iterate through all the existing blocks from the instance, and see if they were still in the new list...that could potentially double the run time for the job...
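A rough sketch of what that enhancement could look like, assuming a hypothetical `obsolete_severity` setting that defaults to "keep". The function name and the shape of the inputs are assumptions, not anything fediblockhole currently provides. Putting the pulled domains in a set makes the comparison a single pass over the instance's blocks.

```python
def plan_obsolete(instance_blocks, pulled_domains, obsolete_severity="keep"):
    """Plan severity changes for blocks that are no longer in the pulled list.

    instance_blocks:   dict mapping domain -> current severity on the instance
    pulled_domains:    iterable of domains present in the merged pulled list
    obsolete_severity: 'keep' (default, change nothing) or a severity to apply
    Returns a dict mapping domain -> new severity.
    """
    if obsolete_severity == "keep":
        return {}
    pulled = set(pulled_domains)  # constant-time membership checks
    changes = {}
    for domain, severity in instance_blocks.items():
        if domain in pulled:
            continue  # still listed upstream, leave it alone
        if severity != obsolete_severity:
            changes[domain] = obsolete_severity
    return changes


instance = {"a.example": "suspend", "b.example": "silence", "c.example": "noop"}
print(plan_obsolete(instance, ["a.example"]))
print(plan_obsolete(instance, ["a.example"], obsolete_severity="noop"))
```

With the default, nothing changes. With `obsolete_severity="noop"`, only `b.example` is touched, because `c.example` is already at that level.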

sgrigson commented 1 year ago

It wouldn't be a bad idea to have a config option that lets you clear all blocks that are not included in a given sync.

I've written up an idea around this called a "Retractions File" which would also work, but there's no state checking of what was synced in a prior import.

But with a configuration option that allows for wiping of remote blocks that aren't found in the list, we could easily sync up a remote environment to exactly match the blocks pushed over to it in an update.

cunningpike commented 1 year ago

I am wondering if the mstdn.party and mstdn.plus problems are a good use case for this feature - people will want to filter them now, but potentially remove the blocks later if an admin regains control over them...?

I guess I'm thinking of something like the MTA reputation-style lists, where there is a way to get off them eventually...

More than willing to contribute code to support this...

jazmichaelking commented 1 year ago

The shared blocklists tend to be driven by group consensus from various trust and safety groups, which routinely unblock or unsilence service providers that have responded to prior blocks and silences by adding more moderation resources, making policy changes, publicly committing to change, all sorts of reasons. Being able to use a shared blocklist or an exemplar server as a sentinel requires that the blocks removed upstream flow down to the endpoints consuming those lists.

cunningpike commented 1 year ago

Agree - I am working on a contribution that implements this, to be submitted as a future PR.

sgrigson commented 1 year ago

You may already be considering this, but I'd recommend simply making it a config option in the .conf.toml file.

Something like:

instance_blocks_exact_match=true

Then this would instruct the fediblockhole process to--when reading blocks from the server--not just apply new ones, but delete any that don't match as well. Obviously defaulted to false.
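A sketch of how such a flag might drive the sync, assuming the proposed `instance_blocks_exact_match` option has been read into a boolean. Neither the option nor the `plan_sync` function exists in fediblockhole; both are illustrative.

```python
def plan_sync(instance_domains, list_domains, exact_match=False):
    """Work out which domains to add and which to delete.

    With exact_match=False (the proposed default) nothing is ever deleted.
    With exact_match=True, instance blocks missing from the list are removed.
    """
    instance = set(instance_domains)
    wanted = set(list_domains)
    to_add = sorted(wanted - instance)
    to_delete = sorted(instance - wanted) if exact_match else []
    return to_add, to_delete


on_instance = ["old.example", "shared.example"]
in_list = ["shared.example", "new.example"]
print(plan_sync(on_instance, in_list))
print(plan_sync(on_instance, in_list, exact_match=True))
```

Note that with the flag on, any block an admin added by hand in the Mastodon interface would land in `to_delete` unless it also appears in one of the lists.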

This could then solve a similar issue with domain blocks where perhaps a subdomain block changes to a full TLD block.

cunningpike commented 1 year ago

Yes, exactly - something like sync=true but the same idea... the local CSV file capability gives operators using that option a way to maintain their own list even when domains disappear from the pulled lists.

jpwarren commented 1 year ago

The design of FediBlockHole treats the actual blocks in the instance as authoritative, rather than a local CSV file. The behaviour you're describing is what happens now: if a domain disappears from a pulled list but exists in your instance, the assumption is that your instance admins/mods know what they want and the instance block should remain in place.

You use mergeplan to decide if blocklists you pull in raise or lower the severity of blocks that already exist in your instance. In min mode, you lower your block severities if the pulled blocklists have lower severity than your instance. The default max mode assumes that blocks mostly exist to increase severity, which is what I've observed so far in practice.
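To illustrate the difference between the two modes, here is a simplified sketch. The severity ordering (noop < silence < suspend) and the `merge_severity` helper are assumptions made for the example; this is not fediblockhole's actual merge code.

```python
# Assumed ordering from least to most severe.
SEVERITY_RANK = {"noop": 0, "silence": 1, "suspend": 2}


def merge_severity(current, incoming, mergeplan="max"):
    """Pick the severity to keep when the instance and a pulled list disagree."""
    if mergeplan == "max":
        return max(current, incoming, key=SEVERITY_RANK.__getitem__)
    if mergeplan == "min":
        return min(current, incoming, key=SEVERITY_RANK.__getitem__)
    raise ValueError(f"unknown mergeplan: {mergeplan!r}")


# Instance has 'suspend', pulled list says 'silence'.
print(merge_severity("suspend", "silence"))
print(merge_severity("suspend", "silence", mergeplan="min"))
```

In max mode the instance keeps `suspend`; in min mode it drops to `silence`. Either way the block still exists, which is different from deleting it.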

You may be the first to decide to block an instance, and this sort of sync function would automatically undo your own moderation decisions in the instance. You would effectively remove all local moderation ability and cede moderation to third-party blocklists and automation. That is a significant decision and shouldn't be taken lightly.

Unless you also remember to manually add blocks you do in the Mastodon interface to a local override CSV file. Which seems like unnecessary double-handling to me, and a bunch of tedious admin that people just won't do. And if you forget, it'll be a weird puzzle to figure out why the blocks your mods are adding keep disappearing.

jpwarren commented 1 year ago

Please re-read my first comment where I talk about noop severity and assigning meaning to the absence of a block.

Part of the challenge here seems to be due to the way Mastodon's UI encourages admins to delete a block rather than moving a block to noop level. Perhaps something to take up with the Mastodon devs. Or perhaps the maintainers of the blocklists you're using.

I am reluctant to add automation to what seems like a suboptimal UI decision so that people can shoot themselves in the foot faster and with greater accuracy. That doesn't feel like progress to me.

sgrigson commented 1 year ago

That's fair.

The only other possible option I was thinking of was perhaps the private comment.

If adding a block updated the private comment to refer to some kind of internal key, then when reading blocks from a remote server, Fediblockhole would immediately be aware of blocks it added so long as it can read the private comment.

It could then use that private comment key as a way of identifying a block that was added by the blockfile that is no longer in the blocklist, and safe for removal.

If the owner of the site changed that private comment, they would break that association and allow the block to remain and not be removed.
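A small sketch of the private-comment idea, assuming a hypothetical marker string and dicts with Mastodon-style `domain` and `private_comment` fields. The marker text and both helper names are made up for illustration.

```python
# Hypothetical marker; any string unlikely to be typed by hand would do.
MARKER = "[managed-by:fediblockhole]"


def tag_comment(comment):
    """Append the marker to a private comment unless it is already there."""
    comment = comment or ""
    if MARKER in comment:
        return comment
    return f"{comment} {MARKER}".strip()


def removable_domains(instance_blocks, list_domains):
    """Domains whose block still carries the marker but has left the blocklist."""
    wanted = set(list_domains)
    return sorted(
        block["domain"]
        for block in instance_blocks
        if MARKER in (block.get("private_comment") or "")
        and block["domain"] not in wanted
    )


blocks = [
    {"domain": "gone.example", "private_comment": tag_comment("from list A")},
    {"domain": "kept.example", "private_comment": tag_comment("from list A")},
    {"domain": "local.example", "private_comment": "blocked by our mods"},
    {"domain": "edited.example", "private_comment": "admin rewrote this"},
]
print(removable_domains(blocks, ["kept.example"]))
```

Only `gone.example` is reported: `local.example` was never tagged, and `edited.example` lost the marker when the admin rewrote the comment, so both are left alone.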

jpwarren commented 3 weeks ago

Coming back to this and the use of private comments in this way is interesting.

That might be worth exploring, depending on what the Mastodon devs have planned for blocklists in the new v4.3 train of code. Now that v4.3 has landed, let's see if it's worth the dev effort?