eigenmagic / fediblockhole

A tool for automatically syncing Mastodon admin domain blocks.
GNU Affero General Public License v3.0

How are lookalike domains handled? #56

Open ThisIsMissEm opened 1 year ago

ThisIsMissEm commented 1 year ago

For example, mastоdon.social isn't mastodon.social (the official instance): the first domain uses a lookalike character in place of the first o, so in punycode it would be xn--mastdon-djg.social, which is clearly different.
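A minimal Python sketch of this example, assuming the lookalike character is U+043E (Cyrillic small letter o) and using Python's built-in `idna` codec (IDNA 2003) as a stand-in for whatever encoder a real implementation would choose. The variable names are illustrative only.

```python
# The lookalike is written as an escape so the difference is visible in source:
# U+043E is CYRILLIC SMALL LETTER O, which renders like the Latin "o".
lookalike = "mast\u043edon.social"
official = "mastodon.social"

# The built-in "idna" codec converts each label to its ASCII (punycode) form.
lookalike_ascii = lookalike.encode("idna").decode("ascii")
official_ascii = official.encode("idna").decode("ascii")

print(lookalike == official)  # False: the strings differ by one code point
print(lookalike_ascii)        # xn--mastdon-djg.social
print(official_ascii)         # mastodon.social
```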

When Mastodon returns domain blocks from the API, they are normalised to punycode, so although the API accepts lookalike characters, they appear as punycode in the response.

I had a look through the code, and from what I can tell there is no code for handling domain punycode normalisation, which may cause unexpected results with this tool if a source blocklist does not do punycode normalisation.

Note: As this project has neither a SECURITY.md file nor the GitHub Security features enabled, I was not able to report this potential issue through a responsible-disclosure channel without seeking out contributor email addresses (typically a privacy violation).

jpwarren commented 1 year ago

I'm not sure I understand the issue enough to know what to do about it. Could you please elaborate a little?

Is the risk that someone might think they're blocking a domain, but aren't? Or maybe block something else that looks similar but isn't the same?

And what behaviour should be expected? Should we add punycode normalisation so, no matter what gets imported, fediblockhole always operates on punycode normalised domains for its comparisons and upserts into instances?

Sorry to be dense. Just want to make sure I appreciate the issue properly.

(Reporting this publicly is fine. I'll have another look at setting up GitHub's security thing.)

ThisIsMissEm commented 1 year ago

Yeah, I think normalisation using punycode would probably be a good idea; that way you're always comparing correctly. The risk is mostly in potential mismatches between the blocklist and the instance, so yeah, someone thinks they're blocking a bad instance but they're actually not.

jpwarren commented 2 weeks ago

If I understand this issue correctly, the risk is:

  1. Someone puts a lookalike domain into their blocklist. Probably not an issue if it's coming via API from a Mastodon instance, because those domains are punycode normalised, but a text file could be edited by hand and designed to mislead.
  2. You read in the blocklist from this source and block something you didn't mean to, or believe you're blocking one thing but are actually blocking something else and thus are not blocking the thing you mean to.
  3. The risk is highest for new blocklists from sources you are only just starting to trust.
  4. There is also a risk if a blocklist you already trust is somehow compromised (unauthorised update after a breach, or an insider who decides to be evil today). It will be more difficult to detect the incorrect block because of the lack of punycode normalisation.

The remedy would be to normalise with punycode somehow. That will make it easier to detect the attempt at misleading people.

Where should this normalisation occur?

Options include:

  1. Whenever a comparison is made between domains.
  2. Whenever domains are loaded in, or saved out.
  3. Both.
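The first two options could be sketched roughly as follows. This is only an illustration under stated assumptions: `normalise_domain`, `same_domain` and `load_domains` are hypothetical names, not existing fediblockhole functions, and Python's built-in `idna` codec (IDNA 2003) stands in for whichever punycode encoder is eventually chosen.

```python
from typing import Iterable, Set


def normalise_domain(domain: str) -> str:
    """Return a lowercase ASCII (punycode) form of a domain (hypothetical helper)."""
    # Trim whitespace and a trailing root dot, and lowercase explicitly.
    cleaned = domain.strip().rstrip(".").lower()
    return cleaned.encode("idna").decode("ascii")


def same_domain(a: str, b: str) -> bool:
    """Option 1: normalise both sides at the moment of comparison."""
    return normalise_domain(a) == normalise_domain(b)


def load_domains(lines: Iterable[str]) -> Set[str]:
    """Option 2: normalise once, when a blocklist is read in."""
    return {normalise_domain(line) for line in lines if line.strip()}


# The Cyrillic lookalike (U+043E) no longer collides with the real domain.
print(same_domain("mast\u043edon.social", "mastodon.social"))  # False
print(same_domain("Mastodon.Social", "mastodon.social"))       # True
```

Option 3 would amount to calling `normalise_domain` in both places; because normalising an already-normalised ASCII domain returns it unchanged, doing both should be harmless apart from the repeated work.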

I invite comment on which approach we should take, and encourage example implementations and PRs.

ThisIsMissEm commented 2 weeks ago

I'd be inclined to inspect the block list and, if any domain in it doesn't match its own punycode-encoded form, fail the import. I.e., force all domains to be punycode encoded in blocklists.
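A rough sketch of that strict check, for discussion only. `check_blocklist` and `NonPunycodeDomainError` are hypothetical names rather than part of fediblockhole, and the built-in `idna` codec (IDNA 2003) is assumed as the encoder.

```python
from typing import Iterable, List


class NonPunycodeDomainError(ValueError):
    """Raised when a blocklist entry is not already in punycode form (hypothetical)."""


def check_blocklist(domains: Iterable[str]) -> List[str]:
    """Return the entries unchanged, or raise if any is not punycode encoded."""
    checked = []
    for entry in domains:
        try:
            encoded = entry.encode("idna").decode("ascii")
        except UnicodeError as exc:
            # The codec rejects entries such as empty or over-long labels.
            raise NonPunycodeDomainError(f"cannot encode {entry!r}") from exc
        if encoded != entry:
            # Encoding changed the entry, so it was not in punycode form.
            raise NonPunycodeDomainError(
                f"{entry!r} is not punycode encoded (expected {encoded!r})"
            )
        checked.append(entry)
    return checked


# Passes: both entries are already ASCII.
check_blocklist(["xn--mastdon-djg.social", "mastodon.social"])

# Fails the whole import: the entry contains U+043E (Cyrillic o).
try:
    check_blocklist(["mast\u043edon.social"])
except NonPunycodeDomainError as err:
    print(f"import rejected: {err}")
```

Failing the whole import, rather than silently converting, means a misleading entry is surfaced to the admin instead of being applied.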