Obfuscating content IDs in tombstone requests to limit bad actors trying to collect "sensitive content" lists

pirate commented 3 months ago

The goal is that nobody can simply take those IDs, compile the list of banned content, and misuse it.

wilwade commented 3 months ago

Hmm... I'm not sure that alone would limit discovery of the delete-requested content, but it would slow it down and increase its cost. Perhaps that is enough?

Here's how to recover the full ID:

1. Index all of the user's message IDs (these remain available even after tombstoning).
2. Take the shortened ID and search for the (single) matching full ID.
3. The full ID is recovered.
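The three recovery steps above can be sketched as follows. This is a minimal illustration, not DSNP's actual ID format. It assumes that content IDs are hex strings (here a SHA-256 digest of the content) and that "shortening" means keeping only a prefix. `content_id`, `shorten`, and `recover_full_id` are hypothetical helpers.

```python
import hashlib


def content_id(body: bytes) -> str:
    # Hypothetical: treat a content ID as the SHA-256 hex digest of the content.
    return hashlib.sha256(body).hexdigest()


def shorten(full_id: str, length: int = 8) -> str:
    # Hypothetical shortening scheme: keep only the first `length` hex characters.
    return full_id[:length]


def recover_full_id(short_id: str, user_message_ids: list[str]) -> list[str]:
    # Step 1 happened earlier: the attacker indexed every message ID the user published.
    # Step 2: search that per-user index for IDs that begin with the shortened ID.
    return [mid for mid in user_message_ids if mid.startswith(short_id)]


if __name__ == "__main__":
    posts = [b"hello world", b"protest plans", b"lunch photo"]
    index = [content_id(p) for p in posts]
    tombstoned = shorten(index[1])
    # Step 3: within a single user's index, a short prefix almost always matches one ID.
    print(recover_full_id(tombstoned, index))
```

Because the search only covers one user's messages, even a short prefix is almost always unique. Shortening therefore raises the attacker's cost without hiding the ID.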

There was a discussion early on about using a hash of the ID instead, but that runs into the same issue. That said, shortening does make it harder to locate deleted content for an arbitrary user in general, so perhaps it is still worthwhile?
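Here is why an unsalted hash "ends up with the same issue." The sketch below is hypothetical and assumes the tombstone would announce SHA-256 of the content ID. An attacker who has indexed the user's IDs can hash each one and compare, so the lookup is just as easy as with a shortened ID. `hash_id` and `reverse_lookup` are illustrative names.

```python
import hashlib
from typing import Optional


def hash_id(content_id: str) -> str:
    # Hypothetical alternative: announce SHA-256(content ID) instead of the ID itself.
    return hashlib.sha256(content_id.encode("utf-8")).hexdigest()


def reverse_lookup(announced_hash: str, user_message_ids: list[str]) -> Optional[str]:
    # The attacker hashes every indexed ID and compares. The hash is deterministic and
    # unsalted, so it offers no protection beyond the cost of hashing each candidate.
    for mid in user_message_ids:
        if hash_id(mid) == announced_hash:
            return mid
    return None


if __name__ == "__main__":
    ids = ["msg-001", "msg-002", "msg-003"]
    print(reverse_lookup(hash_id("msg-002"), ids))
```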

pirate commented 3 months ago

Some salted hash of the ID could work too. The threat model I was thinking of is this:

1. User A publishes tiananmen_square_protest_plans.txt on the network.
2. User B joins the network later (a malicious government spy looking for dissidents and sensitive files, who has User A on a watchlist).
3. User A, while being closely watched by User B, tries to delete the file from the network.
4. User B sees the deletion request for that file and adds it to a list of potentially sensitive files, which they can use to search for more dissidents who might be hosting it.

Or even darker:

1. Bad User A publishes terrible_csam.mp4 to the network.
2. Good User B, a moderator, catches it, tombstones the content, blocklists User A, and reports them to the police.
3. Bad User C has been lurking with the goal of collecting CSAM. They monitor the tombstone list for banned content, and now have the hash of a new file to look for on other distributed filesystems, BitTorrent, hosts that haven't deleted it, etc.

Another option is some kind of handshake. The user announcing the tombstone releases only the first half of the hash. User B, who might have the file, responds to User A with a hash of the second half. User A checks it, and if it matches the hash they're trying to tombstone, they send an ack back and User B deletes the file (or something similar). The downside might be a flood of handshakes that DoS the original user, though?
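The handshake described above could look roughly like this in a single-process simulation. It is a hypothetical sketch, not a DSNP protocol, and it assumes hex SHA-256 content hashes. The function names `announce`, `respond`, `verify`, and `handle_ack` are illustrative. The network transport, the identity of the participants, and ack delivery are all omitted.

```python
import hashlib
import hmac


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def announce(full_hash: str) -> str:
    # Step 1 (User A): broadcast only the first half of the content hash.
    return full_hash[: len(full_hash) // 2]


def respond(announced_half: str, local_hashes: set[str]) -> dict[str, str]:
    # Step 2 (User B): for each locally held file whose hash starts with the announced
    # half, reply with a hash of the second half as proof of possession.
    return {
        full: sha256_hex(full[len(announced_half):].encode("ascii"))
        for full in local_hashes
        if full.startswith(announced_half)
    }


def verify(full_hash: str, proof: str) -> bool:
    # Step 3 (User A): recompute the expected proof and compare in constant time.
    # A match means User A sends an ack back to User B.
    expected = sha256_hex(full_hash[len(full_hash) // 2:].encode("ascii"))
    return hmac.compare_digest(expected, proof)


def handle_ack(full_hash: str, local_hashes: set[str]) -> None:
    # Step 4 (User B): on receiving the ack, delete the file from local storage.
    local_hashes.discard(full_hash)


if __name__ == "__main__":
    target = sha256_hex(b"file to tombstone")
    b_store = {target, sha256_hex(b"unrelated file")}
    for full, proof in respond(announce(target), b_store).items():
        if verify(target, proof):
            handle_ack(full, b_store)
    print(target in b_store)
```

In this sketch a passive observer sees only half of the hash and a hash of the other half, not the full content hash. However, every holder that matches the prefix must answer User A, which is the handshake flood (possible DoS) mentioned above.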

wilwade commented 3 months ago

I'm not sure we can protect against the first threat model if we assume a state-level actor will always try to collect all the content, just as I assume they already do via scraping and agreements.

The second one it might help with, since it assumes a lower capacity to collect and process information. I think shortening does help reduce this sort of opportunistic discovery.

Handshakes assume interactivity on the part of users or providers that DSNP currently doesn't have, so while possible, they would require additional support structures.

Salting the hash with the ID is an interesting idea. It drastically increases the search scope, so given enough volume, it would at least limit misuse to large-scale organizations (someone could still generate a database to build the reverse index). It also doesn't introduce any new data into the mix.
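The following sketch shows how salting widens the search. It rests on an assumption: "salting the hash with the ID" is read here as using the sender's user ID as the salt, a value that already exists and that the tombstone announcement does not disclose. `salted_tombstone_hash` and `brute_force` are hypothetical names, and the `user_id:content_id` encoding is an illustrative choice.

```python
import hashlib
from typing import Iterable, Optional, Tuple


def salted_tombstone_hash(user_id: str, content_id: str) -> str:
    # Assumption: the salt is the sender's user ID. The "user:content" encoding is a
    # hypothetical choice for illustration only.
    return hashlib.sha256(f"{user_id}:{content_id}".encode("utf-8")).hexdigest()


def brute_force(
    announced: str, corpus: Iterable[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    # Without knowing the sender, an attacker must try every (user ID, content ID)
    # pair it has indexed, so the work scales with the whole network, not one user.
    for user_id, content_id in corpus:
        if salted_tombstone_hash(user_id, content_id) == announced:
            return user_id, content_id
    return None


if __name__ == "__main__":
    corpus = [("alice", "c1"), ("alice", "c2"), ("bob", "c1")]
    announced = salted_tombstone_hash("bob", "c1")
    print(brute_force(announced, corpus))
```

The same content ID yields different hashes for different senders, so a single user's index is no longer enough. An organization that precomputes hashes over all indexed pairs has effectively built the reverse index mentioned above.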

It does make it harder to validate tombstone announcements (as the assumption is that the tombstone announcement doesn't disclose the id of the sender, only the provider).

Please do expand on the idea if you have additional thoughts, though. Perhaps there is a path there.

pirate commented 3 months ago

Yeah, you basically get the idea of my concerns, and I agree it is a hard problem to solve when state-level actors are assumed to have broad network visibility. Some edge cases can be made more secure (e.g., when a state-level actor joins an existing network for the first time, limiting their historical access to sensitive deletions), but it's up to your team to weigh the various tradeoffs and decide on a policy.

Just keep those two scenarios in mind as you develop in the future, as they happen all the time on distributed storage networks. Because distributed storage uniquely attracts political dissidents who want anonymity and privacy, it also attracts the governments, CSAM collectors, and hackers who chase them and mine the network's deletion activity for sensitive material.

I just wanted to raise this discussion on GitHub because I spoke with some of your team members at Dweb and brought up these concerns, and they suggested I open an issue :)

wesbiggs commented 3 months ago

Happy to discuss this item on the next DSNP spec community call on September 6. Time and link here: https://vimeo.com/showcase/dsnp-public-spec-meeting

I agree this is not a problem unique to DSNP... it would be interesting to understand how other projects think about these issues, and if there could be commonality around a solution architecture.

LibertyDSNP / spec

Obfuscating content IDs in tombstone requests to limit bad actors trying to collect "sensitive content" lists #282