friendica / friendica

Friendica Communications Platform
https://friendi.ca
GNU Affero General Public License v3.0

[Feature Request] add a standardized hash field to the `photos` table. #8972

Open WNYmathGuy opened 4 years ago

WNYmathGuy commented 4 years ago

Expected behavior

Have a way for admins to be notified if their server contains copyrighted photos or material evidence of a crime.

Additional background

Sam Harris recently released a podcast episode, #213 - THE WORST EPIDEMIC, and I'm a bit freaked out by it. During the interview, they discussed matching a hash of an image against a table of hashes that identify child abuse and sexual exploitation imagery. It wouldn't be too hard to store the output of the same hash function as a field in the photos table, along with a copy of the table of known hashes of problematic imagery. An admin could then check through their Friendica interface whether a join on the two tables returned any results identifying a given user, and take whatever action they deemed appropriate.

In the interview, they indicate that 3%-5% of people on earth (uniformly distributed) are involved in the production or consumption of child abuse imagery. That means I might have 13,000 images on either of my two instances and have no idea they are there, nor any way of properly removing them.
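The join described here could look roughly like the following sketch. Everything in it is an assumption for illustration: the table names `photo_demo` and `known_hash_demo` are hypothetical and are not Friendica's schema, SQLite stands in for MariaDB, and SHA-256 stands in for whatever hash function a real hash list would require. A cryptographic hash like this only matches byte-identical files, so a resized or re-encoded copy would not be found.

```python
import hashlib
import sqlite3


def content_hash(data: bytes) -> str:
    """Hex SHA-256 of the raw image bytes (stand-in hash, an assumption)."""
    return hashlib.sha256(data).hexdigest()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE photo_demo ("
        "id INTEGER PRIMARY KEY, uid INTEGER, data BLOB, hash TEXT)"
    )
    # Index the hash column so the join does not scan every photo row.
    conn.execute("CREATE INDEX photo_demo_hash ON photo_demo (hash)")
    conn.execute("CREATE TABLE known_hash_demo (hash TEXT PRIMARY KEY)")
    return conn


def add_photo(conn: sqlite3.Connection, uid: int, data: bytes) -> None:
    """Store a photo and compute its hash once, at insert time."""
    conn.execute(
        "INSERT INTO photo_demo (uid, data, hash) VALUES (?, ?, ?)",
        (uid, data, content_hash(data)),
    )


def find_flagged_photos(conn: sqlite3.Connection) -> list:
    """Return (photo id, owner uid) for photos whose hash is on the list."""
    return conn.execute(
        "SELECT p.id, p.uid FROM photo_demo AS p "
        "JOIN known_hash_demo AS k ON k.hash = p.hash "
        "ORDER BY p.id"
    ).fetchall()
```

Because the hash is computed when the photo is stored and the column is indexed, the admin check is a single indexed join rather than a pass over every image blob.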

Actual behavior

I don't see any way to protect and manage what goes on my hard drives.

Steps to reproduce the problem

  1. Try to spot check photos randomly within the photos table.
  2. Discover there are over a quarter-million images in the table.
  3. Try sorting the table by size largest to smallest.
  4. Get error messages from phpMyAdmin and lose the ability to browse the table because the sort takes a wicked long time.

Friendica version you encountered the problem

2020.07; the database version is 1355 and the post update version is 1350.

Friendica source (git, zip)

git

PHP version

7.0.33

SQL version

MariaDB 10.5.4

MrPetovan commented 4 years ago

This is a good idea. However, for us to be able to implement it in Friendica, we would need an existing database of known illegal content and a way to query it. We already do this for exposed passwords, but even after a cursory search I'm not aware of a similar free or open service for detecting child pornography or copyrighted content.

annando commented 4 years ago

The question is whether any database like this exists. Obviously we cannot provide such a database ourselves ...

MrPetovan commented 4 years ago

I kept looking and found one in the UK, but you have to be a paying member of the non-profit organization that runs it. Nothing freely accessible.

WNYmathGuy commented 4 years ago

About halfway through the podcast I linked in my initial post, the interviewee discusses a Microsoft database of that type.

MrPetovan commented 4 years ago

Microsoft has a system called PhotoDNA, which fingerprints pictures and videos and makes it possible to find matches between them. It is only the query part of the solution I outlined above; we still need a readily available database to query. Not the images/videos themselves, of course, but the hashes produced by PhotoDNA. This is what the UK non-profit provides.
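To illustrate what fingerprint matching means in general, here is a minimal sketch of a much simpler technique, the average hash. It is not PhotoDNA, whose algorithm is not described in this thread; the function names and the `max_distance` threshold are assumptions chosen for the example. The input is a small grayscale image given as rows of pixel values.

```python
def average_hash(pixels: list) -> int:
    """Fingerprint a grayscale image: one bit per pixel, set when the
    pixel is brighter than the image's mean brightness."""
    flat = [value for row in pixels for value in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for value in flat:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")


def is_match(a: int, b: int, max_distance: int = 1) -> bool:
    """Treat two fingerprints as the same image when few bits differ.
    The default threshold is an arbitrary choice for this sketch."""
    return hamming_distance(a, b) <= max_distance
```

Unlike a cryptographic hash, a fingerprint of this kind stays the same or changes only slightly when an image is uniformly brightened, so near-duplicates can be found by comparing distances rather than testing for equality.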

WNYmathGuy commented 4 years ago

"Microsoft has a system called PhotoDNA" I must have wrongly assumed that it was openly searchable.

MrPetovan commented 4 years ago

Well, I tried accessing the PhotoDNA website using my Microsoft/Skype credentials, but the website errors out with a nasty 500 error. It seems PhotoDNA is meant to combat child pornography, and it's free for qualifying customers. I have not been able to figure out whether we could offer it as an addon, because the website crashes. Even on Microsoft Edge.

MrPetovan commented 4 years ago

There is a FAQ available for the service, but it doesn't help figure out whether node admins could qualify as free users: https://www.microsoft.com/en-us/PhotoDNA/FAQ

WNYmathGuy commented 4 years ago

I gave a whirl at contacting PhotoDNA too. I received a reply from them yesterday and I'm replying to them to see what they say. They may show up here to discuss it more for all I know.