Perceptual video hashing so that individuals can decide to avoid splitting peer networks

Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser

https://joinpeertube.org/

GNU Affero General Public License v3.0


Issue #214 (closed by voronoipotato 4 years ago)

voronoipotato commented 6 years ago

I suspect this is far down the road, but right now someone can serve the very same video with a different compression or aspect ratio, and PeerTube will fail to recognize it as the other video, even though the two are visually nearly identical. Perceptual video hashes could allow the uploader to be prompted with "This video appears to have already been uploaded. If you share the existing video, it will have better availability thanks to our P2P network." Underneath, the page would embed the existing video along with a share button. This would prevent competing sets of torrents that cannot share peers even though they carry nearly identical content.

A case where this legitimately happens: someone shares a phone video, say on Twitter, with "signal boost this, share it, get it out there, make sure it's uploaded everywhere, because we're afraid some powerful individual will take it down." We're going to see 10 uploads of the same video, some cropped, some accidentally converted to .mov and then to .mkv. With a perceptual video hash, we can suggest that these people share the video that has already been uploaded instead, with that video embedded so they can watch it and verify it is the same video.
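A minimal sketch of the upload-time check described above, assuming every video already has a 64-bit perceptual hash stored as an integer (computing that hash is out of scope here). `hamming`, `suggest_existing`, the id-to-hash mapping, and the 10-bit threshold are hypothetical illustrations, not PeerTube code or tuned values.

```python
from typing import Dict, Optional, Tuple


def hamming(a: int, b: int) -> int:
    """Number of bits that differ between two 64-bit hashes."""
    return bin(a ^ b).count("1")


def suggest_existing(new_hash: int, existing: Dict[str, int],
                     max_distance: int = 10) -> Optional[Tuple[str, int]]:
    """Return (video_id, distance) for the closest existing upload whose hash is
    within max_distance bits of new_hash, or None if nothing is that close.

    This is a linear scan over every stored hash: fine for a sketch,
    too slow for a large instance.
    """
    best: Optional[Tuple[str, int]] = None
    for video_id, h in existing.items():
        d = hamming(new_hash, h)
        if d <= max_distance and (best is None or d < best[1]):
            best = (video_id, d)
    return best


existing = {"abc": 0xF0F0F0F0F0F0F0F0, "xyz": 0x0123456789ABCDEF}
print(suggest_existing(0xF0F0F0F0F0F0F0F1, existing))  # ('abc', 1)
```

If a match comes back, the upload page could show the prompt and embed that video; if it returns None, the upload proceeds as usual.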

voronoipotato commented 6 years ago

https://github.com/clearscene/pHash Here is a popular perceptual hashing library. It produces fingerprints such that visually similar videos get fingerprints that are close to each other. We should be careful about how we implement this, since it could of course be used to automatically scrape for copyrighted content and legally harass users regardless of what the video actually contains. Still, I think the value is high enough that it should be considered despite the risk. If checking against every video is too expensive, perhaps we could use a Bloom filter. Just throwing ideas out there.
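On the cost concern: a Bloom filter answers exact-membership questions, so on its own it would only catch identical hashes, not near-identical ones. One common alternative (a swapped-in technique, not something proposed in this thread) is to split each 64-bit hash into four 16-bit bands and index each band exactly. By the pigeonhole principle, two hashes that differ in at most 3 bits must agree exactly on at least one band, so four exact lookups find every such candidate without scanning the whole collection. `BandIndex` and its methods are hypothetical names for this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Set

BANDS = 4        # a 64-bit hash is split into 4 bands...
BAND_BITS = 16   # ...of 16 bits each


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def bands(h: int) -> List[int]:
    """Split a 64-bit hash into BANDS chunks of BAND_BITS bits each."""
    mask = (1 << BAND_BITS) - 1
    return [(h >> (i * BAND_BITS)) & mask for i in range(BANDS)]


class BandIndex:
    """Near-duplicate index over 64-bit hashes.

    Every stored hash within BANDS - 1 bits of a query is guaranteed to be
    found; hashes farther away are found only if some band happens to match.
    """

    def __init__(self) -> None:
        # One exact-match table per band position: band value -> video ids.
        self.tables: List[Dict[int, Set[str]]] = [defaultdict(set) for _ in range(BANDS)]
        self.hashes: Dict[str, int] = {}

    def add(self, video_id: str, h: int) -> None:
        self.hashes[video_id] = h
        for i, b in enumerate(bands(h)):
            self.tables[i][b].add(video_id)

    def query(self, h: int, max_distance: int = BANDS - 1) -> List[str]:
        """Return ids within max_distance bits, checking only band collisions."""
        candidates: Set[str] = set()
        for i, b in enumerate(bands(h)):
            candidates |= self.tables[i].get(b, set())
        return sorted(v for v in candidates
                      if hamming(h, self.hashes[v]) <= max_distance)
```

A hash that differs from a stored one by a single bit in each of the four bands is not found, which is the price paid for skipping the full scan; more, narrower bands raise the guaranteed distance at the cost of more candidates per lookup.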

Chocobozzz commented 6 years ago

Very interesting :+1:

pHash library bindings (does not support videos yet): https://github.com/aaronm67/node-phash

rigelk commented 6 years ago

Judging by the activity on https://github.com/aaronm67/node-phash (no commit since October 2016), it's unlikely we can rely on it. At best we could integrate its code directly. I'm not sure it's closer to what we're looking for, but there are other binding libraries for Node (e.g. https://www.npmjs.com/package/video-phash-service), all unmaintained.

voronoipotato commented 6 years ago

Even a crude solution would be much better than no solution. There do seem to be organizations that currently rely on the C++ library, though likely because they have no alternative. It would also give us a way to stop people from re-uploading flagged content, such as truly awful material that no moderator should be subjected to over and over.

r3k2 commented 6 years ago

Well, I agree 100% with everything but one thing: there should never be a single judge. Something a moderator in Morocco considers bad could be totally fine for a moderator in Germany, and something acceptable in Italy may be very bad in Korea. I vote to let each moderator apply the moral standards of their own community. One idea: a submission form, or a list in each moderator's admin panel showing videos flagged on other nodes, where the moderator can choose whether to activate each flag, and vice versa.

voronoipotato commented 6 years ago

@ReK2Fernandez What I'm proposing is merely a tool to help instance owners identify content they don't want to seed or link to. It is not some kind of compulsory content blacklist. I think the spirit of PeerTube in general is more permissive, and the maintainers may choose not to implement this simply because it may not fit their vision of a community without censorship. However, I do think it would be a great tool for all instances. It would be useful even if only to prevent video re-uploads, which hurt PeerTube's P2P nature.
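A sketch of the opt-in model from the last two comments, under assumptions: flags received from other instances are only listed for local moderators, and an upload is blocked only if its hash is close to a flag this instance has activated itself. `Flag`, `InstanceModeration`, the 10-bit threshold, and the instance names are hypothetical illustrations, not a PeerTube API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


@dataclass
class Flag:
    video_hash: int        # 64-bit perceptual hash of the flagged video
    origin: str            # instance that raised the flag
    active: bool = False   # False until this instance's moderators activate it


@dataclass
class InstanceModeration:
    name: str
    threshold: int = 10
    flags: Dict[int, Flag] = field(default_factory=dict)

    def receive_remote_flag(self, video_hash: int, origin: str) -> None:
        # Remote flags start inactive: they only show up in the admin panel.
        self.flags.setdefault(video_hash, Flag(video_hash, origin))

    def flag_locally(self, video_hash: int) -> None:
        self.flags[video_hash] = Flag(video_hash, self.name, active=True)

    def activate(self, video_hash: int) -> None:
        self.flags[video_hash].active = True

    def pending_review(self) -> List[Flag]:
        return [f for f in self.flags.values() if not f.active]

    def blocks_upload(self, upload_hash: int) -> bool:
        # Only flags this instance chose to activate can block an upload.
        return any(f.active and hamming(f.video_hash, upload_hash) <= self.threshold
                   for f in self.flags.values())


inst = InstanceModeration("example.org")
inst.receive_remote_flag(0xABCD, origin="other.example")
print(inst.blocks_upload(0xABCC), len(inst.pending_review()))  # False 1
```

Federating only the hashes, and never auto-activating them, keeps each moderator in charge of their own community's standards.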

cooperdk commented 6 years ago

pHash has not been developed since 2013 (apart from a README update), and it doesn't work with the current FFmpeg. A library it depends on has been deprecated, so the source cannot be configured and installed unless you build it against a very old FFmpeg source.

rigelk commented 6 years ago

@cooperdk No need to rely on the pHash library itself, as there are alternative implementations of its algorithm, some even using sharp instead of ImageMagick. A quick look at https://github.com/topics/perceptual-hashing?l=javascript shows they are maintained, even if they are quite young libraries.
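For reference, the DCT-based image hash these libraries implement is usually described as: shrink the image to 32x32 grayscale, take the 2-D DCT, keep the top-left 8x8 block of low-frequency coefficients, and set each bit by comparing a coefficient with the median. The dependency-free sketch below assumes the frame has already been decoded and resized to a 32x32 list of grayscale values (for example by FFmpeg or sharp). Details such as whether the DC term takes part in the median vary between implementations, so this is not guaranteed to be bit-compatible with pHash.

```python
import math
import random
import statistics
from typing import List

N = 32  # the frame is assumed to be pre-resized to N x N grayscale
K = 8   # keep the K x K lowest-frequency coefficients -> K*K = 64 bits

# COS[k][x] = cos((2x + 1) k pi / 2N), the DCT-II basis, precomputed for k < K.
COS = [[math.cos((2 * x + 1) * k * math.pi / (2 * N)) for x in range(N)]
       for k in range(K)]


def dct_coefficient(pixels: List[List[float]], u: int, v: int) -> float:
    """Unnormalized 2-D DCT-II coefficient (u, v) of an N x N block."""
    return sum(pixels[x][y] * COS[u][x] * COS[v][y]
               for x in range(N) for y in range(N))


def dct_hash(pixels: List[List[float]]) -> int:
    """64-bit hash: each bit says whether a low-frequency coefficient is above
    the median of the K x K block (DC term included in the median here)."""
    coeffs = [dct_coefficient(pixels, u, v) for u in range(K) for v in range(K)]
    median = statistics.median(coeffs)
    bits = 0
    for c in coeffs:
        bits = (bits << 1) | (1 if c > median else 0)
    return bits


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


rng = random.Random(0)
frame = [[rng.uniform(0, 255) for _ in range(N)] for _ in range(N)]
adjusted = [[0.9 * p + 10 for p in row] for row in frame]  # contrast/brightness tweak
other = [[rng.uniform(0, 255) for _ in range(N)] for _ in range(N)]
print(hamming(dct_hash(frame), dct_hash(adjusted)), hamming(dct_hash(frame), dct_hash(other)))
```

A uniform brightness or contrast change leaves the hash (almost) untouched, because it only rescales the coefficients or shifts the DC term, while an unrelated frame lands around half the bits away. As the next comment points out, this is still a per-image hash; matching whole videos needs more on top of it.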

cooperdk commented 6 years ago

@rigelk AFAIK, pHash has functions made specifically for hashing videos, which is what I am looking for. Your link only shows image hashing code. An image is easy to hash (there are many ways). Videos are different, because you have to match a video even if it has been cut at the beginning (trailers) or at the end, etc.
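One simple way to tolerate trimming, sketched under assumptions: sample frames at a fixed rate (say one per second), hash each frame with any 64-bit image hash, then slide the shorter sequence along the longer one and keep the offset with the smallest mean per-frame Hamming distance over a minimum overlap. This is a plain sliding-window alignment chosen for illustration, not pHash's actual video algorithm; `best_alignment` and `min_overlap` are hypothetical names, and the random integers below stand in for real frame hashes.

```python
import random
from typing import List, Optional, Tuple


def hamming(a: int, b: int) -> int:
    return bin(a ^ b).count("1")


def best_alignment(a: List[int], b: List[int], min_overlap: int = 10
                   ) -> Optional[Tuple[int, float]]:
    """Find the shift of b against a that minimizes mean per-frame Hamming distance.

    a, b: per-frame 64-bit hashes sampled at the same fixed rate.
    Returns (offset, mean_distance), where frame i of b lines up with frame
    i + offset of a, or None if no shift overlaps by at least min_overlap frames.
    """
    best: Optional[Tuple[int, float]] = None
    for offset in range(-(len(b) - 1), len(a)):
        # Range of b indices whose partner index i + offset falls inside a.
        start = max(0, -offset)
        end = min(len(b), len(a) - offset)
        if end - start < min_overlap:
            continue
        dist = sum(hamming(a[i + offset], b[i]) for i in range(start, end))
        mean = dist / (end - start)
        if best is None or mean < best[1]:
            best = (offset, mean)
    return best


rng = random.Random(1)
original = [rng.getrandbits(64) for _ in range(60)]  # stand-in per-second frame hashes
trimmed = original[15:50]                             # intro and ending cut off
print(best_alignment(original, trimmed))  # (15, 0.0)
```

A real matcher would still need a threshold on the mean distance (near-duplicates score close to 0, unrelated videos close to 32) and something faster than trying every offset for long videos.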

voronoipotato commented 6 years ago

Perhaps getting it to work with the current FFmpeg isn't that much work. It's worth investigating by someone who enjoys this kind of thing.

cooperdk commented 6 years ago

You're right. I just realized that recent FFmpeg versions have a filter named signature, which computes the MPEG-7 video signature of a video and can recognize videos even when only parts of them are the same. It can also write the signature to a binary or XML file for fast matching.

https://ffmpeg.org/ffmpeg-filters.html#signature-1

So it's just a matter of making sure that your FFmpeg build includes this filter. I know there is a precompiled Windows build that does, because I have it on my computer. Otherwise, FFmpeg isn't that hard to compile.
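A small wrapper, assuming an `ffmpeg` binary that includes the signature filter is on the PATH. The filter options used (`nb_inputs`, `detectmode`, `format`, `filename`) come from the signature filter documentation linked above; the function names are hypothetical. The filter reports whether the inputs match in FFmpeg's log output, so `run` simply returns stderr without trying to parse it.

```python
import subprocess
from typing import List


def compare_cmd(video_a: str, video_b: str) -> List[str]:
    """ffmpeg command that signs both videos and looks for matching parts."""
    # The filter passes its first input through; label it so it can be mapped.
    graph = "[0:v][1:v]signature=nb_inputs=2:detectmode=full[out]"
    return ["ffmpeg", "-hide_banner", "-i", video_a, "-i", video_b,
            "-filter_complex", graph, "-map", "[out]", "-f", "null", "-"]


def export_cmd(video: str, sig_path: str, fmt: str = "xml") -> List[str]:
    """ffmpeg command that writes one video's signature to sig_path."""
    if fmt not in ("xml", "binary"):
        raise ValueError("fmt must be 'xml' or 'binary'")
    # sig_path is inserted into the filter string unescaped, so it must not
    # contain ':' ',' or other filtergraph special characters.
    vf = f"signature=format={fmt}:filename={sig_path}"
    return ["ffmpeg", "-hide_banner", "-i", video, "-map", "0:v:0",
            "-vf", vf, "-f", "null", "-"]


def run(cmd: List[str]) -> str:
    """Run ffmpeg and return its log (stderr), where match results appear."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stderr


# Example (requires ffmpeg and two local files):
# print(run(compare_cmd("upload_a.mp4", "upload_b.mkv")))
```

Exporting signatures once per upload and comparing stored files later avoids decoding every existing video again for each new upload.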

Chocobozzz commented 4 years ago

This could be interesting, but as metadata attached to the video. I think we could put something like that in a plugin.