Closed voronoipotato closed 4 years ago
https://github.com/clearscene/pHash is a popular perceptual hashing library. It produces a video fingerprint that is close to the fingerprints of other videos that are visually similar. I think we should be careful about how we implement this, since it could of course be used to robo-scrape for copyrighted content and legally badger users, independent of the actual content in the video. I do think the value is high enough that it should be considered even with that risk. Perhaps, if checking every video is too expensive, we could use a Bloom filter. Just throwing ideas out there.
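To make "close fingerprints" concrete, here is a minimal sketch. It uses a simple average hash as a stand-in, which is an assumption for illustration and not the algorithm pHash itself implements; the tiny 2x2 grayscale "frames" are made up. Closeness is measured as the Hamming distance between the two bit strings.

```python
from typing import Sequence


def average_hash(pixels: Sequence[Sequence[int]]) -> int:
    """Pack one bit per pixel: 1 if the pixel is brighter than the frame mean.

    Illustrative stand-in only; real perceptual hashes are more robust.
    """
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which the two fingerprints differ."""
    return bin(a ^ b).count("1")


# Made-up grayscale frames: a source, a slightly noisy re-encode, and an unrelated frame.
original = [[10, 200], [220, 30]]
recompressed = [[14, 190], [215, 41]]
unrelated = [[200, 10], [30, 220]]

same = hamming_distance(average_hash(original), average_hash(recompressed))
different = hamming_distance(average_hash(original), average_hash(unrelated))
```

A small distance suggests visually similar content; where to put the threshold would have to be tuned on real data.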
Very interesting :+1:
pHash library bindings (does not support videos yet): https://github.com/aaronm67/node-phash
Judging by the activity on https://github.com/aaronm67/node-phash (no commit since October 2016), it's unlikely we can rely on it. At best we could integrate its code directly. Not sure whether they are closer to what we are looking for, but there are other binding libraries for Node (e.g. https://www.npmjs.com/package/video-phash-service), all unmaintained.
Even a crude solution would be much better than no solution. It does seem that some organizations currently rely on the C++ library, but likely because they have no alternative. It's also a way to prevent people from reuploading flagged content, such as the kind of very bad things that no moderator should be subjected to over and over.
Well, I agree 100% on everything but one thing: there should never be a single judge. Something that is bad for a moderator in Morocco could be totally fine for a moderator in Germany, and something OK in Italy may be very bad in Korea. I vote to let each moderator apply the moral standards of their own community. One good idea might be a submission form, or a list that moderators see in their admin panel, with videos flagged by other nodes; each moderator can then choose whether or not to activate the flag, and vice versa.
@ReK2Fernandez What is proposed is merely a tool for instance owners to help identify content that they don't want to seed or link to. It is not some kind of compulsory content blacklist. I think the spirit of PeerTube in general is more permissive, and they may choose not to implement this simply because it may not be in line with their vision of a community without censorship. However, I do think it would be a great tool for all instances. It seems useful even if it only prevents video reuploads, which hurt the P2P nature of PeerTube.
pHash has not been developed since 2013 (except for a readme update). It doesn't work with the current FFMPEG. A library has been deprecated, making the source impossible to configure and install, unless you compile it with a really old FFMPEG source.
@cooperdk no need to rely on the pHash library itself, as there are alternative implementations of its algorithm (even using sharp instead of imagemagick) : https://github.com/topics/perceptual-hashing?l=javascript | and a quick look shows they are maintained, even if quite young libraries.
@rigelk AFAIK, pHash is made specifically for use with videos. This is what I am looking for. Your link shows only photo hashing code. A photo is easy to hash (there are many ways). For videos, it's different, because you have to match a video, whether or not it has been cut at the beginning (trailers) or the end, etc.
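One way to picture the "cut at the beginning or the end" problem is to treat a video as a sequence of per-frame hashes and slide the shorter sequence along the longer one. The sketch below is only an illustration of that idea, not how pHash or any particular library does it; `best_alignment`, its threshold parameter, and the 4-bit frame hashes are hypothetical.

```python
from typing import List, Optional, Tuple


def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits between two frame hashes."""
    return bin(a ^ b).count("1")


def best_alignment(
    clip: List[int], full: List[int], max_mean_distance: float = 4.0
) -> Optional[Tuple[int, float]]:
    """Slide `clip` along `full` and return (offset, mean distance) of the
    best-matching window, or None if no window is close enough."""
    if not clip or len(clip) > len(full):
        return None
    best: Optional[Tuple[int, float]] = None
    for offset in range(len(full) - len(clip) + 1):
        # zip stops at the end of `clip`, so this compares one window.
        total = sum(hamming_distance(c, f) for c, f in zip(clip, full[offset:]))
        mean = total / len(clip)
        if best is None or mean < best[1]:
            best = (offset, mean)
    if best is not None and best[1] <= max_mean_distance:
        return best
    return None


# Made-up 4-bit frame hashes: the clip is a trimmed, slightly noisy excerpt.
full_video = [0b0000, 0b1111, 0b1010, 0b0101, 0b0011]
trimmed_clip = [0b1010, 0b0111]
match = best_alignment(trimmed_clip, full_video, max_mean_distance=1.0)
```

This brute-force scan costs roughly the product of the two sequence lengths, so it would only be practical for short sequences or as a second pass after a cheaper filter.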
Perhaps the work to get it working with the new FFMPEG isn't horrible. It's worth investigating by someone who has fun with these kinds of things.
You're right. I just realized the newest FFMPEG has a filter named signature which will create a signature of a video (with audio) which is able to recognize videos, even if only parts of the video are the same. It can also create a binary or xml representation of a video for fast matching.
https://ffmpeg.org/ffmpeg-filters.html#signature-1
So it's just a matter of making sure that your FFMPEG is compiled with this option. I know there is a pre-compiled Windows version that is, because I have it on my computer. Otherwise, FFMPEG isn't that hard to compile.
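As a rough sketch of how this could be wired up, the helper below builds an `ffmpeg` command line that writes a signature file using the filter's `format` and `filename` options, following the pattern shown in the linked filter documentation. The helper names and file paths are hypothetical, and the second function assumes the usual column layout of `ffmpeg -filters` output (flags, name, inputs/outputs, description), which may differ between builds.

```python
from typing import List


def signature_command(video_path: str, signature_path: str, fmt: str = "binary") -> List[str]:
    """Build an ffmpeg argument list that writes a signature for one video.

    Note: paths containing ':' or other filtergraph special characters would
    need escaping, which this sketch does not handle.
    """
    if fmt not in ("binary", "xml"):
        raise ValueError("fmt must be 'binary' or 'xml'")
    return [
        "ffmpeg", "-i", video_path,
        "-vf", f"signature=format={fmt}:filename={signature_path}",
        "-map", "0:v",
        "-f", "null", "-",  # discard the decoded frames; only the signature file is kept
    ]


def has_signature_filter(filters_output: str) -> bool:
    """Return True if text captured from `ffmpeg -filters` lists the signature filter.

    Assumes the filter name is the second whitespace-separated column.
    """
    for line in filters_output.splitlines():
        cols = line.split()
        if len(cols) >= 2 and cols[1] == "signature":
            return True
    return False


cmd = signature_command("input.mkv", "signature.xml", fmt="xml")
```

The returned list can be passed to `subprocess.run`, and the output of `ffmpeg -filters` can be captured the same way to check whether a given build includes the filter.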
This could be interesting, but as metadata of the video. I think we could put something like that in a plugin.
I suspect this is far down the road, but right now someone can serve the very same video with a different compression or aspect ratio, and PeerTube will fail to recognize that it is visually nearly identical to an existing video. Perceptual video hashes might allow the user to be prompted with "This video appears to have already been uploaded, if you share the existing video it will have better availability due to our P2P network." Underneath, the page would embed the existing video along with a share button. This would prevent competing sets of torrents that cannot share peers despite carrying visually near-identical content.
Where this legitimately happens is when someone shares a phone video, say on Twitter, and asks people to signal boost it: share it, get it out there, make sure it's uploaded everywhere, because we're afraid of some powerful individual taking it down. We're going to see 10 uploads of the same video, some cropped, some accidentally converted to .mov and then to .mkv. If we can use a perceptual video hash, we can suggest that these people instead share the video that has already been uploaded, with that video embedded so they can watch it and verify it is the exact same video.
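The upload-time check described above could look roughly like the following sketch. Everything here is hypothetical: the `StoredVideo` record, the integer fingerprints, the distance threshold, and the example URLs are assumptions for illustration, not anything PeerTube provides.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class StoredVideo:
    url: str          # hypothetical watch URL of an already-uploaded video
    fingerprint: int  # hypothetical perceptual hash stored as an integer


def find_existing_upload(
    new_fingerprint: int, catalog: Iterable[StoredVideo], max_distance: int = 10
) -> Optional[StoredVideo]:
    """Return the closest stored video within `max_distance` differing bits, if any."""
    best: Optional[StoredVideo] = None
    best_distance = max_distance + 1
    for video in catalog:
        distance = bin(new_fingerprint ^ video.fingerprint).count("1")
        if distance < best_distance:
            best, best_distance = video, distance
    return best


def upload_prompt(match: Optional[StoredVideo]) -> Optional[str]:
    """Build the message shown to the uploader, or None when nothing matched."""
    if match is None:
        return None
    return (
        "This video appears to have already been uploaded, if you share the "
        "existing video it will have better availability due to our P2P network. "
        f"Existing video: {match.url}"
    )


catalog = [
    StoredVideo("https://example.invalid/videos/a", 0b11110000),
    StoredVideo("https://example.invalid/videos/b", 0b00001111),
]
hit = find_existing_upload(0b11110001, catalog, max_distance=2)
miss = find_existing_upload(0b10101010, catalog, max_distance=2)
```

The uploader still decides what to do; the match only drives a suggestion, which fits the idea of letting people watch the embedded video and verify it themselves.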