Chocobozzz / PeerTube

ActivityPub-federated video streaming platform using P2P directly in your web browser
https://joinpeertube.org/
GNU Affero General Public License v3.0

Video De-Duplication Features #3141

Closed kevATin closed 3 years ago

kevATin commented 4 years ago

To my knowledge PeerTube does not yet have any features that allow for detection and potential removal of duplicate videos. I think having those would be very useful.

How to detect duplicates:

How to deal with a duplication issue:

(Should this be decided by moderators on a case-by-case basis, or should automated actions also be allowed?)

Why is this important for PeerTube:

Any thoughts on this?

ghost commented 4 years ago

"Checksums" won't work, as PeerTube re-encodes everything on upload, and any tiny change to the video formatting will change a normal hash. There are video fingerprinting algorithms that look at the actual content, but these are more expensive. There may be reasonable compromises, such as doing a closer comparison only on videos of similar length. But ultimately I feel like this is getting into automated video moderation, which is hard to do ethically and correctly.

kevATin commented 4 years ago

@scanlime Even if the source video is the exact same, the resulting re-encodes will differ? I knew that re-encoding was sometimes a bit off but didn't think it was this imprecise.

However, if I remember correctly, there was an open issue about storing the source files of uploaded videos. Maybe those could be used instead?

Do you know of any open source video fingerprinting software?

Even with no automation whatsoever as a purely manual set of moderation tools, I think de-duplication would still be useful.

ghost commented 4 years ago

It's not useful to hash videos to detect duplicates, unless you are trying to detect folks who upload the exact same source file bits. If you remux the file, if you download it with youtube-dl, certainly if you transcode it at all, the hash will change. Video software is deterministic but complicated, and everyone's configuration is going to be slightly different.
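To make the point about hashes concrete, here is a minimal sketch in Python. It assumes a made-up 8x8 grayscale frame and simulates a "re-encode" by nudging each pixel slightly. The `average_hash` function is a toy perceptual hash written for illustration, not the algorithm used by any tool mentioned in this thread. The cryptographic checksum changes, while the content-based hash stays the same.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Ordinary checksum: any changed byte gives a different digest."""
    return hashlib.sha256(data).hexdigest()


def average_hash(pixels: list[list[int]]) -> int:
    """Toy perceptual hash: one bit per pixel, set when the pixel
    is brighter than the mean brightness of the frame."""
    flat = [p for row in pixels for p in row]
    mean = sum(flat) / len(flat)
    bits = 0
    for p in flat:
        bits = (bits << 1) | (1 if p > mean else 0)
    return bits


# Hypothetical 8x8 frame: left half dark, right half bright.
frame = [[20] * 4 + [200] * 4 for _ in range(8)]

# Simulated re-encode (an assumption for this sketch): small
# deterministic noise in the range -2..+2 added to every pixel.
reencoded = [
    [max(0, min(255, p + (r * 3 + c) % 5 - 2)) for c, p in enumerate(row)]
    for r, row in enumerate(frame)
]

raw_original = bytes(p for row in frame for p in row)
raw_reencoded = bytes(p for row in reencoded for p in row)

checksums_match = sha256_hex(raw_original) == sha256_hex(raw_reencoded)
perceptual_match = average_hash(frame) == average_hash(reencoded)

print("checksums match:", checksums_match)
print("perceptual hashes match:", perceptual_match)
```

A real fingerprint would have to sample many frames and tolerate scaling, cropping, and re-timing, which is where the extra cost comes from.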

Chocobozzz commented 3 years ago

Hello,

Video de-duplication is out of the scope of PeerTube. A plugin or a third-party tool could help. You could use, for example, a perceptual comparison instead of checksums. See this blog post (in French) by @rigelk https://rigelk.eu/blog/video-similarity/

akamhy commented 2 years ago

You could use, for example, a perceptual comparison instead of checksums. See this blog post (in French) by @rigelk https://rigelk.eu/blog/video-similarity/

Sorry to comment on this closed issue, but I wrote something[1] to solve this duplication problem, and I believe it is more efficient than the solution in the link. Only one 64-bit hash is generated per input video, so far fewer comparisons are required than with the linked solution. One hash per video also saves a lot of database space, and the time complexity of the comparison drops to O(n).
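As a rough illustration of comparing one 64-bit hash per video, the sketch below checks a new hash against every stored hash using the Hamming distance, which is one linear pass over the known videos. It does not use the VideoHash package's API; the hash values, the video names, and the `max_distance` threshold of 4 bits are all assumptions chosen for the example.

```python
def hamming_distance(a: int, b: int) -> int:
    """Number of bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def find_near_duplicates(
    new_hash: int,
    known_hashes: dict[str, int],
    max_distance: int = 4,  # assumed threshold, would need tuning
) -> list[str]:
    """Linear scan: one comparison per stored video."""
    return [
        video_id
        for video_id, stored in known_hashes.items()
        if hamming_distance(new_hash, stored) <= max_distance
    ]


# Made-up 64-bit hashes for two already-stored videos.
known = {
    "video-a": 0xF0F0F0F0F0F0F0F0,
    "video-b": 0x0123456789ABCDEF,
}

# A new upload whose hash differs from video-a in a single bit.
candidate = 0xF0F0F0F0F0F0F0F1

print(find_near_duplicates(candidate, known))  # → ['video-a']
```

Using a distance threshold rather than strict equality is what lets slightly different encodes of the same content still match.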

  1. VideoHash - Python package for Perceptual Video Hashing

maxlinux2000 commented 2 years ago

Hello, I think that PeerTube should have something to remove duplicate videos. Maybe not something as sophisticated as checking the hash, but something much simpler. My idea is that a user who has one or more channels should have a button that searches for duplicates among their own videos and shows them in a list. From there, the user could delete the duplicate videos according to their own criteria.

Suppose that, in our eagerness to import videos from YouTube before it disappears, we have imported the same videos several times without realizing it. A button that searches my own videos for duplicates based on the title, with the option of keeping only one copy and automatically deleting the others, would be very useful.

I don't think this would be very heavy in terms of resource consumption, and it would help us keep our videos more organized.

Cheers, MAX

Discostu36 commented 4 months ago

This could be even simpler. A feature that compares video metadata and shows videos where the metadata (e.g. title and length) is the same would be very helpful to tidy up duplicates resulting from YouTube imports.
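A minimal sketch of that metadata comparison, in Python, might look like the following. The `Video` record and its fields are hypothetical and do not reflect PeerTube's actual data model; titles are normalized for case and whitespace before being grouped together with the duration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Video:
    # Hypothetical record, not PeerTube's real schema.
    id: int
    title: str
    duration_seconds: int


def group_metadata_duplicates(videos: list[Video]) -> list[list[Video]]:
    """Group videos that share a normalized title and the same duration."""
    groups: dict[tuple[str, int], list[Video]] = defaultdict(list)
    for video in videos:
        normalized_title = " ".join(video.title.casefold().split())
        groups[(normalized_title, video.duration_seconds)].append(video)
    # Only groups with more than one member are duplicate candidates.
    return [group for group in groups.values() if len(group) > 1]


library = [
    Video(1, "My Holiday Vlog", 615),
    Video(2, "my holiday  vlog", 615),  # same title after normalization
    Video(3, "My Holiday Vlog", 240),   # same title, different length
    Video(4, "Cooking Pasta", 300),
]

for group in group_metadata_duplicates(library):
    print([video.id for video in group])  # → [1, 2]
```

The result is only a list of candidates for the user to review; nothing is deleted automatically.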

kevATin commented 4 months ago

This could be even simpler. A feature that compares video metadata and shows videos where the metadata (e.g. title and length) is the same would be very helpful to tidy up duplicates resulting from YouTube imports.

This would be of limited use, though, since videos that travel the internet and get reuploaded are often re-encoded and renamed over and over. Still, it would be better than having no deduplication at all.