PotcFdk / youtube-sync

Script for maintaining an up-to-date offline mirror of a YouTube channel.
Apache License 2.0

Implement deduplication #15

Open PotcFdk opened 5 years ago

PotcFdk commented 5 years ago

While every video belongs to exactly one channel, we also support playlists, so there can be many cases where the same video belongs to several different profiles. However, each profile should be able to define its own format (see #11), so checking for identical video IDs is not enough: two profiles might impose different requirements on the data to be stored.

One way to deal with this issue would be to deduplicate only if the format is identical - which might be good enough, seeing as "maximum video and audio quality" and perhaps separate audio-only profiles are the most likely use cases for this project.
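As a rough illustration of the "only deduplicate if the format is identical" idea, here is a minimal sketch of an index keyed by the pair (video ID, format) rather than by video ID alone. It is written in Python purely for illustration; the class name `DedupIndex`, its methods, and the format strings are hypothetical and not part of youtube-sync.

```python
from typing import Dict, Optional, Tuple

# Hypothetical key: a video is only a duplicate if BOTH its id and the
# requested format string match.
Key = Tuple[str, str]


class DedupIndex:
    """Maps (video_id, format) to the path of the first stored copy."""

    def __init__(self) -> None:
        self._paths: Dict[Key, str] = {}

    def lookup(self, video_id: str, fmt: str) -> Optional[str]:
        """Return the path of an existing copy, or None if there is none."""
        return self._paths.get((video_id, fmt))

    def register(self, video_id: str, fmt: str, path: str) -> str:
        """Record a stored copy; return the canonical path for this key.

        If the key is already known, the earlier path wins, and the caller
        can deduplicate `path` against the returned one.
        """
        return self._paths.setdefault((video_id, fmt), path)
```

With this keying, the same video stored once as "best video and audio" and once as audio-only yields two independent entries, so neither profile's format requirement is violated.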

Also, there are multiple ways of deduplicating.

PotcFdk commented 5 years ago

Downside: Simply backing up a profile directory might not back up 100% of the data, because duplicate videos might be stored outside of it.

This doesn't hold true for the hardlink case. Unless there's a better idea, my preference is to support

  1. CoW-copies
  2. hardlinks
  3. symlinks

in descending order of preference, depending on which of those are supported by the filesystem and which ones we have the permissions for.
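The fallback order above could be sketched roughly as follows. This is only an illustration in Python, not how youtube-sync does or will do it: the function names `reflink` and `link_duplicate` are hypothetical, and the CoW step assumes Linux with a filesystem that supports the `FICLONE` ioctl (for example Btrfs or XFS with reflinks enabled). Any failure, whether from missing filesystem support or missing permissions, simply moves on to the next method.

```python
import os

try:
    import fcntl
except ImportError:  # fcntl does not exist on e.g. Windows
    fcntl = None

# Linux FICLONE ioctl request number (also exposed as fcntl.FICLONE
# starting with Python 3.12).
FICLONE = 0x40049409


def reflink(src, dst):
    """Try to create dst as a CoW clone of src (assumes Linux + FICLONE)."""
    if fcntl is None:
        raise OSError("reflinks are not available on this platform")
    # "xb" refuses to overwrite an existing destination.
    with open(src, "rb") as fsrc, open(dst, "xb") as fdst:
        try:
            fcntl.ioctl(fdst.fileno(), FICLONE, fsrc.fileno())
        except OSError:
            # Clone not supported here: remove the empty file we created.
            os.unlink(dst)
            raise


def link_duplicate(src, dst):
    """Create dst as a duplicate of src; return the name of the method used."""
    attempts = (
        ("reflink", reflink),
        ("hardlink", os.link),
        # Use an absolute target so the symlink does not depend on the
        # directory that dst lives in.
        ("symlink", lambda s, d: os.symlink(os.path.abspath(s), d)),
    )
    for name, attempt in attempts:
        try:
            attempt(src, dst)
            return name
        except (OSError, NotImplementedError):
            continue  # not supported or not permitted: try the next method
    raise OSError(f"could not deduplicate {dst!r} against {src!r}")
```

The returned method name could be logged so it is visible which profiles ended up with symlinks, since that is the case where backing up a single profile directory would miss data.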