akamhy / videohash

Near Duplicate Video Detection (Perceptual Video Hashing) - Get a 64-bit comparable hash-value for any video.
https://pypi.org/project/videohash
MIT License
264 stars 41 forks

BUG REPORT - MAKE the -f worst optional #76

Closed akamhy closed 2 years ago

akamhy commented 2 years ago

Describe the bug: The download fails on Reddit.

To Reproduce: Use any version less than or equal to v2.1.7.

Python 3.9.0 (default, Oct 21 2021, 15:27:22) 
[GCC 10.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> url1 = "https://www.reddit.com/r/IndianDankMemes/comments/rn2yxa/ha_bhai_normi_hu_mai/"
>>> from videohash import VideoHash
>>> url1 = "https://www.reddit.com/r/IndianDankMemes/comments/rn2yxa/ha_bhai_normi_hu_mai/"
>>> url2 = "https://www.reddit.com/r/IndianDankMemes/comments/rmw1o9/i_am_happy_i_am_happy_i_am_happi_today/"
>>> videohash1 = VideoHash(url=url1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/videohash.py", line 85, in __init__
    self._copy_video_to_video_dir()
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/videohash.py", line 288, in _copy_video_to_video_dir
    Download(
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/downloader.py", line 51, in __init__
    self.download_video()
  File "/home/akamhy/projects/benchmark_videohash/venv/lib/python3.9/site-packages/videohash/downloader.py", line 85, in download_video
    raise DownloadFailed(
videohash.exceptions.DownloadFailed: '/home/akamhy/projects/benchmark_videohash/venv/bin/yt-dlp' failed to download the video at 'https://www.reddit.com/r/IndianDankMemes/comments/rn2yxa/ha_bhai_normi_hu_mai/'.
[Reddit] rn2yxa: Downloading JSON metadata
[Reddit] rn2yxa: Downloading m3u8 information
[Reddit] rn2yxa: Downloading MPD manifest

ERROR: [Reddit] k4nqp99cdc781: Requested format is not available

>>> videohash1 = VideoHash(url=url1, download_worst=False)
>>> videohash2 = VideoHash(url=url2, download_worst=False)
>>> videohash1 - videohash2
4
>>> 

Expected behavior: Download the video without any extra arguments.


Additional context: I don't use Reddit, but a friend of mine was using videohash to search posts by template. Both URLs use the same template.

akamhy commented 2 years ago

"Same template" means the video is the same but the text is different.
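That is why the difference of 4 in the session above is small for a 64-bit hash. The sketch below assumes that subtracting two hashes yields a Hamming distance (the number of differing bits); `hamming_distance` is a hypothetical helper written for illustration, not the library's code.

```python
def hamming_distance(hash_a: int, hash_b: int) -> int:
    """Count the bit positions where two non-negative integer hashes differ."""
    # XOR leaves a 1 in every position where the two values disagree.
    return bin(hash_a ^ hash_b).count("1")


# Two 64-bit values that differ only in their lowest four bits.
first = 0xFFFF0000FFFF000F
second = 0xFFFF0000FFFF0000
print(hamming_distance(first, second))  # 4
```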

akamhy commented 2 years ago

Also, maybe diversify the tests and replace some YouTube videos with other sources such as Vimeo, TikTok, Instagram, etc.

akamhy commented 2 years ago

The source template: https://www.reddit.com/r/IndianDankMemes/comments/rmrh63/russian_girl_buttercup_dance_template/

akamhy commented 2 years ago

Well, the original video is at https://www.youtube.com/shorts/uTkzwd19mRo

akamhy commented 2 years ago

Change https://github.com/akamhy/videohash/blob/main/videohash/videohash.py#L36 from True to False and update the docstrings.
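A rough sketch of what "making `-f worst` optional" could look like when the downloader command is assembled. This is an assumption about the shape of the fix, not the actual code in `downloader.py`: `build_download_command` and its parameters are hypothetical, and only the yt-dlp options `-f` and `-o` are taken as given.

```python
from typing import List


def build_download_command(
    url: str,
    output_template: str,
    download_worst: bool = False,  # assumed new default: do not force "worst"
    executable: str = "yt-dlp",
) -> List[str]:
    """Build the argument list for the downloader process."""
    command = [executable]
    if download_worst:
        # Only request the lowest-quality format when explicitly asked to.
        command += ["-f", "worst"]
    command += ["-o", output_template, url]
    return command


print(build_download_command("https://example.com/v", "video.%(ext)s"))
print(build_download_command("https://example.com/v", "video.%(ext)s", download_worst=True))
```

With the flag off, yt-dlp falls back to its own default format selection, which is what let the Reddit download succeed in the session above.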