aetilius / pHash

pHash - the open source perceptual hash library
https://www.phash.org
GNU General Public License v3.0
561 stars 81 forks source link

pHash does not work with MKV and WebM files #36

Open Mzyxptlk opened 2 years ago

Mzyxptlk commented 2 years ago

GetNumberVideoFrames() only checks for per-stream frame counts or durations, which MKV and WebM containers do not include in their headers. This causes the hashing of these files to fail almost before it even begins.

Solution: adding another fallback resolves that situation and in my (limited) tests allows pHash to work with MKV and WebM containers as well:

if (nb_frames <= 0) {
    nb_frames = static_cast<float>(pFormatCtx->duration) *
        str->avg_frame_rate.num /
        str->avg_frame_rate.den /
        AV_TIME_BASE;
}
Mzyxptlk commented 2 years ago

I ran into another issue with the above approach: for some WEBM and MKV videos, pHash returns 0 hashes.

Cause: the frame count calculated in the above snippet is only an estimate. However, in ph_getKeyFramesFromVideo(), this estimate is treated as solid fact. When that frame count is an overestimation, then the last few values in the dist array are never written to, because the corresponding frames don't exist. If one of those places happens to hold a very small or even negative number, the associated (non-existent!) frame is selected as one of the frames to hash. ReadFrames() throws no error when it is told to decode a frame past the end of the video, it simply returns 0, to indicate it decoded no frames. Thus, if a non-existent frame is selected for hashing, no frame is read, and if that's the only frame selected for hashing, then ph_getKeyFramesFromVideo() returns 0 frames, and ph_dct_videohash() calculates 0 hashes, all without any errors being reported.

If the frame count is an overestimation, the last portion of the video is simply ignored, which is only slightly less bad.

Solution: allocate space in the dist array for 20% more frames than the header estimates, and when the end of the file has been reached, reduce the actual frame count as appropriate.

--- a/src/pHash.cpp
+++ b/src/pHash.cpp
@@ -354,6 +354,7 @@ CImgList<uint8_t> *ph_getKeyFramesFromVideo(const char *filename) {
     if (N <= 0) {
         return NULL;
     }
+    N *= 1.2f;

     float frames_per_sec = 0.5 * fps(filename);
     if (frames_per_sec < 0) {
@@ -413,6 +414,8 @@ CImgList<uint8_t> *ph_getKeyFramesFromVideo(const char *filename) {
     } while ((nbread >= st_info.nb_retrieval) && (k < nbframes));
     vfinfo_close(&st_info);

+    nbframes = k;
+
     int S = 10;
     int L = 50;
     int alpha1 = 3;

An alternative approach to the first half of the patch would be to change the snippet in the OP to overestimate by 20%, instead of doing it here. This would avoid some overallocation when the frame count is not an estimate, such as for MP4 containers, but would tie GetNumberOfVideoFrames() and ph_getKeyFramesFromVideo() more closely together, which may not be what you want.