ern2150 opened 3 years ago
Both fingerprint scripts I’ve created on my own (the fingerprint hash and the one in examples) produced matches.
The trouble with them is they relied on a 1-1 Coarse Signature match. Coarse Signatures are an amalgam of several methods for identifying a short 3-second clip, so that if it shows up again it'll be signed the same way. This produced plenty of results, but not as many as I'd wanted. Large segments of reliably repeating clips (such as the first few pages of BERL credits, though the last few seconds worked) were never a 1-1 match, even when the same mixtape was played in two broadcasts.
Relying on these being a 1-1 match also introduced the idea of matching chains. If I’m rigidly matching 3-second segments, I need to be able to group together sets of those that also match to identify longer “clips.” This is possible, but by design the signatures can overlap, so any hand-written sequencing would need that extra bit of complexity to establish parallel chains, which could be confusing.
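The chain idea above can be sketched quickly. This isn't the matching code I actually ran, just an illustration under the assumption that each exact coarse-signature hit is recorded as the start time (in seconds) of a 3-second segment, with overlaps allowed:

```python
# Sketch of grouping exact 3-second coarse-signature matches into "chains":
# overlapping or back-to-back matched segments are merged into longer clips.
SEGMENT_LEN = 3.0  # coarse signatures describe ~3-second windows

def group_chains(match_starts, segment_len=SEGMENT_LEN):
    """Merge overlapping/adjacent 3-second matches into (start, end) chains."""
    chains = []
    for start in sorted(match_starts):
        end = start + segment_len
        if chains and start <= chains[-1][1]:  # overlaps or touches the last chain
            chains[-1] = (chains[-1][0], max(chains[-1][1], end))
        else:
            chains.append((start, end))
    return chains

# matches at 0s, 1.5s, and 3s form one chain; 20s stands alone
print(group_chains([0.0, 1.5, 3.0, 20.0]))  # [(0.0, 6.0), (20.0, 23.0)]
```

The "parallel chains" complexity comes in when the same segment participates in chains against more than one other video, which a flat interval merge like this doesn't capture.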
Those results clashed with results that the `signature=detectmode` matching parameters for ffmpeg produced.
In addition to creating the signatures themselves, the libavfilter code is built to recognize duplicates of clips of one source in another, two at a time. This works best when the clip you’re searching for is shorter than 30 minutes, and it helps to have the video you’re searching in, the longer one, be the first argument. The results will tell you at what timecodes the videos match and for what durations. If clip B is 10 seconds long and video A is an hour long, it’s possible it will find the last three seconds of clip B in the first few minutes of video A.
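For reference, the two-at-a-time invocation described above looks roughly like this. The file names are placeholders; `detectmode` and `nb_inputs` are real options of ffmpeg's `signature` filter, and the match report goes to the log rather than to an output file:

```python
# Build the ffmpeg argv for the duplicate-detection run described above.
# The longer video being searched goes first; match timecodes are logged.
def signature_compare_cmd(long_video, clip, detectmode="full"):
    """Compare a known clip against a longer video with the signature filter."""
    return [
        "ffmpeg",
        "-i", long_video,   # first input: the longer video being searched in
        "-i", clip,         # second input: the known clip
        "-filter_complex", f"signature=nb_inputs=2:detectmode={detectmode}",
        "-f", "null", "-",  # discard the media output; we only want the log
    ]

print(" ".join(signature_compare_cmd("broadcast.mp4", "clip.mp4")))
```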
Once I had generated a reliable clip I knew would repeat across broadcasts and mixtapes, I used it to search through the longer videos and found some reliable results. This was using default parameters.
The major downside to this method is that you have to have a known clip. This would be a ton of manual work to determine all repeating segments across all video files and generate clips of the appropriate length. It probably depends on the machine running the matches, but there were some odd errors at the upper bounds of search clip length, so finding a happy medium length would take a bit of trial and error.
So why did I do what I did, and why doesn’t ffmpeg do it?
I didn’t want to always have massive files laying around when I wanted to analyze matching segments, and ffmpeg appeared to crap out at comparing too much material at once.
I chose to generate the signatures as xml and then find a way to flatten them to something useful, small, and still vaguely human readable. This is what caused me to stick to exact Coarse Signature matches.
FFmpeg will generate signatures in xml or binary but will only compare the videos themselves. It would make a ton of sense if it could also compare pre-made signatures. The xml signatures are still about 10% the size of the original video, but the binary versions are closer to 1%.
My compression took it down to .01%. This allowed me to do mass comparisons faster, but less comprehensive ones than the tool intended: the tool is built to collect all the matching signatures until they stop matching and then report back the start time and duration.
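The flattening step can be sketched like this. This is not my exact script, and the element name `CoarseSignature` is a placeholder (the real MPEG-7 schema ffmpeg emits uses its own element names); the idea is just to reduce each coarse-signature element to a short hash so exact matches can be compared across files cheaply:

```python
# Sketch: flatten an XML signature file into one short hash per
# coarse-signature element, so exact 1-1 matches become string equality.
import hashlib
import xml.etree.ElementTree as ET

def flatten_coarse(xml_text, tag="CoarseSignature"):
    """Return a short hex digest per coarse-signature element, in order."""
    root = ET.fromstring(xml_text)
    digests = []
    for elem in root.iter():
        if elem.tag.endswith(tag):  # endswith() tolerates XML namespaces
            blob = "".join(elem.itertext()).encode()
            digests.append(hashlib.sha1(blob).hexdigest()[:12])
    return digests

sample = ("<sig><CoarseSignature>1 2 3</CoarseSignature>"
          "<CoarseSignature>1 2 3</CoarseSignature></sig>")
hashes = flatten_coarse(sample)
print(hashes[0] == hashes[1])  # identical segments flatten to identical hashes
```

The trade-off is exactly the one described above: hashing forces exact-match-only comparison, discarding the tool's built-in tolerance for near matches.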
I found a project that takes the collected binary signatures, and sorts them into sensible lists to then send off to ffmpeg’s comparison engine.
They do this by literally copying the ffmpeg code that does the comparisons into their own code base.
In order to feed that code with inputs, they (literally?) reverse engineered the ffmpeg code that writes the binary signatures to load those same signatures (from files you’ve generated already) into memory.
It’s not an easy fit for my setup as it has some hard dependencies beyond copy-pasting ffmpeg code, is in C, and is built for Ubuntu.
So I have some choices.
I can try to adapt the Mpeg7Dupes project for the platforms I use, and see what results it generates for masses of binary signatures that I take time to generate.
I can experiment with the `signature` filter's `detectmode` to find a happy medium clip length, and maybe comparison width (how many large files it can compare at once). I can then use the coarse signature popularity statistics I've gathered so far to generate clips of that happy medium length, and run my own schedule of comparing those clips to multiple long videos over time.
Another wrinkle to the story is ffmpeg’s matching algorithm has several parameters to adjust how it finds matches. Thankfully the defaults have generated satisfying results, but it’s good to know those can be adjusted. If, for example, matching clips to long broadcasts gets a hit for the clip with the expected frequency (once per mixtape for any BERL), but leaves out parts of the clip that should always be there (like what happened with my strict matching), I can try to adjust to capture those as well.
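Those tuning knobs can be folded into the filtergraph string. The parameter names below (`th_d`, `th_dc`, `th_xh`, `th_di`, `th_it`) come from ffmpeg's `signature` filter documentation; the value in the example is illustrative, not a tuned recommendation:

```python
# Build a signature filtergraph string with optional threshold overrides,
# e.g. to loosen matching when strict defaults drop parts of a known clip.
def signature_filter(detectmode="full", nb_inputs=2, **thresholds):
    """Assemble 'signature=...' with any th_* overrides appended in order."""
    opts = [f"detectmode={detectmode}", f"nb_inputs={nb_inputs}"]
    opts += [f"{name}={value}" for name, value in sorted(thresholds.items())]
    return "signature=" + ":".join(opts)

# raise the per-word distance threshold to accept looser matches
print(signature_filter(th_d=12000))
# signature=detectmode=full:nb_inputs=2:th_d=12000
```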
After interacting with the Mpeg7Dupes author (and getting the compiled code to run on one of my linux setups) I've discovered the large files I was using to generate signatures have some anomalies that were causing problems.
The same problems don't appear to affect the smaller "copies" of those files (less bitrate and filesize, but same viewing length). I've since switched to running their comparison against these copies without failing right away.
Mpeg7Dupes has a debug compilation that allows for much more logging than anyone would ever want, so of course I'm running at that density first ;). So far, comparing two 2.5 hr files is taking longer than 12 hours and generating more than 700MB of logs. I enjoy having too much information, but will switch to the normal mode if it ends up taking much longer.
I've also asked the author some questions about how the results of the comparison would be sorted, and about how the parameters/filters work.
So far it looks like it will still be necessary to define my own "series" collection workflow. Since this works with ffmpeg under the covers, it will identify matching clips up to a certain number of frames (parameterized, defaults to 300), but stop at that. So you could end up with a series of matches that either overlap or go back-to-back, but are still described as independent matches. I called these chains earlier, so I can talk about those in detail in a separate comment, and in general in the Automated Content Matching Endeavors issue.
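A rough sketch of that "series" collection step, assuming each reported match arrives as a (start_frame, length_in_frames) pair and the 300-frame cap splits long duplicates into several reports:

```python
# Stitch ffmpeg's capped match reports (default cap: 300 frames) back into
# chains: overlapping or back-to-back reports are merged into one span.
def stitch_matches(matches, gap=0):
    """Merge (start, length) reports whose spans touch within `gap` frames."""
    merged = []
    for start, length in sorted(matches):
        end = start + length
        if merged and start <= merged[-1][1] + gap:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# three capped 300-frame reports back-to-back, plus one separate hit
print(stitch_matches([(0, 300), (300, 300), (600, 300), (5000, 120)]))
# [(0, 900), (5000, 5120)]
```

A small nonzero `gap` would also absorb reports separated by a frame or two of noise, at the cost of occasionally fusing genuinely distinct matches.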
#13 talks about video clip matching independent of any particular tech used to figure it out.
Here I can talk specifically about ffmpeg, its libavfilter system for all kinds of video manipulation and introspection, and specifically the `signature` filter.