ashleyrtate / DxScanScenes

DirectShow.NET scene-detection algorithm written in C#.
GNU Lesser General Public License v3.0

Not working #1

Open VamuveTV opened 7 years ago

VamuveTV commented 7 years ago

Hi Ashley. Great work, but... it is not working when avgDiffChange and (-prevAvgDiffChange * .5) are both negative.

On this: if (prevAvgDiffChange > prevDiffThreshold && avgDiffChange < (-prevAvgDiffChange * .5))

For example, I have a file in which the previous and next frames give these values: avgDiffChange = -8, (-prevAvgDiffChange * .5) = -45. With those numbers the second comparison becomes -8 < -45, which is false, so the cut is never reported.

I can send you the file, but you can reproduce these negative values like this: create an mpg file containing basically a repeating sequence of 2 colored frames only:

Frame 1: colors = RGB (97, 39, 219) (blue), Frame 2: colors = RGB (255, 0, 254) (pink)

Frame 3: colors = RGB (97, 39, 219) (blue), Frame 4: colors = RGB (255, 0, 254) (pink)

Frame 5: colors = RGB (97, 39, 219) (blue), Frame 6: colors = RGB (255, 0, 254) (pink)

... All of them should be detected as scene changes, since the differences vary a lot, but both differences come out negative.

The mpg containing that chain of frames gave me this result: avgDiffChange = -40.6, (-prevAvgDiffChange * .5) = -59.

If you need, I can send you the files to test.
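To spell out those numbers in a standalone C# snippet (the prevAvgDiffChange of 90 is inferred from the reported -45, and the threshold of 21 is just the baseline constant, so both are assumptions, not values taken from the file):

    using System;

    double avgDiffChange = -8;              // reported value
    double prevAvgDiffChange = 90;          // assumed: -90 * .5 = -45, as reported
    double prevDiffThreshold = 21;          // assumed: BaselineRgbDiffThreshold, for illustration only

    bool sceneChange = prevAvgDiffChange > prevDiffThreshold
                       && avgDiffChange < (-prevAvgDiffChange * .5);   // -8 < -45 is false

    Console.WriteLine(sceneChange);         // prints False, so the cut is missed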

ashleyrtate commented 7 years ago

Interesting. I haven't touched this code in 4 or 5 years other than dumping it to GitHub, but I will take a look when I have a moment. I should have some test code in the original project it was used for where I can try to replicate this.

VamuveTV commented 7 years ago

Many thanks. If needed, I can send you the files. Basically it seems to occur only when those negative values are found. A pseudocode fix could look like this:

TmpprevAvgDiffChange = (-prevAvgDiffChange * .5)

If avgDiffChange < 0 && TmpprevAvgDiffChange < 0
    avgDiffChange = abs(avgDiffChange)
    TmpprevAvgDiffChange = abs(TmpprevAvgDiffChange)
End_If

and then use something similar to your code:

     if (prevAvgDiffChange > prevDiffThreshold && avgDiffChange < TmpprevAvgDiffChange)
     {
        return true;
     }

     return false;

I converted the code to 32-bit assembly in order to test the files, because I don't know how to program in .NET, but on the files I'm testing I get the same results while debugging in Visual Studio and in the assembler I'm using for the tests.
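For reference, here is a minimal C# sketch of the fix proposed above. The method name IsSceneChange and its parameter list are illustrative only, not the repository's actual signature:

    using System;

    // Sketch of the proposed fix: when both values are negative, compare their absolute values.
    static bool IsSceneChange(double prevAvgDiffChange, double avgDiffChange, double prevDiffThreshold)
    {
        double tmpPrevAvgDiffChange = -prevAvgDiffChange * .5;

        if (avgDiffChange < 0 && tmpPrevAvgDiffChange < 0)
        {
            avgDiffChange = Math.Abs(avgDiffChange);
            tmpPrevAvgDiffChange = Math.Abs(tmpPrevAvgDiffChange);
        }

        return prevAvgDiffChange > prevDiffThreshold && avgDiffChange < tmpPrevAvgDiffChange;
    }

With the values reported above (prevAvgDiffChange = 90, avgDiffChange = -8, threshold 21), the comparison becomes 8 < 45 and the cut is detected.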

VamuveTV commented 7 years ago

Hi Ashley. One question about the values you used in the configuration: how did you compute them? I mean, why use these?

    // RGB difference threshold which indicates a scene change when at baseline RGB level
    private const double BaselineRgbDiffThreshold = 21;
    private const double BaselineRgbLevel = 90;
    // min and max thresholds based on the ranges found in real data
    private const double MinRgbDiffThreshold = 5;
    private const double MaxRgbDiffThreshold = 45;
    // this much above or below threshold means we're in the uncertain range
    private const double BaselineUncertainty = 5;
    // amount to change diff threshold in relation to changes in RGB level (21/90, 21.395/91, etc.)
    private const double ThresholdToLevelRatio = .395;

I would like to understand how you found those values, so I can try to make an automatic way to find the threshold and the constants above.

Btw... the code works almost perfectly. A few things need to be modified to catch the scenes that are not being identified. I'm amazed to see it working without the need to convert to CIELAB or another color space. It seems to use only a statistical analysis of the raw RGB values themselves.

Maybe it also works when collecting the data from only one specific channel. I mean, the 2000 samples could be collected from the red, green, or blue channel separately, so we might need only one channel for the analysis, such as the red channel (or the green channel, since the human eye is more sensitive to green). Using a single channel may work, I suppose, and would speed it up a lot.
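A rough C# sketch of that single-channel idea, assuming a 24-bit BGR frame buffer and reusing the 2000-sample figure mentioned above (the function name and the buffer layout are assumptions, not part of the original project):

    using System;

    // Average the green channel of a 24-bit BGR frame, sampling roughly sampleCount pixels.
    static double AverageGreen(byte[] bgrFrame, int sampleCount = 2000)
    {
        int pixelCount = bgrFrame.Length / 3;
        int step = Math.Max(1, pixelCount / sampleCount);

        long sum = 0;
        int samples = 0;
        for (int p = 0; p < pixelCount; p += step)
        {
            sum += bgrFrame[p * 3 + 1];   // G component of a B,G,R pixel
            samples++;
        }
        return (double)sum / samples;
    }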

ashleyrtate commented 7 years ago

Scan through the comments on the blog post linked in the readme. It explains the history of those values.

It has worked pretty well for a lot of folks, but it's not always perfect. One guy in Denmark was using it to subtitle American movies for TV stations or a cable network. He sent me a one-minute clip from the movie 16 Blocks that had something like 45 cuts... it missed a few of those!

VamuveTV commented 7 years ago

Are the comments from here? http://coditate.blogspot.com.br/2008/05/video-scene-detection-with.html If so, I presume you found those values by trial and error on other videos until you settled on them. But, reading the values, I guess there is some math behind them that can be computed. From what I understood, the code is basically retrieving statistical data from the RGB values, so the constants seem to be related to the minimum standard deviation for the 0 to 255 range.

The minimum standard deviation for a range of data from 0 to 255 is something around 53.6. So, with [MinRgbDiffThreshold: 5.0] and [MaxRgbDiffThreshold: 45.0],

it seems that MinSTd = MinRgbDiffThreshold + MaxRgbDiffThreshold.

Also... [BaselineRgbDiffThreshold: 21.0] [BaselineRgbLevel: 90.0]

These seem to be computed as:

    BaselineRgbLevel = MaxRgbDiffThreshold * 2 = 90
    BaselineRgbDiffThreshold = ((MaxRgbDiffThreshold - MinRgbDiffThreshold) / 2) + 1 = 21

And... [ThresholdToLevelRatio: 0.395] is approximately 101/255, which seems to be close to MinSTd * 2 / 255.

Apparently, the threshold is related to the Minimum Value of Standard Deviation.

So, how can we define MaxRgbDiffThreshold and MinRgbDiffThreshold from MinSTd?

It seems that:

    MinRgbDiffThreshold = MinSTd / 10 =~ 5.3
    MaxRgbDiffThreshold = MinSTd - MinRgbDiffThreshold =~ 47.7

Which turns into:

    BaselineRgbLevel = MaxRgbDiffThreshold * 2 = 95.4
    BaselineRgbDiffThreshold = ((MaxRgbDiffThreshold - MinRgbDiffThreshold) / 2) + 1 = 22.2

OK... remember I thought ThresholdToLevelRatio could be computed as MinSTd * 2 / 255? Well, it is quite close, but it's still not right.

Since 0.395 = 100.725 / 255, we can find that value by summing: BaselineRgbLevel + MinRgbDiffThreshold = 95.4 + 5.3 = 100.7, which is closer than the other formula.

So, maybe ThresholdToLevelRatio can be computed as:

    ThresholdToLevelRatio = (BaselineRgbLevel + MinRgbDiffThreshold) / 255

Note: Of course, this assumes fixed values useful for all videos, computing the STD from a table of 0 to 255... But, for accuracy, perhaps it would be better to compute the STD of the whole video, summing the RGB values from each frame so we can calculate the standard deviation and the minimum and maximum STD of the whole video. From that we might get an accurate way to find the threshold values for all videos automatically.
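Collected in one place, the guessed relationships above look like this in C#. These formulas are the conjecture from this comment, not the original author's confirmed derivation, and the 53.6 starting value is the assumed "minimum standard deviation" of the 0..255 gradient:

    // Conjectured relationships between the constants, starting from MinSTd.
    const double MinStd = 53.6;   // assumed minimum standard deviation of the 0..255 gradient

    double minRgbDiffThreshold = MinStd / 10;                                               // quoted above as ~5.3
    double maxRgbDiffThreshold = MinStd - minRgbDiffThreshold;                              // quoted above as ~47.7
    double baselineRgbLevel = maxRgbDiffThreshold * 2;                                      // quoted above as ~95.4
    double baselineRgbDiffThreshold = (maxRgbDiffThreshold - minRgbDiffThreshold) / 2 + 1;  // quoted above as ~22.2
    double thresholdToLevelRatio = (baselineRgbLevel + minRgbDiffThreshold) / 255;          // quoted above as ~0.395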

VamuveTV commented 7 years ago

Hi Ashley. I ran more tests, and it seems not to work as expected on consecutive scene changes. I used your code, but it only works on hard cuts, and the frames being found are always 2 or 3 ahead (for both soft cuts and hard cuts). On consecutive scene changes, the algorithm fails. I uploaded a test file here: https://www.4shared.com/video/46Y8x0teca/Johnny2.html

Definitely, the algorithm works based on standard deviations. For hard cuts, the best way seems to be simply computing the minimum standard deviation (population, not sample STD) over the xor values, the way you did: you sum up the values found by the xor operation and compute their standard deviation. When the resulting minimum standard deviation is positive (bigger than zero), it means we found a hard cut (abrupt scene change). This can reach around 100% of hard cuts. The small problem is that flashes in the scene can be mistaken for hard cuts as well (rare, but it happens)... so this is where your algorithm can be used as a fix. Your algorithm seems to detect soft cuts and hard cuts in a similar way by computing prevDiffThreshold.

The prevDiffThreshold variable is definitely calculated according to the mean and the minimum standard deviation. I made a formula for this based on your algorithm. It seems to be something like:

    prevDiffThreshold = M0 - 1 + MinSTd(1/2 - x) - (2/255)(MinSTd^2)(1 - x)(2 - x)

where M0 = previous mean, MinSTd = minimum standard deviation of the whole video, and x = the 0.1 factor you used for the MinRgbDiffThreshold value (which seems to be as I explained earlier).

But despite that, the algorithm still fails on consecutive frames. I'm still trying to figure out why. Can you please test it on the video I uploaded?

Regards, guga
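A rough C# sketch of the hard-cut idea described above: the population standard deviation of the per-byte xor of two consecutive frames. This is my reading of the comment; the buffer layout and the decision rule are assumptions, not the project's code:

    using System;

    // Population standard deviation of the per-byte xor of two consecutive frames.
    // The decision rule quoted above is: a positive result indicates a hard cut.
    static double PopulationStdOfXor(byte[] prevFrame, byte[] currFrame)
    {
        int n = Math.Min(prevFrame.Length, currFrame.Length);

        double sum = 0;
        for (int i = 0; i < n; i++)
            sum += prevFrame[i] ^ currFrame[i];
        double mean = sum / n;

        double sumSquares = 0;
        for (int i = 0; i < n; i++)
        {
            double d = (prevFrame[i] ^ currFrame[i]) - mean;
            sumSquares += d * d;
        }
        return Math.Sqrt(sumSquares / n);   // population std: divide by n, not n - 1
    }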

ashleyrtate commented 7 years ago

As I explained in the blog post referenced, this was developed as a self-tuning scene-detection algorithm for sports videos. I'm not sure that you are using it for its intended purpose. It definitely is not intended for consecutive cuts on successive frames.

There is also a configurable min delay on cuts to prevent 2 or 3 cuts in a row on a quick camera pan that looks like a cut. You should be sure that's set to zero to get the closest possible cuts.

I may look at this if I have time but it's just demo code that I put out years ago. It is not a priority for me to support.

VamuveTV commented 7 years ago

Hi Ashley. Thanks for the reply. I'm not using it for sports video but for general video; I'm making a plugin for VirtualDub. I set the delay to zero in the code as you said, but I still have no clue why it is not working on consecutive frames or on some transition frames. About the delay... do you mean the value of the ThresholdToLevelRatio variable or BaselineRgbDiffThreshold? I'm asking because the formula "prevDiffThreshold = (pPrevAvgRgbLevel-90.0)*0.395 + 21.0" mainly performs checks on the minimum standard deviation values found between frames. I reformulated it to work exactly the same way as yours, but using only one variable, which I call "x", whose value is just 0.1 (equivalent to the 5.0 you used for MinRgbDiffThreshold).

And, if I may add... it's not just demo code. What you did in your algorithm is really incredible. The level of accuracy exceeds other algorithms, and yet it is fairly simple to implement and simple to understand why it works, because it performs a simple statistical analysis of 2 frames. I'm using your code for general video scene detection and it really works, but it has those minor flaws that I'm struggling to understand. If I could make it detect consecutive frames, then the problem should be fixed for the general case.

Basically the function is:

[MinSTDGRad: R$ 53.5997293645009733] ; this value is computed from CalcSTDForGradient.
; It is StandardDeviationGrad.PopulationStd.Min. Not using the function any longer,
; since I only need the resulting value.

Proc GetDefaultThreshold:
    Arguments @pPrevMeanLuma, @XRatio, @PprevDiffThreshold
    Uses edi, eax

    ; get the threshold

    ; MinRgbDiffThreshold = MinSTd /10 =~ 5.3 (Original value)
    ; MaxRgbDiffThreshold = MinSTd - MinRgbDiffThreshold =~ 47.7 (Original value)

    mov edi D@XRatio
    fld R$MinSTDGRad | fmul R$edi | fstp R$MinRgbDiffThreshold
    fld R$MinSTDGRad | fsub R$MinRgbDiffThreshold | fstp R$MaxRgbDiffThreshold

    ; Which turn onto:
    ; BaselineRgbLevel = MaxRgbDiffThreshold*2 = 95.4
    ; BaselineRgbDiffThreshold = ((MaxRgbDiffThreshold-MinRgbDiffThreshold)/2) + 1 = 22.2

    fld R$MaxRgbDiffThreshold | fmul R$Float_Two | fstp R$BaselineRgbLevel

    fld1 | fld R$MaxRgbDiffThreshold | fsub R$MinRgbDiffThreshold | fmul R$Float_half | faddp ST1 ST0
    fstp R$BaselineRgbDiffThreshold

    ; ThresholdToLevelRatio = (BaselineRgbLevel + MinRgbDiffThreshold)/255
    fld R$BaselineRgbLevel | fadd R$MinRgbDiffThreshold | fmul R$FloatOne255 | fstp R$ThresholdToLevelRatio

    ; prevDiffThreshold = (prevMeanLuma - BaselineRgbLevel) * ThresholdToLevelRatio + BaselineRgbDiffThreshold
    mov eax D@pPrevMeanLuma
    mov edi D@PprevDiffThreshold
    fld R$eax | fsub R$BaselineRgbLevel | fmul R$ThresholdToLevelRatio
    fadd R$BaselineRgbDiffThreshold | fstp R$edi

    Fpu_If R$edi > R$MaxRgbDiffThreshold
        fld R$MaxRgbDiffThreshold | fstp R$edi
    Fpu_Else_If R$edi < R$MinRgbDiffThreshold
        fld R$MinRgbDiffThreshold | fstp R$edi
    Fpu_End_If

EndP

called from:

    ; find soft cuts
Proc WasPrevNewScene:
    Arguments @mfd, @pPrevMinLuma, @pDiffMeanXorLuma, @pPrevDiffMeanXorLuma
    Uses edi

    call GetDefaultThreshold D@pPrevMinLuma, Float_OneTen, prevDiffThreshold

    mov edi D@pPrevDiffMeanXorLuma
    ; TmpprevAvgDiffChange = -0.5 * previous average diff change (as in the C# check)
    fld R$edi | fmul R$Float_Minus_Half | fstp R$TmpprevAvgDiffChange
    ..Fpu_If R$edi => R$prevDiffThreshold
        mov edi D@pDiffMeanXorLuma
        ; both values negative and the current change still below -0.5 * the previous change
        .Fpu_If_And R$edi < R$FloatZero, R$TmpprevAvgDiffChange < R$FloatZero, R$edi < R$TmpprevAvgDiffChange
            Fpu_If R$edi => R$Float_Zero
                mov eax eax
            Fpu_Else_IF R$TmpprevAvgDiffChange => R$Float_Zero
                mov eax eax
            Fpu_End_If
            mov edi D@mfd
            mov D$edi+SceneTypeDis SCN_SOFT_CUT
            mov eax &TRUE
        .Fpu_Else
            xor eax eax
        .Fpu_End_If
    ..Fpu_Else
        xor eax eax
    ..Fpu_End_If

EndP

I'm testing both versions: your original with fixed values for MaxRgbDiffThreshold, BaselineRgbLevel, etc., and mine, which calculates those values from the standard deviation of the whole video or, as in the code above, from a data chain of 0 to 255 that represents a gradient shade of gray (or luma), which is what your algorithm seems to work with. I mean, soft cuts seem to be found from the difference of the minimum standard deviation values between frames. Here, 53.5997293645009733 is the minimum standard deviation of the gray chain (0 to 255, i.e. the data sequence 0, 1, 2, 3... 255), and from there I can calculate the threshold. This way is faster than having to calculate the minimum standard deviation of the whole movie, and I still get the same results as your fixed version.

No matter whether I use the fixed variables (yours) or the calculated ones (mine), the main problem is on consecutive frames or on some frames that are being missed by the algorithm. For hard cuts the detection seems OK, but I found it easier and faster to simply use the minimum standard deviation of the xor of one frame and the one that preceded it (as you did). The main problem lies basically in the soft cuts.
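For readers who don't follow the assembly, the GetDefaultThreshold routine above corresponds roughly to this C#. This is my translation under the assumptions stated in the comment: the names follow the original C# constants discussed in this thread, xRatio is the 0.1 factor, and the minStdGrad default is the gradient value quoted above:

    // Approximate C# reading of the GetDefaultThreshold routine above (not the original code).
    static double GetDefaultThreshold(double prevMeanLuma, double xRatio = 0.1,
                                      double minStdGrad = 53.5997293645009733)
    {
        // Derive the thresholds from the gradient's minimum standard deviation.
        double minRgbDiffThreshold = minStdGrad * xRatio;
        double maxRgbDiffThreshold = minStdGrad - minRgbDiffThreshold;
        double baselineRgbLevel = maxRgbDiffThreshold * 2;
        double baselineRgbDiffThreshold = (maxRgbDiffThreshold - minRgbDiffThreshold) / 2 + 1;
        double thresholdToLevelRatio = (baselineRgbLevel + minRgbDiffThreshold) / 255;

        // prevDiffThreshold = (prevMeanLuma - BaselineRgbLevel) * ThresholdToLevelRatio + BaselineRgbDiffThreshold
        double threshold = (prevMeanLuma - baselineRgbLevel) * thresholdToLevelRatio
                           + baselineRgbDiffThreshold;

        // Clamp to the min/max thresholds, as in the Fpu_If block above.
        if (threshold > maxRgbDiffThreshold) threshold = maxRgbDiffThreshold;
        else if (threshold < minRgbDiffThreshold) threshold = minRgbDiffThreshold;
        return threshold;
    }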