accord-net / framework

Machine learning, computer vision, statistics and general scientific computing for .NET
http://accord-framework.net
GNU Lesser General Public License v2.1
4.48k stars 1.99k forks source link

DFT Matching #1094

Open fdncred opened 6 years ago

fdncred commented 6 years ago

What would you like to submit? (put an 'x' inside the bracket that applies)

Hey @cesarsouza, I'm still working on my template matching / feature matching / PCA matching, etc and I found this code on stackoverflow. I've tried it in Octave and it seems to work well. So, I want to port it to Accord but I don't speak MatLab/Octave so I'm having a hard time understanding. Specifically, where the c variable is assigned and then the first two lines after %%find peak correlation. If you have time, I'd appreciate your advice for porting this to Accord. All the other lines seem to be pretty straight forward. Thanks, Darren

clear all; close all;

template = rgb2gray(imread('http://i.stack.imgur.com/6bTzT.jpg'));
background = rgb2gray(imread('http://i.stack.imgur.com/FXEy7.jpg'));

%% calculate padding
bx = size(background, 2); 
by = size(background, 1);
tx = size(template, 2); % used for bbox placement
ty = size(template, 1);

%// Change - Compute the cross power spectrum
Ga = fft2(background);
Gb = fft2(template, by, bx);
c = real(ifft2((Ga.*conj(Gb))./abs(Ga.*conj(Gb))));

%% find peak correlation
[max_c, imax]   = max(abs(c(:)));
[ypeak, xpeak] = find(c == max(c(:)));
figure; surf(c), shading flat; % plot correlation    

%% display best match
hFig = figure;
hAx  = axes;

%// New - no need to offset the coordinates anymore
%// xpeak and ypeak are already the top left corner of the matched window
position = [xpeak(1), ypeak(1), tx, ty];
imshow(background, 'Parent', hAx);
imrect(hAx, position);
cesarsouza commented 6 years ago

Hi Darren,

I suppose the assignment of the c variable could be done with something similar to

Complex[] aux = new Complex[Ga.Length];
for (int i = 0; i < aux.Length; i++)
    aux[i] = (Ga[i] * Gb[i].Conjugate()) / Math.Abs(Ga[i] * Gb[i].Conjugate()));
double[][] c = FourierTransform2.FFT(aux, Direction.Backward).Re();

I haven't tested it though. But I guess that what was looking unfamiliar to you was the ./ and .* operators which basically mean "apply the operator element by element between the two arrays". Basically they can be exchanged by for loops as above.

For the "find peak correlation part" you can use something like

int[] imax;
double max_c = Matrix.Max(c.Abs(), out imax); // this will return the maximum values and the place where their occur like max(abs(c(:)));

double temp = c.Max();
int[] idx = c.Find(x => x == temp); // this will find the i,j indices of the values in C that are equal to the max
ypeak = idx.GetColumn(0);
xpeak = idx.GetColumn(1);

Regards, Cesar

fdncred commented 6 years ago

Hey @cesarsouza, Here's what I have so far. This code does not compile because ComplexImage.Data returns a [,] versus a Complex[] (and other reasons). Is there an easy way to convert it?

var template = ResizeToPowerOfTwo(Image.FromFile("../../Images/ear.jpg"));
var background = ResizeToPowerOfTwo(Image.FromFile("../../Images/animal.jpg"));
var backgroundGray = Grayscale.CommonAlgorithms.BT709.Apply(background);

var bx = background.Width;
var by = background.Height;
var tx = template.Width;
var ty = template.Height;

// Make template the size of background
var newTemplate = new System.Drawing.Bitmap(bx, by);
var g = System.Drawing.Graphics.FromImage(newTemplate);
g.Clear(System.Drawing.Color.White);
g.DrawImage(template, 0.0f, 0.0f);
g.Dispose();
var templateGray = Grayscale.CommonAlgorithms.BT709.Apply(newTemplate);

var complexBackground = ComplexImage.FromBitmap(backgroundGray);
var complexTemplate = ComplexImage.FromBitmap(templateGray);

complexBackground.ForwardFourierTransform();
complexTemplate.ForwardFourierTransform();

//FourierTransform2.DFT2(complexBackground.Data, Direction.Forward);
//FourierTransform.DFT2(complexBackground.Data, Direction.Forward);

var Ga = complexBackground.Data;
var Gb = complexBackground.Data;

Complex[] aux = new Complex[Ga.Length];
for (int i = 0; i < aux.Length; i++)
    aux[i] = (Ga[i] * Gb[i].Conjugate()) / Math.Abs(Ga[i] * Gb[i].Conjugate())); //Won't compile with Gb[i].Conjugate
//double[][] c = FourierTransform2.FFT(aux, Direction.Backward).Re();
FourierTransform2.FFT(aux, Direction.Backward);
double[][] c = new double[2][];
c[0] = aux.Re();
c[1] = aux.Im();

int[] imax;
//double max_c = Matrix.Max(c.Abs(), out imax); // this will return the maximum values and the place 
where their occur like max(abs(c(:)));
double max_c = Matrix.Max(c.Abs(), 0, out imax); // No idea if this is right but won't compile the other way
double temp = c.Max();
int[] idx = c.Find(x => x == temp); // this will find the i,j indices of the values in C that are equal to the max
var ypeak = idx.GetColumn(0);
var xpeak = idx.GetColumn(1);

Thanks, Darren

cesarsouza commented 6 years ago

Hi Darren,

Hmmm yes, I didn't realize that c could have been a matrix. Maybe it should be something like this?

var template = ResizeToPowerOfTwo(Image.FromFile("../../Images/ear.jpg"));
var background = ResizeToPowerOfTwo(Image.FromFile("../../Images/animal.jpg"));
var backgroundGray = Grayscale.CommonAlgorithms.BT709.Apply(background);

var bx = background.Width;
var by = background.Height;
var tx = template.Width;
var ty = template.Height;

// Make template the size of background
var newTemplate = new System.Drawing.Bitmap(bx, by);
var g = System.Drawing.Graphics.FromImage(newTemplate);
g.Clear(System.Drawing.Color.White);
g.DrawImage(template, 0.0f, 0.0f);
g.Dispose();
var templateGray = Grayscale.CommonAlgorithms.BT709.Apply(newTemplate);

var complexBackground = ComplexImage.FromBitmap(backgroundGray);
var complexTemplate = ComplexImage.FromBitmap(templateGray);

complexBackground.ForwardFourierTransform();
complexTemplate.ForwardFourierTransform();

var Ga = complexBackground.Data;
var Gb = complexBackground.Data;

var c = Jagged.CreateAs(Ga);
for (int i = 0; i < c.Rows(); i++)
    for (int j = 0; j < c.Columns(); j++)
        c[i][j] = (Ga[i, j] * Complex.Conjugate(Gb[i, j])) / (Ga[i, j] * Complex.Conjugate(Gb[i, j])).Magnitude;

FourierTransform2.FFT2(c, FourierTransform.Direction.Backward);
double[][] realC = c.Re();

Tuple<int, int> imax;
double[][] abs_max_c = realC.Abs();
double max_c = Matrix.Max(abs_max_c, out imax);

double temp = realC.Max();
int[][] idx = realC.ToMatrix().Find(x => x == temp); 
var ypeak = idx.GetColumn(0);
var xpeak = idx.GetColumn(1);

Regards, Cesar

fdncred commented 6 years ago

Thanks @cesarsouza. This is definitely progress as it all compiles now and I can step through it. However, it gives a different result than Octave, i.e. it doesn't work. LOL. :) It returns xpeak/ypeak as 0/0 which is wrong.

A few differences that I've noticed is that Octave doesn't require the images to be a power of 2. I thought I read somewhere that Accord didn't require a power of 2 image for FFT.

The differences I see from the Octave output to C# are:

  1. The images are different sizes, since I'm resizing both of them to a power of 2.
  2. Ga, Gb, c appear to be in single dimension array whereas ComplexImage.Data is 2. To be honest, I'm not sure what a double dimension array would look like in Octave so I'm just guessing at this point. Update: They are all 2 dimensional arrays in Octave too. I didn't see the Dimension column in Octave. Doh!
  3. Template and Background images are uint8, which I assume is equivalent to bytes, but they also appear to be a one dimension array. Update: Wrong again, they are 2 dimensional.
  4. And of course, imax (281711), max_c (0.15992), xpeak (318), ypeak (215) are different (Octave values in parens)
  5. I also changed the BT709 algorithm to Y so that it more closely matches Octave.

I don't expect you to keep working on this but if you have any further pointers it would be helpful.

Attached are the images from the original stackoverflow article, for anyone that wants to help work on this code.

Animal animal Ear ear

This is my naive ResizeToPowerOfTwo method which I'm sure could be condensed to a one-liner, but for completeness here it is.

public System.Drawing.Bitmap ResizeToPowerOfTwo(System.Drawing.Bitmap inputBmp)
{
    int w = inputBmp.Width;
    int h = inputBmp.Height;

    int newW = w, newH = h;
    bool pof2 = Accord.Math.Tools.IsPowerOf2(newW);
    while (!pof2)
    {
        newW++;
        pof2 = Accord.Math.Tools.IsPowerOf2(newW);
    }
    pof2 = Accord.Math.Tools.IsPowerOf2(newH);
    while(!pof2)
    {
        newH++;
        pof2 = Accord.Math.Tools.IsPowerOf2(newH);
    }

    if (w == newW && h == newH)
        return inputBmp;
    else
    {
        var biFilter = new ResizeBilinear(newW, newH);
        return biFilter.Apply(inputBmp);
    }
}
cesarsouza commented 6 years ago

Hi Darren,

Indeed, it shouldn't have been necessary to resize the images to be power of 2 if you are using the FourierTransform2 class. I still haven't been able to test the code though, but where exactly the dissimilarities start to arrive?

I think that a better approach to port this code into C# would be to do it in parts. For example, you can save the values of the different matrices in Octave at different steps of the algorithm. And then, you can load them into C# using MatReader. This way you will not have to worry about the grayscale algorithm being the same (instead of loading the image and converting to gray, just load the values from Octave directly).

Regards, Cesar

cesarsouza commented 6 years ago

Oh actually I've just realized that the Octave code is using c(:) instead of c in the last operations. If I recall correctly from Matlab, this is a flatten operation meaning the 2d matrix is being transformed into a 1d one. So actually imax shouldn't be a tuple, it should be a single index.

fdncred commented 6 years ago

I was using

complexBackground.ForwardFourierTransform();
complexTemplate.ForwardFourierTransform();

which apparently uses the original FourierTransform versus FourierTransform2. FT2 needs a [][] versus a [,] so I'll need to convert it somehow. Fun, fun, fun.

fdncred commented 6 years ago

I just remembered why I have to have a power of 2 image. These methods require it.

var complexBackground = Accord.Imaging.ComplexImage.FromBitmap(backgroundGray);
var complexTemplate = Accord.Imaging.ComplexImage.FromBitmap(templateGray);
cesarsouza commented 6 years ago

You can convert from [][] to [,] with .ToMatrix() or [,] to [][] with .ToJagged()

cesarsouza commented 6 years ago

Hmmm those methods come from the original AForge project. Instead you can load the image as normal then convert it to a double[][] array using .ToMatrix() or .ToJagged() methods too (or the ImageToMatrix class). Then from there you can convert them into Complex[][] using .ToComplex() (again if I recall correctly).

fdncred commented 6 years ago

ok, I'll try that.

fdncred commented 6 years ago

double[][] doesn't contain a method for ToComplex. :(

ImageToMatrix conv = new ImageToMatrix(0, 1);
double[][] bm;
conv.Convert(backgroundGray, out bm);
double[][] bt;
conv.Convert(templateGray, out bt);
var complexBackground = bm.ToComplex();
var complexTemplate = bt.ToComplex();
cesarsouza commented 6 years ago

As a temporary solution you can use:

int rows = real.Rows();
int cols = real.Columns();
var arr = Matrix.Zeros<Complex>(rows, cols);
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        arr[i, j] = real[i, j];

But I will be adding those in a few minutes, thanks for letting me know they were missing!

fdncred commented 6 years ago

FourierTransform2 wants a Complex[][] not a Complex[,]. Is there a Matrix.Zeros() that returns [][]? I can just try that ToJagged() or ToMatrix() if needbe.

fdncred commented 6 years ago

It works! Octave results: imax (281711), max_c (0.15992), xpeak (318), ypeak (215) C# results: imax (214,317), max_c (0.15567475624645191), xpeak (317), ypeak(214) I'm assuming the results are slightly different because it appears that Octave converts the image to NTSC color before converting it to Gray (for whatever reason)

cesarsouza commented 6 years ago

Yes, Jagged.Zeros()

fdncred commented 6 years ago

@cesarsouza, For completeness, here's my function. Feel free to tweak it and make a demo out of it.

#region SO Artical Source Code
//https://stackoverflow.com/questions/32664481/matlab-template-matching-using-fft
//clear all; close all;

//template = rgb2gray(imread('http://i.stack.imgur.com/6bTzT.jpg'));
//background = rgb2gray(imread('http://i.stack.imgur.com/FXEy7.jpg'));

//%% calculate padding
//bx = size(background, 2); 
//by = size(background, 1);
//tx = size(template, 2); % used for bbox placement
//ty = size(template, 1);

//%// Change - Compute the cross power spectrum
//Ga = fft2(background);
//Gb = fft2(template, by, bx);
//c = real(ifft2((Ga.*conj(Gb))./abs(Ga.*conj(Gb))));

//%% find peak correlation
//[max_c, imax]   = max(abs(c(:)));
//[ypeak, xpeak] = find(c == max(c(:)));
//figure; surf(c), shading flat; % plot correlation    

//%% display best match
//hFig = figure;
//hAx  = axes;

//%// New - no need to offset the coordinates anymore
//%// xpeak and ypeak are already the top left corner of the matched window
//position = [xpeak(1), ypeak(1), tx, ty];
//imshow(background, 'Parent', hAx);
//imrect(hAx, position);  
#endregion

var template = System.Drawing.Image.FromFile("../../Images/ear.jpg") as Bitmap;
var background = System.Drawing.Image.FromFile("../../Images/animal.jpg") as Bitmap;
// Use Y because Octave uses YIQ
var backgroundGray = Grayscale.CommonAlgorithms.Y.Apply(background); 

//%% calculate padding
//bx = size(background, 2); 
//by = size(background, 1);
//tx = size(template, 2); % used for bbox placement
//ty = size(template, 1);

var bx = background.Width;
var by = background.Height;
var tx = template.Width;
var ty = template.Height;

// Make template the size of background
var newTemplate = new System.Drawing.Bitmap(bx, by);
using (var g = System.Drawing.Graphics.FromImage(newTemplate))
{
    g.Clear(System.Drawing.Color.White);
    g.DrawImage(template, 0.0f, 0.0f);
}
var templateGray = Grayscale.CommonAlgorithms.Y.Apply(newTemplate);

// Change images into double[][] arrays
var bk = backgroundGray.ToJagged(0);
var tm = templateGray.ToJagged(0);

// Change double[][] arrays into Complex array
int rows = bk.Rows();
int cols = bk.Columns();
var Ga = Jagged.Zeros<Complex>(rows, cols);
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        Ga[i][j] = bk[i][j];
FourierTransform2.FFT2(Ga, Direction.Forward);

// Change double[][] arrays into Complex array
rows = tm.Rows();
cols = tm.Columns();
var Gb = Jagged.Zeros<Complex>(rows, cols);
for (int i = 0; i < rows; i++)
    for (int j = 0; j < cols; j++)
        Gb[i][j] = tm[i][j];
FourierTransform2.FFT2(Gb, Direction.Forward);

//%// Change - Compute the cross power spectrum
//Ga = fft2(background);
//Gb = fft2(template, by, bx);
//c = real(ifft2((Ga.*conj(Gb))./abs(Ga.*conj(Gb))));

var c = Jagged.CreateAs(Ga);
for (int i = 0; i < c.Rows(); i++)
    for (int j = 0; j < c.Columns(); j++)
        c[i][j] = (Ga[i][j] * Complex.Conjugate(Gb[i][j])) / (Ga[i][j] * Complex.Conjugate(Gb[i][j])).Magnitude;

FourierTransform2.FFT2(c, Direction.Backward);
double[][] realC = c.Re();

//%% find peak correlation
//[max_c, imax]   = max(abs(c(:)));
//[ypeak, xpeak] = find(c == max(c(:)));

Tuple<int, int> imax;
double[][] abs_max_c = realC.Abs();
double max_c = abs_max_c.Max(out imax);

// Don't really need this since the output of imax is sufficient
double temp = realC.Max();
int[][] idx = realC.ToMatrix().Find(x => x == temp);
var ypeak = idx.GetColumn(0);
var xpeak = idx.GetColumn(1);

// Save output to bmp with highlighted section found
var outBmp = background.Clone(System.Drawing.Imaging.PixelFormat.Format24bppRgb);
using (var gfx = Graphics.FromImage(outBmp))
{
    gfx.DrawRectangle(Pens.Red, xpeak[0], ypeak[0], template.Width, template.Height);
}
outBmp.Save("background-found.png", System.Drawing.Imaging.ImageFormat.Png);
cesarsouza commented 6 years ago

Thanks a lot @fdncred!

So this method can be used to perform template matching through the FFT right? If yes then it's really interesting, it should be the method described by Lewis in this paper which was used in the production of Forest Gump in 1995.

Really cool. I had in mind to implement this method since 2012 but never got the time to do it. Thanks a lot!

Regards, Cesar

fdncred commented 6 years ago

Yes, matching with FFT. Your welcome. You did most of the work! hehe.

fdncred commented 6 years ago

I read that Lewis paper. I'm not sure I'm doing the same thing. For one, I didn't create an Integral image, nor do I have an image summed or summed squared template. Perhaps that's what's going on under the hood. You'd know better than I. That algorithm sounds real interesting indeed. The one deficit that I see with the algorithm posted here is it's slow. Perhaps some parallel code could fix that.

cesarsouza commented 6 years ago

Ah yeah - we are not using IntegralImages yet. However, please note that the use of Integral Images is just a way to accelerate this method. I guess that it would help accelerate the look for a max in the last part of the algorithm (the code after the %% find peak correlation comment).

I don't know how interested you are into getting this method to work faster, but the framework supports integral images through the IntegralImage class or IntegralImage2. It might require some time to understand how to apply them in this context though.

fdncred commented 6 years ago

I'm definitely interested in speed but the slow part of this code is the FFT2, everything else is probably fast enough. So if an integral image would speed up FFT2 it seems that the FFT2 code would have to change versus the analysis afterward, right? Maybe it's just my ignorance of how all this works and how converting to integral would help.

aniloymagil commented 6 years ago

Hi, this is not the method (normalized cross-correlation) described in Lewis '95 paper. In the code, you are performing a division (Ga.*conj(Gb))./abs(Ga.*conj(Gb)) in transform domain. As the output of this element-wise division, what you actually have is phase-only information of the correlation plane. Other similar template matching approaches that are using correlation measures or filters can be performed in transform domain too.

fdncred commented 6 years ago

Hi @aniloymagil, Can you suggest changes so that it matches Lewis' '95 paper? Thanks, Darren

aniloymagil commented 6 years ago

Hi @fdncred, this link provides a MATLAB code of NCC method in the paper. You can follow this implementation to make changes in your code.

Best, Anil

HouseOfFinwe commented 6 years ago

I’m trying to learn about template matching since I plan on doing some image stitching for a project. What is the benefit of using this DFT normalized cross correlation matching vs the existing cross correlation matching or the exhaustive template matching in this library? Which works best? Which is fastest?

fdncred commented 6 years ago

@HouseOfFinwe, I can't answer all your questions but if you're looking for speed you should look at OpenCV. I've used OpenCvSharp and EmguCV, both are great assemblies.

if you're doing image stitching I'd suggest researching Kaze (again in OpenCV). There are examples of stitching with SIFT and SURF but I like Kaze better and they're pretty much interchangeable when it comes to stitching.

HouseOfFinwe commented 6 years ago

Thanks for the response. I’m looking for a purely .net solution if possible, which excludes EmguCV (and obviously OpenCV). Im interested in understanding the tradeoffs between the different template matching strategies.