adam2392 opened 3 years ago
cc: @parthgvora @ChesterHuynh
cc: @jovo @rflperry in case you guys had strong opinions.
To me, the `patch` is the set of sampled features in the structured space (not applicable to non-structured SPORF). The `projection_matrix` is the `n_projections x n_features` matrix for the candidate feature calculation, whose rows are the flattened patches.
As long as a summation is taken, this strategy should work, since you can weight the `projection_matrix` rows accordingly (1's for MORF, +/-1 for SPORF, kernel weights for Gabor, potentially any user-specified filter).
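For example, here's a quick illustrative sketch (my own toy code, not the library's implementation) of projection-matrix rows carrying those different weightings over the same sampled patch:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_projections, patch_size = 64, 3, 4
proj_mat = np.zeros((n_projections, n_features))

# the same sampled patch, weighted three different ways
patch = rng.choice(n_features, size=patch_size, replace=False)
proj_mat[0, patch] = 1.0                                       # MORF: plain summation
proj_mat[1, patch] = rng.choice([-1.0, 1.0], size=patch_size)  # SPORF: +/-1 weights
proj_mat[2, patch] = rng.standard_normal(patch_size)           # e.g. filter/kernel weights

# candidate features: one number per sample per projection
X = rng.standard_normal((10, n_features))
candidate_features = X @ proj_mat.T   # shape (10, 3)
```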
The benefit in my mind of explicitly constructing the projection matrix is computational, as I believe it to be the fastest method: once it's made, you can just do matrix multiplication instead of looping over samples and applying transforms to each. In high-dimensional settings I can imagine the projection matrix being quite sparse, so maybe it isn't as efficient, but I'd imagine there's an efficient sparse solution to that.
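One such option for the sparse case (a sketch reusing `proj_mat` and `X` from above, assuming scipy is available):

```python
from scipy.sparse import csr_matrix

proj_sparse = csr_matrix(proj_mat)          # mostly-zero rows stay cheap
candidate_features = (proj_sparse @ X.T).T  # same result as the dense matmul
```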
The issue that arises with Gabor is that it's generally applied to a 2D array through convolution, and the parameters of the Gabor kernel change the size of the kernel itself.
E.g.:

```python
from skimage.filters import gabor_kernel

# the kernel grows as the frequency drops (and as n_stds grows)
kernel = gabor_kernel(frequency=1.0, n_stds=1, bandwidth=1)
print(kernel.shape)
# >>> (3, 3)

kernel = gabor_kernel(frequency=0.5, n_stds=1, bandwidth=1)
print(kernel.shape)
# >>> (5, 5)

kernel = gabor_kernel(frequency=0.2, n_stds=3, bandwidth=1)
print(kernel.shape)
# >>> (19, 19)
```
If we wanted to set the selected patch's weights according to a Gabor kernel, I'm realizing that that's not well defined. Rather, one needs to i) convolve and then ii) summarize the result into one number, similar to how SPORF works. Using `scipy.signal.convolve2d(patch_of_data, kernel, mode='same')` will output a 2D array with the same size as `patch_of_data`, which is the kernel convolved with this patch of data (maybe the patch in this case is the entire image). Then we can take the summation of this.
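A minimal sketch of that convolve-then-summarize step (variable names illustrative):

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.filters import gabor_kernel

patch_of_data = np.random.default_rng(0).standard_normal((16, 16))
kernel = np.real(gabor_kernel(frequency=0.5, n_stds=1, bandwidth=1))

# mode='same' keeps the output the same shape as patch_of_data
convolved = convolve2d(patch_of_data, kernel, mode='same')
new_feature = convolved.sum()   # one number, as in SPORF's linear combination
```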
I agree that the explicit construction of the `projection_matrix` in SPORF is definitely fastest (since it's a matrix multiplication), and under the hood this should still more or less be done, but right now I don't see a clean way to generalize without breaking this up into a `patch` and a `transform` step in the code/description. In Gabor-MORF you can't just sample a patch and set the weights according to a Gabor kernel to do this matrix multiplication; you need to sample a Gabor kernel, convolve it with your selected patch, and then sum it up. (The convolution itself can of course be vectorized as a matrix multiplication.)
Lmk if I'm misinterpreting somewhere.
I'm not too familiar with Gabor filters and skimage, but if you've framed it as `sum(filter \conv patch)`, then I would think there exists (not saying it's easy to compute) a vector of feature weights such that the summation is equivalent to `vec(patch) \matmul weights`, which is what the projection matrix is. Since convolution is just a weighted sum of the features?
See sparse convolutional coding by Jeremias https://arxiv.org/abs/1607.02005
You can represent the convolution as a matrix multiplication
Let me know if this makes sense: say our data is images of shape `(n, height, width)`, so each sample has `d = height * width` features, and a patch selects some `d' << d` of them as a `(height', width')` sub-region of the image.
If we want, say, the Gabor kernel as our "weights", the kernel is defined irrespective of the image height/width; you just feed in params and you get a kernel. To make what you're describing work, I think the issue is: select a patch, say a (5, 4) region on the image. You can't just weight a Gabor kernel onto this region, to my knowledge, cuz if you parametrize the kernel and choose a random one, you might get, say, a (10, 10) kernel. What do you do now?
I think... this might only be an issue for Gabor filters, because I can imagine that if we use an exponential kernel, we pre-select our patch and just get an exponential fall-off in weights starting at 1 inside the patch (i.e. the kernel is parametrized by the patch).
Yep, fair point. So the issue is more that a patch after convolution with the Gabor filter (and potentially plenty of other filters) incorporates information from features outside of that patch (since the kernel centers at a pixel but uses its neighborhood).
So pseudocode.
Hmm okay, yeah, that makes sense with the jank version I have right now. I guess, unfortunately, for some of these "spatial" filters there's no way we've come up with to bunch the convolution step + summation into a single matrix-multiply step...
Well, given a feature vector `x \in R^(1 x d)` for which a patch of height `H` and width `W` is being sampled post-convolution and summed, the new feature is `sum(x \conv filter)`. We can represent our convolution as a matrix `C \in R^(d x HW)`, and the summation of `HW` features is just a vector `1 \in R^(HW x 1)`. So our sum over the convolved patch is `x * C * 1`, and thus `C * 1 \in R^(d x 1)` is (the transpose of) a row of our projection matrix. The hard part is just computing the convolution matrix elegantly, but I think Jeremias's paper does it nicely.
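To make the `x * C * 1` identity concrete, here's a small numerical check (a sketch, not library code): by linearity, the projection row `w = C * 1` can be recovered one coordinate at a time by pushing basis vectors through the convolve-and-sum map, and `x @ w` then matches convolving first and summing the patch.

```python
import numpy as np
from scipy.signal import convolve2d
from skimage.filters import gabor_kernel

rng = np.random.default_rng(0)
height, width = 8, 8
x = rng.standard_normal((height, width))
kernel = np.real(gabor_kernel(frequency=0.5, n_stds=1, bandwidth=1))

# the patch: a contiguous block of the convolved output
patch = np.zeros((height, width), dtype=bool)
patch[2:5, 3:7] = True

# sum(x \conv filter) over the patch, computed directly
lhs = convolve2d(x, kernel, mode='same')[patch].sum()

# w = C * 1, recovered coordinate-by-coordinate via basis vectors
w = np.array([
    convolve2d(np.eye(height * width)[j].reshape(height, width),
               kernel, mode='same')[patch].sum()
    for j in range(height * width)
])
rhs = x.ravel() @ w   # x * C * 1 as a single dot product

assert np.isclose(lhs, rhs)
```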
I think I figured it out!
My original issue was that the only way I had of obtaining the convolution was a "full" discrete convolution, which changes the shape of the output, making it hard to pick the patch. Now I have some code that can do a "same" discrete convolution, where padding is done smartly so the output of `x \conv filter` has the same shape as `x`. The matrix multiplication operations as a result are slightly different, and it's a bit of a juggle to get it generic.
Thus one can now do a patch pick, so a row of the projection matrix is `P = K_conv_matrix * S`, where `S` is the vector doing patch selection (1's at the patch location, and 0 elsewhere). It's unfortunately a lot of super-detailed Toeplitz operations, so I think the good thing is we can actually abstract this entirely into PyTorch if we're willing to take on that dependency! :)
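For intuition on what those Toeplitz operations look like, a 1-D sketch (my own toy example; the actual code is 2-D) that builds the "same"-mode convolution matrix `K` explicitly and checks the `P = K_conv_matrix * S` bookkeeping:

```python
import numpy as np
from scipy.linalg import toeplitz
from scipy.signal import convolve

d = 10
kernel = np.array([0.25, 0.5, 0.25])   # any odd-length 1-D kernel
pad = len(kernel) // 2
x = np.random.default_rng(1).standard_normal(d)

# "same"-mode convolution as a d x d Toeplitz matrix:
# K[i, j] = kernel[i - j + pad] wherever that index is valid
col = np.r_[kernel[pad:], np.zeros(d - (len(kernel) - pad))]
row = np.r_[kernel[pad::-1], np.zeros(d - pad - 1)]
K = toeplitz(col, row)
assert np.allclose(K @ x, convolve(x, kernel, mode='same'))

# S does patch selection (1's at the patch location, 0 elsewhere);
# summing the convolved patch is S @ (K @ x) == x @ (K.T @ S),
# so K.T @ S is one row of the projection matrix
S = np.zeros(d)
S[3:6] = 1.0
assert np.isclose(S @ (K @ x), x @ (K.T @ S))
```

In 2-D this index bookkeeping multiplies quickly, which is why handing it off to a convolution library is attractive: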
https://pytorch.org/docs/stable/nn.functional.html
```python
import numpy as np
import torch
from torch.nn.functional import conv2d

# sub-sampled data points, shaped (batch, channels, height, width);
# here 2 channels (1 for the real component and 1 for the imaginary)
input = torch.as_tensor(X[sample_inds]).reshape(-1, 2, image_height, image_width)

# kernels we're interested in; conv2d expects weights shaped
# (out_channels, in_channels, kernel_height, kernel_width). Note we can
# generalize this to many kernels simultaneously by adding output channels.
weight = torch.as_tensor(kernel).reshape(2, 2, kernel_height, kernel_width)

# output is now a 4D tensor (batch, channels, height, width)
output_tensor = conv2d(input, weight, padding='same').numpy()

# perform patch selection on the flattened spatial dimensions
patch_weights = output_tensor.reshape(len(input), 2, -1)[:, :, patch_idxs]

# set projection matrix with the patch weights now at the correct location
# (here the real channel of one convolved sample)
proj_mat = np.zeros((n_features, proj_dims))
proj_mat[patch_idxs, idx] = patch_weights[0, 0]
```
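(One caveat with the sketch above: in recent PyTorch versions, `conv2d` only accepts `padding='same'` for stride-1 convolutions, which is all we need here; and the real/imaginary parts of the complex Gabor kernel ride along as the two channels.)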
Otherwise we have Toeplitz matrix-multiplication code that's pretty complex and a pain to unit-test. In addition, if we can figure out how to smartly "parallelize" this, then we can easily perform all the convolution steps on GPU, which basically makes this step computationally trivial. The only thing to be cautious of is the overhead of passing data back and forth between CPU and GPU.
Some notes on the above topic, while I'm thinking about it:
- do we pick a patch first, or apply the convolution first? The main consideration is computation time of course, but I suppose if we pick a patch first, then you might introduce arbitrary edge effects(?), whereas if you apply the convolution first and pick a patch incorrectly, you might not capture the informative part of that convolution.
- that is, either we populate the projection matrix columns with i) multiple patches with the same convolution kernel applied, or ii) multiple convolutions of different kernels on the same patch. I don't think we should do a random convolution for a random patch, but I suppose that can also be done.
tl;dr: okay, let's keep the same semantics, because `sample_proj_mat` can be implemented in Gabor-MORF with just a slightly more involved algorithm than naive-MORF/SPORF.
I still think that in, say, the documentation and the paper, we can semantically decompose the "projection matrix" for MORFs into a "patch selection" and a "transformation" step, so people can easily grasp what's going on, but for the API we can stick w/ the "projection_matrix" semantics.
I'm trying to write out the GaborMorfSplitter, and I'm wondering if we might converge on a consistent API naming scheme for the purpose of generalizing from a Base Oblique Splitter:
We start off with our sample data, `X`, of shape `(n_samples, n_dims)`. Our goal is to generate a new matrix, `X'`, that is `(n_samples, proj_dims)`, where `proj_dims` dictates how many projections we'll consider. In my view, there are two distinct "steps" done to get our projected data:

1. sampling a patch: in the paper, this is called a projection matrix, but I feel like a patch makes more semantic sense. The patch doesn't even need to be contiguous! A patch is just defined as a sub-collection of data points (possibly random, or structured).
2. applying a transformation on said patch: in SPORF, this is a linear combination; in MORF, this is summation.

This then results in "one" number per data sample, per projected dimension, which can be fed into a splitting criterion across samples (i.e. Gini).
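As a strawman for discussion, a hypothetical sketch of that two-step decomposition (every name here is made up, not the current API):

```python
import numpy as np

def sample_patch(rng, n_dims, max_patch_size=5):
    """Step 1: a patch is just a sub-collection of feature indices
    (possibly random, possibly structured, not necessarily contiguous)."""
    size = rng.integers(1, max_patch_size + 1)
    return rng.choice(n_dims, size=size, replace=False)

def transform_patch(X_patch, weights=None):
    """Step 2: MORF sums the patch; SPORF/Gabor use weighted combinations."""
    if weights is None:
        return X_patch.sum(axis=1)
    return X_patch @ weights

def project(X, proj_dims, rng):
    """Produce X' of shape (n_samples, proj_dims): one number per
    sample per projected dimension, ready for a splitting criterion."""
    X_proj = np.empty((X.shape[0], proj_dims))
    for j in range(proj_dims):
        patch = sample_patch(rng, X.shape[1])
        X_proj[:, j] = transform_patch(X[:, patch])
    return X_proj

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 28 * 28))
X_proj = project(X, proj_dims=10, rng=rng)   # feed into Gini, etc.
```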
My Questions
This will then allow any MORF to inherit from an ORF easily.
For your reference, if you want some more details:
In SPORF
In SPORF, one is interested in i) sampling a "random patch" and then ii) computing a "transformation" on this patch.
The patch is simply dictated by the sparsity of the projection matrix and is generated randomly. The transformation is then a linear combination of the selected patch.
In naive-MORF
One also generates a patch and then computes a transformation on said patch.
The patch is now dictated by some contiguous region on the image, whose size/location is generated randomly. The transformation is then the summation of the selected patch.
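For concreteness, a hypothetical sketch of one naive-MORF projection row (illustrative names, not the actual implementation):

```python
import numpy as np

def sample_morf_row(rng, height, width, max_h=4, max_w=4):
    """One projection row: a random contiguous rectangle on the image,
    with weight 1 at each selected pixel (summation transform)."""
    ph = rng.integers(1, max_h + 1)
    pw = rng.integers(1, max_w + 1)
    top = rng.integers(0, height - ph + 1)
    left = rng.integers(0, width - pw + 1)
    row = np.zeros(height * width)
    rr, cc = np.meshgrid(np.arange(top, top + ph),
                         np.arange(left, left + pw), indexing='ij')
    row[(rr * width + cc).ravel()] = 1.0
    return row

row = sample_morf_row(np.random.default_rng(0), height=8, width=8)
```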
In Gabor-MORF
Same thing, where I currently: