bwohlberg / sporco

Sparse Optimisation Research Code
http://brendt.wohlberg.net/software/SPORCO/
BSD 3-Clause "New" or "Revised" License

Do you have a cuda accelerated version of `bpdndl`? #4

Closed bwohlberg closed 5 years ago

bwohlberg commented 5 years ago

From @CasperN on November 6, 2018 19:27

Hello, I'm trying out sporco but am having a hard time with the documentation. There really should be better names; they're quite a headache to decode.

I'm trying to do dictionary learning on a few million points of a few hundred dimensions. Is there a GPU accelerated way to do this in your library? Is my best option bpdndl?

Copied from original issue: bwohlberg/sporco-cuda#2

bwohlberg commented 5 years ago

Can you give some examples of the names that are difficult to decode?

If you want to do standard (non-convolutional) dictionary learning, then bpdndl is your only option at the moment. Have you checked to see whether it really is unacceptably slow with your data set? There isn't a GPU accelerated version yet, but that's on the todo list. If you need it urgently, it should be relatively straightforward to construct a GPU version by replacing NumPy arrays and functions with the corresponding ones from CuPy (see the sporco.cupy subpackage for examples of the modules for which GPU enabled versions are already available).
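
As a rough illustration of that kind of substitution (this is the generic NumPy/CuPy fallback pattern rather than the actual sporco.cupy machinery), something like the following lets the same array code run on either backend:

import numpy as np
try:
    import cupy as cp   # use CuPy arrays on the GPU when available
    xp = cp
except ImportError:
    xp = np             # fall back to NumPy on machines without a GPU

# Code written against the xp alias runs unchanged on either backend
A = xp.random.randn(128, 256).astype(xp.float32)
nrm = xp.linalg.norm(A, axis=0)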

bwohlberg commented 5 years ago

From @CasperN on November 7, 2018 16:39

Documentation: A name such as bpdndl isn't clear to someone new to the library. While the documentation does expand the acronyms on, say, https://sporco.readthedocs.io/en/latest/invprob/index.html, it doesn't always fully expand them further down. E.g. it would be nice to have the full name of bpdn and a link on this page: https://sporco.readthedocs.io/en/latest/examples/dl/bpdndl.html#example-dl-bpdndl. That said, it's a pretty minor complaint considering it's a young library with good maths.

Also, could you explain how to interpret the verbose output, or send a link to where the columns are defined? Itn Fnc DFid ℓ1 Cnstr r_X s_X ρ_X r_D s_D ρ_D

Speed: On 16 CPU cores, bpdndl took about 20 minutes for 15 steps on 3000 128-dimensional points, which suggests it will be very slow for millions of points. The program seems to be using just 1 core a lot of the time and only occasionally using all 16. I'd like to get your insight on performance and scaling with sporco.

bwohlberg commented 5 years ago

Documentation

Thanks for the documentation suggestions - I'll take a look at this issue. I'm afraid there's currently no documentation on the verbose output that I can refer you to. This should certainly be addressed, but briefly, for the specific example you quoted: Itn is the iteration number, Fnc the value of the objective functional, DFid the data fidelity term, ℓ1 the ℓ1 regularisation term, and Cnstr the constraint violation; r, s, and ρ are the ADMM primal residual, dual residual, and penalty parameter respectively, with the _X columns referring to the sparse coding update and the _D columns to the dictionary update.

Speed

That's much slower than expected. The quick test I did on my laptop with a 128 x 256 dictionary and randomly generated training data took 20 seconds for 10 iterations with 128 x 10000 training data and 220 seconds for 10 iterations with 128 x 100000 training data. My guess would be that the problem isn't set up correctly in your test code. Here is my test code as a usage example:

from __future__ import print_function
import numpy as np
from sporco.dictlrn import bpdndl

# Randomly generated training data: 10000 samples of dimension 128, with
# the mean of each sample (column) removed
S = np.random.randn(128, 10000).astype(np.float32)
S -= np.mean(S, axis=0)

# Random initial dictionary with twice as many atoms as the data dimension
np.random.seed(12345)
D0 = np.random.randn(S.shape[0], 2*S.shape[0])

# Sparsity weight and ADMM penalty parameters for the sparse coding (BPDN)
# and dictionary update (CMOD) subproblems
lmbda = 0.1
opt = bpdndl.BPDNDictLearn.Options({'Verbose': True, 'MaxMainIter': 10,
                      'BPDN': {'rho': 10.0*lmbda + 0.1},
                      'CMOD': {'rho': S.shape[1] / 1e3}})

d = bpdndl.BPDNDictLearn(D0, S, lmbda, opt)
D1 = d.solve()
print("BPDNDictLearn solve time: %.2fs" % d.timer.elapsed('solve'))

CasperN commented 5 years ago

The speed thing was a bug on my end. It turns out I was using all but the first 3000 points (1.5 million) rather than just the first 3000. Thanks for the explanation!

bwohlberg commented 5 years ago

Good. I assume, then, that the existing code is of acceptable speed for your dataset? If you're using an Intel processor, you might be able to obtain at least some speed improvement by installing the MKL versions of NumPy and SciPy via Anaconda.
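
One quick way to check whether a NumPy build is actually linked against MKL is to inspect its build configuration:

import numpy as np

# Prints the BLAS/LAPACK build configuration; an MKL-linked build lists
# the MKL libraries here
np.show_config()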

CasperN commented 5 years ago

Yep, it seems fast enough in my testing. I didn't test MKL very thoroughly, but it didn't noticeably help.

n       d       it      sec
------  -----   -----   -------------------------
10000   128     100     80
50000   128     100     210 (212 with intel MKL)
10000   8192    100     833

I'm going to try a grid search over the l1 penalty term lmbda = np.logspace(1, 2, 20) and number of dictionary words as a function of dimension ((d * np.logspace(-2, 2, 20, base=2)).astype(np.int32)) to optimize for some compromise between reconstruction error and sparsity. I think I'll look at ||Dx - s||_1 or ||Dx - s||_2^2 as proxies for reconstruction error, and ||x||_0 as a proxy for sparsity. Do you have any thoughts on this procedure or any metrics that you recommend I look at?

bwohlberg commented 5 years ago

If you're going to do a parameter search, I would suggest doing it in parallel (you mentioned that you have a machine with 16 cores) using sporco.util.grid_search. I'm not sure it's worth doing such a fine search on the dictionary size; I would start with something like (d * np.logspace(0, 3, num=4, base=2)).astype(np.int32). The best metric to use for parameter selection really depends on your application, but I can try to make some suggestions if you give some more details.
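
A rough sketch of how a search over lmbda might be set up with sporco.util.grid_search, assuming it accepts a function of a parameter tuple together with a tuple of candidate value arrays; the random data, the scoring function, and the weighting between the two terms below are just placeholders to illustrate the mechanics:

import numpy as np
from sporco import util
from sporco.admm import bpdn

# Small random problem purely for illustration
np.random.seed(12345)
D = np.random.randn(128, 256).astype(np.float32)
S = np.random.randn(128, 1000).astype(np.float32)

def evalerr(prm):
    # Sparse code S against the fixed dictionary D for a candidate lmbda and
    # return a (placeholder) weighted sum of reconstruction error and sparsity
    lmbda = prm[0]
    b = bpdn.BPDN(D, S, lmbda, bpdn.BPDN.Options({'Verbose': False}))
    X = b.solve()
    err = np.linalg.norm(D.dot(X) - S)
    nnz = np.mean(np.abs(X) > 1e-3)
    return err + 10.0*nnz

# Evaluate evalerr in parallel over the grid of candidate lmbda values,
# returning the best parameter tuple, its value, all values, and the index
lmbdas = np.logspace(-2, 1, 10)
sprm, sfvl, fvmx, sidx = util.grid_search(evalerr, (lmbdas,))
print("Best lmbda: %5.3f" % sprm[0])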

CasperN commented 5 years ago

I actually have access to a slurm cluster! So I'll use a job array.

I'm working on feature extraction for satellite images of clouds by first reducing the dimensions with a convolutional auto-encoder, then sparse coding that reduced representation. This method is inspired by https://gabgoh.github.io/ThoughtVectors/. The end goal is a method for representing clouds that's fully data-driven and reasonably human-interpretable.

It's a bit of a black box within a black box, but looking at the cloud images that correspond to the codewords (selected by either maximum coefficient or cosine similarity), the codewords do seem to find important textures and patterns associated with clouds.

Thinking about this further, a better metric for reconstruction loss would be to compare the original image, decode(encode(image)), and decode(sparse_reconstruct(D, encode(image))) using the same image loss I use to train the auto-encoder. This would work with the 8192-length vectors but not the 128-length ones. Basically, the auto-encoder reduces the input images to dimensions (8, 8, 128); the 128-length vectors are generated by averaging over the spatial dimensions (which obviously throws away lots of information) while the 8192-length vectors are generated by flattening.
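
A loose sketch of that comparison, where encode, decode, and recon_loss are hypothetical stand-ins for the trained auto-encoder and its training loss, and D is the learned dictionary:

import numpy as np
from sporco.admm import bpdn

def sparse_reconstruct(D, z, lmbda=0.1):
    # Sparse code the latent vector z against D and return its reconstruction
    b = bpdn.BPDN(D, z.reshape(-1, 1).astype(np.float32), lmbda)
    return D.dot(b.solve()).ravel()

def reconstruction_gap(image, encode, decode, D, recon_loss):
    # Compare the auto-encoder round trip with the round trip that also
    # passes the latent code through the sparse coding step
    z = encode(image)                                     # flattened latent, e.g. length 8192
    ae_only = recon_loss(image, decode(z))
    ae_plus_sparse = recon_loss(image, decode(sparse_reconstruct(D, z)))
    return ae_only, ae_plus_sparse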

bwohlberg commented 5 years ago

It sounds as if you have this under control, so I'm going to go ahead and close this issue. Feel free to reopen it if you have any further issues.