Using the snippet below, I get a speedup, but it's not huge:

```
Elapsed time is 5.901718 seconds.   (unique approach)
Elapsed time is 8.245689 seconds.   (uniquetol approach)
```
```matlab
uniquetolval=5; N=1e7;
vals=repmat([30 50 90; 31 51 91; 32 52 92; 33 53 93],N,1);
%% unique approach
tic;
uvals=round(vals/uniquetolval)*uniquetolval;
[u,ia1]=unique(uvals,'rows');
[~,ia2]=unique(uvals,'rows','last');
ia=cell(size(ia1,1),1);
for i=1:length(ia1)
    ia{i}=ia1(i):ia2(i);
end
toc
%% uniquetol approach
uniquetolval2=uniquetolval-1; %lower thresh b.c. uniquetol treats vals that are equal to tolerance as equivalent
tic
[ut,iat]=uniquetol(uvals,uniquetolval2,'ByRows',true,...
    'DataScale',1,'OutputAllIndices',true);
toc
```
Hmmm. It seems that keeping track of the index of each element that is unique adds a lot of overhead. Check your ia variable at the end; I don't think it actually has the correct indices in it. There are lots of duplicate indices in each cell.
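A quick way to see it, reusing ia1, ia2, and uvals from the snippet above (just a check, not a fix): with repmat data the occurrences of each unique row are interleaved, so ia1(i):ia2(i) sweeps across rows from other groups.

```matlab
% Check the first cell's range: it should contain only copies of one row,
% but the contiguous range picks up rows from the other groups too.
idx = ia1(1):ia2(1);
all(ismember(uvals(idx,:), uvals(ia1(1),:), 'rows'))  % returns false
```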
I think you want something like this (see the time difference between unique and uniquetol, that is where the speed is, but it doesn't matter for us):
I think the big speed-up via unique would be in the ability to search across an entire year, or all years of a tile, for unique rows via MATLAB tall arrays, which are compatible with unique, ismember, and find, but not uniquetol. A sketch follows the links below.
https://www.mathworks.com/help/matlab/tall-arrays.html
https://www.mathworks.com/help/matlab/import_export/develop-custom-tall-array-algorithms.html
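A hedged sketch of what that could look like, untested here; the datastore files and band names are made up for illustration, and 'rows' support for tall unique should be checked against the docs above for your release.

```matlab
% Hypothetical per-day dumps of pixel rows; column names b1..b3 are assumed.
ds = tabularTextDatastore('tileRows_*.csv');
tv = tall(ds);
tm = [tv.b1 tv.b2 tv.b3];          % assemble a tall matrix of spectra rows
uvals = round(tm/5)*5;             % same rounding trick as above, tolerance 5
u = gather(unique(uvals,'rows'));  % deferred until gather evaluates it
```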
```matlab
%% unique approach
tic;
uvals=round(vals/uniquetolval)*uniquetolval;
u=unique(uvals,'rows');
toc
numUnique=size(u,1);
tic
ia=cell(numUnique,1); % preallocate so the cell array doesn't grow in the loop
for i=1:numUnique
    ia{i}=find(ismember(uvals,u(i,:),'rows'));
end
toc
```
I see what you mean about the "ia" issue. Your approach is good, but I can't get it to run any faster and I get an extra row of unique values not produced by uniquetol (which has the correct answer). I also tried compiling as a MEX function, but it ran a little slower. Using gpuArrays seemed to cause excessive overhead, but I'm new to that, so I probably missed something.
```matlab
uniquetolval=5; N=1e6;
vals=repmat([23 50 90; 31 51 91; 32 52 92; 33 53 93; 34 54 94],N,1);
%% timbos approach
tic;
uvals=round(vals/uniquetolval)*uniquetolval;
[u0,ia0]=unique(uvals,'rows');
numUnique=size(u0,1);
u=vals(ia0,:);
ia=cell(numUnique,1);
for i=1:numUnique
    ia{i}=find(ismember(uvals,u0(i,:),'rows'));
end
fprintf('timbos approach %g sec\n',toc)
%% uniquetol approach
tic
[ut,iat]=uniquetol(vals,uniquetolval,'ByRows',true,...
    'DataScale',1,'OutputAllIndices',true);
fprintf('uniquetol approach %g sec\n',toc)
```
@DozierJeff Thanks for weighing in. I copied your email below. Right now, uniquetol is taking almost as long as the unmixing. I think we can really speed this up, but maybe it's not with the unique/uniquetol functions?
——————- On my machine (a Lenovo, several years old) running MATLAB 2021a, uniquetol takes 2.6 sec, Timbo's takes 4.6. When I ran it again, the times were 1.9 and 4.0 sec (maybe some compiling took place, or maybe I didn't move; some of what I've read about timing says: laptop on a hard surface, plugged in, don't touch).
That uniquetol doesn’t show the code might mean it’s coded in C.
Generally, I view speeds within a factor of 2 to be close enough.
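On the timing point above: timeit may give steadier numbers than tic/toc since it warms up and repeats the call. A minimal sketch, assuming the vals and uniquetolval variables from the earlier snippets:

```matlab
% Wrap the call under test; ask timeit for 2 outputs to match [ut,iat]=uniquetol(...)
f = @() uniquetol(vals,uniquetolval,'ByRows',true,'DataScale',1,'OutputAllIndices',true);
t = timeit(f,2)
```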
@DozierJeff I copied your e-mail in below.
For superpixels, a few things: the MATLAB built-in superpixels function can only handle 1 or 3 features per pixel for clustering; my method enables as many features as you want, but is slow. The built-in superpixels function is a MEX file and runs very fast. I was doing some superpixel stuff with both functions yesterday on Landsat images and the results are pretty dang similar. For speed, I recommend we use the MATLAB function. Also, there is a superpixels3 (3D superpixels) function - we could stack a few days together and find similar pixels across multiple days - probably just a few at most.
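A minimal sketch of the built-in route, assuming a 3-band composite img3 (m x n x 3, scaled to [0,1]) and an illustrative target of 5000 superpixels; neither value is from our data.

```matlab
[labels,numLabels] = superpixels(img3,5000);   % MEX-backed built-in (Image Processing Toolbox)
% per-superpixel mean of one band via accumarray
band1 = img3(:,:,1);
meanB1 = accumarray(labels(:),band1(:),[numLabels 1],@mean);
```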
The trick with using superpixels and MODIS data is that the spatial resolution of MOD09GA is quite coarse, so we don't want to make the superpixels too big. I'm guessing our fSCA estimates from MODIS superpixels are going to be similar to the aggregate of the fSCA estimates from the individual pixels. But unless all the component pixels have high fSCA, we may be reducing our ability to estimate grain size and contaminant concentration by the superpixel fSCA going below the thresholds needed to estimate those snow properties. We'd need a quick way to find good pixels/superpixels for estimating those properties.
Also, we have plenty of time. We could create a large training library of unique pixels from many tiles and days with the current "slow" combo of uniquetol + speedinvert, then train the ML model; for future "fast" runs, the statistical fit / ML approach could be fast enough to run on all pixels.
@edwardbair - for GPU arrays, once Aaron gets MATLAB 2021a fired up on tlun, speed might improve with the new updates to MATLAB's handling of the new NVIDIA cards.
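For reference, a hedged sketch of the part that should move to the GPU cleanly; per the notes later in this thread, 'ByRows'/'OutputAllIndices' are a problem on gpuArray, so this gathers before calling unique.

```matlab
g = gpuArray(vals);
guvals = round(g/uniquetolval)*uniquetolval;  % elementwise, GPU-friendly
u = unique(gather(guvals),'rows');            % 'rows' back on the CPU
```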
Let’s take a step back. The problem is: unmixing every pixel takes too much time, but many pixels are similar to other pixels, so we shouldn’t have to unmix all. Rosenthal & Dozier 1996 realized this, and Walter built a regression tree based on a sample of pixels.
I'd suggest we combine the ideas with superpixels. For a typical AVIRIS-NG image, I have various ways to create a 3-band image that superpixels can handle, and Timbo has identified other ways. David Thompson's Mars work just used a one-band image, the brightness (Euclidean norm) of the spectral reflectance. Depending on options, I get something like 65,000 superpixels, but running uniquetol on these at a tolerance of 0.05 for the weighted spectra, I reduce to 2500-3000, which I can unmix on my laptop.
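A sketch of that pipeline with assumed variables: cube is an m x n x nBands reflectance cube and proxy3 the 3-band composite; the counts are the ones mentioned above.

```matlab
[L,n] = superpixels(proxy3,65000);
spec = zeros(n,size(cube,3));
for b = 1:size(cube,3)
    cb = cube(:,:,b);
    spec(:,b) = accumarray(L(:),cb(:),[n 1],@mean);  % mean spectrum per superpixel
end
uSpec = uniquetol(spec,0.05,'ByRows',true,'DataScale',1);  % ~2500-3000 rows to unmix
```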
Then the question is, what to do with ‘em? One approach is to say there are only 2500 answers and for each pixel pick the closest superpixel and assign that fSCA and albedo to it. An alternative is to assert that the superpixels span the range of variability and to use a statistical fit (“machine learning”) to assign the snow properties to each pixel.
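The first option is a nearest-neighbor lookup; a sketch, assuming uSpec from above, per-pixel spectra pixSpec (nPix x nBands), and unmixed answers fscaU aligned with the rows of uSpec.

```matlab
idx = knnsearch(uSpec,pixSpec);  % Statistics and Machine Learning Toolbox
fscaPix = fscaU(idx);            % each pixel inherits its nearest answer
```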
Thoughts? This topic applies to other retrievals besides snow.
Jeff
“Life is really simple, but we insist on making it complicated.” – Confucius
The problem is: can we speed up SPIReS? Machine learning, LUTs, GPU arrays, and other speed-ups are good research directions, but I have yet to see any evidence that they work better or faster than simpler methods for clustering. Down the road, with the right fast clustering algorithm, I think we'll be able to apply a look-up table answer to each scene and skip the iterative solving, but we are not there yet. I'll note that I wasn't able to do that two years ago either, but I'm hoping someone smarter than I am will be able to (i.e., on this thread).
For now, I don't agree that unmixing takes too much time, since it can be efficiently parallelized. Right now, the bottleneck is uniquetol, which cannot be run in parallel, so that's what I've focused on. Below is a summing approach that builds on Timbo's code. Note that an extra unique row is produced by both Timbo's approach and the summing approach. I believe all three are producing correct answers, but uniquetol works differently in that it tests each cell in a row to be within a tolerance, whereas Timbo's approach and the summing approach examine the entire row.
```
uTest
timbos approach 23.087 sec
summing approach 5.4753 sec
uniquetol approach 11.7493 sec
```
```matlab
uniquetolval=5; N=1e7;
u0=[23 50 90; 31 51 91; 32 52 92; 33 53 93; 34 54 94];
vals=repmat(u0,N,1);
%% timbos approach
tic;
uvals=round(vals/uniquetolval)*uniquetolval;
[ur,ir]=unique(uvals,'rows');
numUnique=size(ur,1);
u=vals(ir,:);
ia=cell(numUnique,1);
for i=1:numUnique
    ia{i}=find(ismember(uvals,ur(i,:),'rows')); % compare against the rounded rows
end
fprintf('timbos approach %g sec\n',toc)
%% summing approach
tic;
sv=sum(vals,2);
svals=round(sv/uniquetolval)*uniquetolval;
[us,is]=unique(svals);
u=vals(is,:);
ia=cell(size(is));
for i=1:length(us)
    ia{i}=find(svals==us(i)); % match on the rounded row sums
end
fprintf('summing approach %g sec\n',toc)
%% uniquetol approach
tic
[u,ia]=uniquetol(vals,uniquetolval,'ByRows',true,...
    'DataScale',1,'OutputAllIndices',true);
fprintf('uniquetol approach %g sec\n',toc)
```
Here's a final note.
I wrote speedyUniqueTol and implemented it in run_spires, only to find that the summing approach produces too few filtered rows; in other words, many rows sum to the same value within a tolerance. The uniquetol approach finds the spectra we are looking for since it examines each cell separately. The ismember(...,'rows') approach seems to produce the same result as summing the rows and is too slow. Using uniquetol on tlun, I was able to run the entire Sierra 20-year record overnight, so this may be a case where hardware improvements dwarf our attempts at code improvements.
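A tiny demonstration of that failure mode: two rows with equal sums collapse under the summing approach but stay distinct under uniquetol.

```matlab
rows = [10 20 30; 30 20 10];                    % same sum (60), different spectra
unique(round(sum(rows,2)/5)*5)                   % one value: the rows merge
uniquetol(rows,5,'ByRows',true,'DataScale',1)    % two rows: kept distinct
```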
https://github.com/edwardbair/SPIRES/blob/c54df3974011b782e3bd541298de913d61fb5a66/run_spires.m#L89
Looking into a speedup here using unique per your suggestion, but I'm getting hung up because 1) there's no 'OutputAllIndices' option for ia; 2) GPU acceleration doesn't work for OutputAllIndices or ByRows.
uniquetol is a built-in black box, but the source for unique is viewable. Maybe it can be modified to do what we need?
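One possible workaround for 1), assuming the rounded uvals from the earlier snippets: unique's third output maps every row to its unique group, so accumarray can collect all indices without an 'OutputAllIndices' option.

```matlab
[u,~,ic] = unique(uvals,'rows');
% group the row indices 1..n by their unique-row id; sort restores order
ia = accumarray(ic,(1:numel(ic))',[size(u,1) 1],@(v){sort(v)});
```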