brycefrank / pyfor

Tools for analyzing aerial point clouds of forest data.
MIT License

Profile Canopy Height model #31

Closed bw4sz closed 5 years ago

bw4sz commented 5 years ago

https://github.com/brycefrank/pyfor/blob/d2bb3769133945f4fa032bc8bb4500a903236821/pyfor/cloud.py#L304

I've been trying to improve the performance of the canopy height model. I had assumed that the cell size resolution was what led to slow performance on big data sets. That was incorrect.

Here is a toy example, starting with my loading function (though the loader shouldn't matter for the timings):

import pyfor

def load_lidar(laz_path):
    try:
        pc = pyfor.cloud.Cloud(laz_path)
        pc.extension = ".las"
    except FileNotFoundError:
        print("Failed loading path: %s" % laz_path)
        raise

    # Normalize heights with the Zhang et al. (2003) ground filter
    zhang_filter = pyfor.ground_filter.Zhang2003(pc, cell_size=1)
    zhang_filter.normalize()

    # Quick filter for unreasonable points
    pc.filter(min=-5, max=pc.data.points.z.quantile(0.995), dim="z")

    # Make sure the tile is not empty
    assert pc.data.points.shape[0] > 0, "Lidar tile is empty!"

    return pc

import cProfile, pstats
cp = cProfile.Profile()

from DeepForest import Lidar

lidar_tile = Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size=0.1)
cp.disable()
         5226 function calls (5150 primitive calls) in 0.064 seconds

   Ordered by: cumulative time, call count
   List reduced from 517 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.064    0.064 cloud.py:304(chm)
        1    0.000    0.000    0.049    0.049 cloud.py:157(grid)
        1    0.000    0.000    0.049    0.049 rasterizer.py:18(__init__)
        2    0.000    0.000    0.047    0.024 frame.py:3105(__setitem__)
        2    0.000    0.000    0.047    0.023 frame.py:3182(_set_item)
        2    0.000    0.000    0.046    0.023 generic.py:2633(_check_setitem_copy)
        1    0.046    0.046    0.046    0.046 {built-in method gc.collect}
        1    0.000    0.000    0.014    0.014 rasterizer.py:48(raster)

Now add the pit filter and interpolation:

import cProfile, pstats
cp = cProfile.Profile()

from DeepForest import Lidar

lidar_tile = Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size=0.1, interp_method="nearest", pit_filter="median", kernel_size=11)
cp.disable()
(pyfor emits a warning during this run: "This behavior has changed from < 0.3.1; points are now binned from the top left of the point cloud instead of the bottom right to cohere with arrays produced later.")

         203631 function calls (201530 primitive calls) in 1.750 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.753    1.753 cloud.py:304(chm)
        1    0.000    0.000    1.529    1.529 rasterizer.py:244(pit_filter)
    162/1    0.002    0.000    0.796    0.796 <frozen importlib._bootstrap>:966(_find_and_load)
    162/1    0.001    0.000    0.796    0.796 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
    153/1    0.002    0.000    0.796    0.796 <frozen importlib._bootstrap>:651(_load_unlocked)
    123/1    0.001    0.000    0.795    0.795 <frozen importlib._bootstrap_external>:672(exec_module)
    219/1    0.000    0.000    0.795    0.795 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
    336/1    0.034    0.000    0.795    0.795 {built-in method builtins.exec}
        1    0.046    0.046    0.795    0.795 __init__.py:308(<module>)
        1    0.000    0.000    0.733    0.733 signaltools.py:854(medfilt)
        1    0.733    0.733    0.733    0.733 {built-in method scipy.signal.sigtools._order_filterND}
  890/344    0.001    0.000    0.661    0.002 <frozen importlib._bootstrap>:997(_handle_fromlist)
        1    0.000    0.000    0.351    0.351 filter_design.py:2(<module>)
        1    0.032    0.032    0.350    0.350 __init__.py:266(<module>)
        1    0.004    0.004    0.272    0.272 _peak_finding.py:3(<module>)
        1    0.020    0.020    0.267    0.267 __init__.py:342(<module>)
        1    0.030    0.030    0.243    0.243 _minimize.py:8(<module>)
        1    0.004    0.004    0.217    0.217 stats.py:156(<module>)

1.17 / 0.064 ≈ 18x slowdown.

My real data is a 360 MB, 1 km tile. I can report general times for that, but I stopped. I think it's non-linear: cell_size = 0.1 takes about 150 seconds without the pit filter, so an 18x slowdown would put it around 45 minutes. I waited that long and it was still going. If you give me a push I can fork and play around. My gut feeling is this isn't right; clearly there is a cost to passing a kernel over the array, but 18x feels off. What do you think is the bottleneck here?

bw4sz commented 5 years ago

It's a function of kernel size, but counterintuitively the runtime increases with kernel size. I would have anticipated the opposite.

import cProfile, pstats
cp = cProfile.Profile()

from DeepForest import Lidar

lidar_tile = Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size=0.1, interp_method="nearest", pit_filter="median", kernel_size=21)
cp.disable()
         203631 function calls (201530 primitive calls) in 3.758 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    3.760    3.760 cloud.py:304(chm)
        1    0.000    0.000    3.542    3.542 rasterizer.py:244(pit_filter)
        1    0.000    0.000    2.728    2.728 signaltools.py:854(medfilt)
        1    2.728    2.728    2.728    2.728 {built-in method scipy.signal.sigtools._order_filterND}

Versus

import cProfile, pstats
cp = cProfile.Profile()

from DeepForest import Lidar

lidar_tile = Lidar.load_lidar("/Users/ben/Documents/DeepLidar/data/SJER/SJER_002.laz")

cp.enable()
chm = lidar_tile.chm(cell_size=0.1, interp_method="nearest", pit_filter="median", kernel_size=3)
cp.disable()
         203631 function calls (201530 primitive calls) in 1.117 seconds

   Ordered by: cumulative time, call count
   List reduced from 1184 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    1.120    1.120 cloud.py:304(chm)
        1    0.000    0.000    0.903    0.903 rasterizer.py:244(pit_filter)
    162/1    0.003    0.000    0.815    0.815 <frozen importlib._bootstrap>:966(_find_and_load)
    162/1    0.001    0.000    0.815    0.815 <frozen importlib._bootstrap>:936(_find_and_load_unlocked)
    153/1    0.002    0.000    0.815    0.815 <frozen importlib._bootstrap>:651(_load_unlocked)
    123/1    0.001    0.000    0.815    0.815 <frozen importlib._bootstrap_external>:672(exec_module)
    219/1    0.000    0.000    0.815    0.815 <frozen importlib._bootstrap>:211(_call_with_frames_removed)
    336/1    0.036    0.000    0.815    0.815 {built-in method builtins.exec}
        1    0.047    0.047    0.815    0.815 __init__.py:308(<module>)
brycefrank commented 5 years ago

Interesting find! This is something I have run into as well (not the kernel size differences per se, but the performance of .chm with interpolation in general). My best guess, without getting into the details, is the griddata call in tandem with the "nearest" interpolation setting:

https://github.com/brycefrank/pyfor/blob/d2bb3769133945f4fa032bc8bb4500a903236821/pyfor/rasterizer.py#L101-L102

I dug down into the weeds a few months ago, and griddata turns out to be rather complex under the hood. I think at one point it even constructs a Delaunay triangulation:

https://stackoverflow.com/questions/20915502/speedup-scipy-griddata-for-multiple-interpolations-between-two-irregular-grids
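
For illustration, here is a minimal sketch of the triangulation-reuse idea from that thread (stand-in arrays, not pyfor code): with method="linear" or "cubic", griddata rebuilds a Delaunay triangulation on every call, so constructing it once and reusing it amortizes the expensive step.

import numpy as np
from scipy.spatial import Delaunay
from scipy.interpolate import LinearNDInterpolator

points = np.random.rand(10000, 2)  # stand-in (x, y) cell coordinates
values = np.random.rand(10000)     # stand-in z values

tri = Delaunay(points)             # the expensive step, done once
interp = LinearNDInterpolator(tri, values)

grid_x, grid_y = np.mgrid[0:1:500j, 0:1:500j]
filled = interp(grid_x, grid_y)    # cheap to repeat with new query grids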

Here is my suggestion for now:

You can easily pull the array out of your chm before doing any computations:

chm = lidar_tile.chm(0.1)
array = chm.array

Once you have this array you are free to do with it as you please. I have not looked into alternatives for filling NaNs or filtering pits beyond the current defaults (although I don't think the median filter is the bottleneck here). I am sure there are a few papers out there, and a long-term plan of mine is to write a few of these NaN/pit filters in Cython so they are a bit more efficient than griddata, but my priority these days is finishing off the collection module.

If you come up with something you are happy with, reset the attribute:

chm.array = your_filtered_array

Another thing to try is setting interp_method to "linear", but make sure to check the results. I think it runs just a bit faster.
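
Putting that together, a rough sketch of the rip-out / filter / reset workflow (medfilt2d here is just a stand-in for whatever filter you settle on):

from scipy.signal import medfilt2d

chm = lidar_tile.chm(0.1)  # skip pit_filter / interp for now
array = chm.array          # plain numpy array, free to modify

# Note: empty cells may be NaN, so handle those before filtering
filtered = medfilt2d(array, kernel_size=3)

chm.array = filtered       # reset the attribute when done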

bw4sz commented 5 years ago

I think I'm on to something, though this might be the wrong function here.

https://github.com/brycefrank/pyfor/blob/d2bb3769133945f4fa032bc8bb4500a903236821/pyfor/rasterizer.py#L251

import cProfile, pstats
import numpy as np
import scipy as sp
from scipy.signal import medfilt2d

X = np.random.random((2000, 2000))  # sample 2D array

cp = cProfile.Profile()

cp.enable()
a = sp.signal.medfilt(X, 3)
# b = medfilt2d(X, [3, 3])
cp.disable()

stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)
         27 function calls in 2.687 seconds

   Ordered by: cumulative time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    2.687    2.687 signaltools.py:854(medfilt)
        1    2.686    2.686    2.686    2.686 {built-in method scipy.signal.sigtools._order_filterND}
        1    0.000    0.000    0.000    0.000 fromnumeric.py:3163(product)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:2478(prod)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:64(_wrapreduction)
        1    0.000    0.000    0.000    0.000 {method 'reduce' of 'numpy.ufunc' objects}
        1    0.000    0.000    0.000    0.000 fromnumeric.py:404(repeat)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:49(_wrapfunc)
        1    0.000    0.000    0.000    0.000 fromnumeric.py:36(_wrapit)
        1    0.000    0.000    0.000    0.000 shape_base.py:11(atleast_1d)
import cProfile, pstats
import numpy as np
import scipy as sp
from scipy.signal import medfilt2d

X = np.random.random((2000, 2000))  # sample 2D array

cp = cProfile.Profile()

cp.enable()
# a = sp.signal.medfilt(X, 3)
b = medfilt2d(X, [3, 3])
cp.disable()

stats = pstats.Stats(cp)
stats.strip_dirs()
stats.sort_stats('cumulative', 'calls')
stats.print_stats(40)
         7 function calls in 0.487 seconds

   Ordered by: cumulative time, call count

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000    0.487    0.487 signaltools.py:1131(medfilt2d)
        1    0.486    0.486    0.486    0.486 {built-in method scipy.signal.sigtools._medfilt2d}
        2    0.000    0.000    0.000    0.000 numeric.py:433(asarray)
        2    0.000    0.000    0.000    0.000 {built-in method numpy.core.multiarray.array}
        1    0.000    0.000    0.000    0.000 {method 'disable' of '_lsprof.Profiler' objects}

That's a ~5x speedup using medfilt2d versus medfilt. I think it's because of the assumptions medfilt2d can make about a square 2D window?
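
For reference, a hypothetical version of that swap inside pit_filter (assuming it currently hands the 2D raster to scipy.signal.medfilt; medfilt2d is the specialized 2D implementation and expects float64 input and an odd kernel size):

import numpy as np
from scipy.signal import medfilt2d

def pit_filter(array, kernel_size):
    # Hypothetical drop-in: cast defensively to float64 so medfilt2d
    # stays on its fast 2D path
    return medfilt2d(np.asarray(array, dtype=np.float64), kernel_size)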

bw4sz commented 5 years ago

I'm not the first here. https://gist.github.com/f0k/2f8402e4dfb6974bfcf1
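
A quick harness for benchmarking yet another candidate, scipy.ndimage.median_filter, against medfilt2d (illustrative only; absolute timings will vary by machine and scipy version):

import numpy as np
import timeit
from scipy.signal import medfilt2d
from scipy.ndimage import median_filter

X = np.random.random((2000, 2000))

# Time each filter over a few repetitions; both use a 3x3 window
print("medfilt2d:    ", timeit.timeit(lambda: medfilt2d(X, 3), number=3))
print("median_filter:", timeit.timeit(lambda: median_filter(X, size=3), number=3))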

brycefrank commented 5 years ago

Very nice! With regard to medfilt then, would you be comfortable making a PR against the 0.3.2 branch? You have been making a lot of contributions, and it would be nice for them to count via commits if you would like. Should be pretty easy!

bw4sz commented 5 years ago

Done. Looking at those performance stats, I really don't see any calls to griddata; to me it looks 100% like the median filtering.

brycefrank commented 5 years ago

I think this is due to the tile size; in your example you use SJER_002.laz, which if I recall correctly is a relatively small, "plot-level" tile. The griddata bottleneck occurs for larger, landscape-level tiles (1 km x 1 km, for example).

bw4sz commented 5 years ago

I take it back after looking at the full stack: there are definitely ongoing performance challenges with griddata. Taking a peek into it.


(9999, 9999)
         1053282 function calls (1050067 primitive calls) in 4618.410 seconds

   Ordered by: cumulative time, call count
   List reduced from 1106 to 40 due to restriction <40>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.129    0.129 4618.410 4618.410 Lidar.py:67(load_lidar)
        2    0.104    0.052 4521.924 2260.962 rasterizer.py:78(interpolate)
        1    0.160    0.160 4521.662 4521.662 cloud.py:304(chm)
        3    0.256    0.085 4501.981 1500.660 ndgriddata.py:88(griddata)
        3 4356.559 1452.186 4357.653 1452.551 ndgriddata.py:58(__init__)
        3    2.839    0.946  144.072   48.024 ndgriddata.py:67(__call__)
        3  138.584   46.195  138.585   46.195 {method 'query' of 'scipy.spatial.ckdtree.cKDTree' objects}
        1    0.044    0.044   51.709   51.709 cloud.py:68(__init__)
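
Reading that stack: almost all of the ~4500 s sits in ndgriddata's __init__, i.e. constructing the interpolator, which griddata rebuilds from scratch on each of its 3 calls, while the actual nearest-neighbour queries take ~140 s. A minimal sketch of one possible mitigation, building the interpolator once and reusing it (stand-in arrays, not pyfor code):

import numpy as np
from scipy.interpolate import NearestNDInterpolator

points = np.random.rand(1_000_000, 2)  # stand-in (x, y) coordinates
values = np.random.rand(1_000_000)     # stand-in z values

interp = NearestNDInterpolator(points, values)  # expensive; done once

grid_x, grid_y = np.mgrid[0:1:1000j, 0:1:1000j]
first = interp(grid_x, grid_y)   # repeated queries are comparatively cheap
second = interp(grid_x, grid_y)
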
brycefrank commented 5 years ago

There it is. Ben, I have some good time to work on pyfor today; I will be around on Gitter if you need quick feedback. I plan on tackling the median filter test first, and then moving on to 0.3.2 tasks.