geocompx / geocompy

Geocomputation with Python: an open source book and online resource for getting started in this space
https://py.geocompx.org/
Other
259 stars 47 forks source link

Faster raster to point conversion #203

Closed anitagraser closed 9 months ago

anitagraser commented 10 months ago

rasterio.features.shape is an insanely expensive operation for something as simple as raster to point conversion:

rasterio.features.shapes(source, mask=None, connectivity=4, transform=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0)) Get shapes and values of connected regions in a dataset or array.

It would be good if we could identify a more efficient approach for https://py.geocompx.org/05-raster-vector#sec-raster-to-points

michaeldorman commented 10 months ago

Good point, I agree. What do you think about the alternative of

  1. using rasterio.transform.xy to generate the x and y of of all pixels (as shown here), and then
  2. transform to shapely points and attache the values?
anitagraser commented 10 months ago

It's worth a try (and a comparison with gdal2xzy.py)

anitagraser commented 10 months ago

Mhm, does rasterio sample accept the output of rasterio.transform.xy?

michaeldorman commented 10 months ago

To sample the raster values? I was thinking of just attaching them from the flattened array

anitagraser commented 10 months ago

You're right. That should be fast

michaeldorman commented 10 months ago

Here is my suggestion:

height = src_elev.shape[0]
width = src_elev.shape[1]
cols, rows = np.meshgrid(np.arange(width), np.arange(height))
x, y = rasterio.transform.xy(src_elev.transform, rows, cols)
pd.DataFrame({
    'x': np.array(x).flatten(),
    'y': np.array(y).flatten(),
    'z': src_elev.read(1).flatten()
})

The result is:

       x     y   z
0  -1.25  1.25   1
1  -0.75  1.25   2
2  -0.25  1.25   3
3   0.25  1.25   4
4   0.75  1.25   5
..   ...   ...  ..
31 -0.75 -1.25  32
32 -0.25 -1.25  33
33  0.25 -1.25  34
34  0.75 -1.25  35
35  1.25 -1.25  36

[36 rows x 3 columns]
michaeldorman commented 10 months ago

Here is a more complete and modified suggestion:

import numpy as np
import geopandas as gpd
import rasterio

src = rasterio.open('output/elev.tif')
height = src.shape[0]
width = src.shape[1]
cols, rows = np.meshgrid(np.arange(width), np.arange(height))
x, y = rasterio.transform.xy(src.transform, rows, cols)
x = np.array(x).flatten()
y = np.array(y).flatten()
z = src.read(1).flatten()
geom = gpd.points_from_xy(x, y, crs=src.crs)
dat = gpd.GeoDataFrame(data={'value':z}, geometry=geom)
dat

The result is:

    value                   geometry
0       1   POINT (-1.25000 1.25000)
1       2   POINT (-0.75000 1.25000)
2       3   POINT (-0.25000 1.25000)
3       4    POINT (0.25000 1.25000)
4       5    POINT (0.75000 1.25000)
5       6    POINT (1.25000 1.25000)
...
michaeldorman commented 10 months ago

P.S. I've added a similar example in the Python course: https://geobgu.xyz/py/51-faq.html#raster-to-points

michaeldorman commented 10 months ago

Here is a more complete and modified suggestion:

import numpy as np
import geopandas as gpd
import rasterio

src = rasterio.open('output/elev.tif')
height = src.shape[0]
width = src.shape[1]
cols, rows = np.meshgrid(np.arange(width), np.arange(height))
x, y = rasterio.transform.xy(src.transform, rows, cols)
x = np.array(x).flatten()
y = np.array(y).flatten()
z = src.read(1).flatten()
geom = gpd.points_from_xy(x, y, crs=src.crs)
dat = gpd.GeoDataFrame(data={'value':z}, geometry=geom)
dat

The result is:

    value                   geometry
0       1   POINT (-1.25000 1.25000)
1       2   POINT (-0.75000 1.25000)
2       3   POINT (-0.25000 1.25000)
3       4    POINT (0.25000 1.25000)
4       5    POINT (0.75000 1.25000)
5       6    POINT (1.25000 1.25000)
...

@anitagraser (and everyone) - will be happy if you can please take a look, to see if we can switch to this methodology in ch05. Thanks

michaeldorman commented 10 months ago

@anitagraser , @Robinlovelace , @Nowosad will be happy if you can please take a look and confirm or suggest changes. I've finished going over @Nowosad comments, so this is the last major issue related to code we have at the moment, and since it affects several places in the book I wanted to finalize it before we go into other content-related issues (preface section, etc.) Thanks

Robinlovelace commented 10 months ago

Will take a look now..

Robinlovelace commented 10 months ago

Finally had a look, sorry for slow response.

Currently we have:

https://github.com/geocompx/geocompy/blob/5120ec247feb36ee13b3f40481f1107d1e94a1b8/05-raster-vector.qmd#L889-L894

This suggestion looks good to me, although I'm not well qualified to comment this and haven't benchmarked it

src = rasterio.open('output/elev.tif')
height = src.shape[0]
width = src.shape[1]
cols, rows = np.meshgrid(np.arange(width), np.arange(height))
x, y = rasterio.transform.xy(src.transform, rows, cols)
x = np.array(x).flatten()
y = np.array(y).flatten()
z = src.read(1).flatten()
geom = gpd.points_from_xy(x, y, crs=src.crs)
dat = gpd.GeoDataFrame(data={'value':z}, geometry=geom)
dat

So general :+1: and sorry I don't have more specific or useful comments.

michaeldorman commented 10 months ago

Finally had a look, sorry for slow response.

Currently we have:

https://github.com/geocompx/geocompy/blob/5120ec247feb36ee13b3f40481f1107d1e94a1b8/05-raster-vector.qmd#L889-L894

This suggestion looks good to me, although I'm not well qualified to comment this and haven't benchmarked it

src = rasterio.open('output/elev.tif')
height = src.shape[0]
width = src.shape[1]
cols, rows = np.meshgrid(np.arange(width), np.arange(height))
x, y = rasterio.transform.xy(src.transform, rows, cols)
x = np.array(x).flatten()
y = np.array(y).flatten()
z = src.read(1).flatten()
geom = gpd.points_from_xy(x, y, crs=src.crs)
dat = gpd.GeoDataFrame(data={'value':z}, geometry=geom)
dat

So general 👍 and sorry I don't have more specific or useful comments.

Thanks @Robinlovelace ! In the original version we're first converting each raster pixel to a polygon, and then deriving the centroid (which is the pixel centroid). This is wasteful of resources as @anitagraser has pointed out. In the new version, we're using a mathematical transformation from pixel indices to coordinates, passing all possible row/column combinations, then converting x/y to point geometries.

Nowosad commented 10 months ago

I think the new code idea is fine (as long as it is well explained).

michaeldorman commented 9 months ago

Completed https://github.com/geocompx/geocompy/commit/1aacd440f98b731af6489a21e098dded1068c462

Robinlovelace commented 9 months ago

Great work Michael, many thanks :tada: