USDAForestService / FIESTA

FIESTA (Forest Inventory ESTimation and Analysis) is a research estimation tool for analysts who work with sample-based inventory data from the U.S. Department of Agriculture, Forest Service, Forest Inventory and Analysis (FIA) Program. Follow the link below for more information:
https://usdaforestservice.github.io/FIESTA/

add parallelization functionality to `spExtractPoly` #28

Closed: joshyam-k closed this 9 months ago

joshyam-k commented 9 months ago

Here's a reproducible example showing that (at least in this case) parallelizing changes nothing about the function's actual output, only how we get there. I should also note that since the dataset here is only about 50 rows, the parallelized method is actually a tick slower, as we'd expect.

devtools::install_github("joshyam-k/FIESTA")
library(FIESTA)

WYplt <- FIESTA::WYplt

# Get polygon vector layer from FIESTA external data
WYbhdistfn <- system.file("extdata",
                          "sp_data/WYbighorn_districtbnd.shp",
                          package = "FIESTA")

# Extract polygon attributes to the plot points, in parallel on 8 cores
xyext_parallel <- spExtractPoly(xyplt = WYplt,
                       polyvlst = WYbhdistfn,
                       xy.uniqueid = "CN",
                       spMakeSpatial_opts = list(xvar = "LON_PUBLIC",
                                                 yvar = "LAT_PUBLIC",
                                                 xy.crs = 4269),
                       ncores = 8)$spxyext
#> Using 8 cores...

# Same extraction, without parallelization
xyext <- spExtractPoly(xyplt = WYplt,
                       polyvlst = WYbhdistfn,
                       xy.uniqueid = "CN",
                       spMakeSpatial_opts = list(xvar = "LON_PUBLIC",
                                                 yvar = "LAT_PUBLIC",
                                                 xy.crs = 4269))$spxyext

identical(xyext, xyext_parallel)
#> [1] TRUE
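
For anyone curious about the mechanics, the usual way to parallelize a point-in-polygon extraction is to split the points into chunks, join each chunk against the polygon layer on its own core, and recombine. A minimal standalone sketch of that pattern (an illustration only, not the exact FIESTA internals; `parallel_point_in_poly` is a hypothetical helper):

library(sf)
library(parallel)

# Hypothetical sketch: chunked parallel spatial join
parallel_point_in_poly <- function(pts, polys, ncores = 2) {
  # Assign each point to one of ncores roughly equal chunks
  chunk_id <- cut(seq_len(nrow(pts)), breaks = ncores, labels = FALSE)
  chunks <- split(pts, chunk_id)
  # mclapply forks, which is unavailable on Windows (use ncores = 1 there)
  joined <- mclapply(chunks,
                     function(ch) sf::st_join(ch, polys),
                     mc.cores = ncores)
  # Recombine the per-chunk results into a single sf object
  do.call(rbind, joined)
}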
ctoney commented 9 months ago

Nice. Thanks for the example. It runs correctly for me. I assume this gives substantial speedup on the 35 million points in NV?

Using `spxyext <- spxyext[!duplicated(spxyext[[xy.uniqueid]]), ]` instead of `spxyext <- unique(sf::st_join(sppltx, polyv))` probably helps too, even without parallel?
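
A toy illustration of why (not FIESTA code): `unique()` compares every column of every row, geometry included, while `duplicated()` on the id column only checks a single vector. Note the two aren't strictly equivalent: if a point intersects more than one polygon, `unique()` keeps every distinct row, while `!duplicated()` keeps only the first row per id.

# Toy data frame with each row triplicated
df <- data.frame(CN = rep(seq_len(1e5), each = 3),
                 x  = rep(runif(1e5), each = 3))
system.time(a <- unique(df))               # row-wise comparison across all columns
system.time(b <- df[!duplicated(df$CN), ]) # single-column check
identical(a$CN, b$CN)
#> [1] TRUE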

Looks good to merge.

joshyam-k commented 9 months ago

Above a million rows I was consistently seeing 5-10x speedups. And yes, reworking the removal of duplicate rows definitely speeds things up quite a bit, even in the non-parallel case.
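
For reference, a rough harness along these lines reproduces the comparison (`big_plt` is a placeholder for a large point table such as the NV plots, paired with a polygon layer that covers it; it isn't included here):

t_serial <- system.time(
  spExtractPoly(xyplt = big_plt,
                polyvlst = WYbhdistfn,
                xy.uniqueid = "CN",
                spMakeSpatial_opts = list(xvar = "LON_PUBLIC",
                                          yvar = "LAT_PUBLIC",
                                          xy.crs = 4269))
)
t_parallel <- system.time(
  spExtractPoly(xyplt = big_plt,
                polyvlst = WYbhdistfn,
                xy.uniqueid = "CN",
                spMakeSpatial_opts = list(xvar = "LON_PUBLIC",
                                          yvar = "LAT_PUBLIC",
                                          xy.crs = 4269),
                ncores = 8)
)
unname(t_serial["elapsed"] / t_parallel["elapsed"])  # speedup factor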