**Open** · hakimabdi opened this issue 8 months ago
@hakimabdi, I don't have parallel processing implemented at the moment, but it is something I've been interested in adding for a while. I'll run a few tests this week and see if it's something I can feasibly do.
@hakimabdi, as of now parallel processing isn't directly implemented for `focal` and `focalCpp` in the terra package, which is what GLCMTextures uses. I tried a custom implementation that breaks the raster into chunks and sends each chunk to a core, but that doesn't seem to work on terra objects. I have brought this up on the terra issue about parallelization (rspatial/terra#36), so maybe somebody there will have some guidance on how it could be implemented.
Thank you for looking into this and for developing this very useful package. I'll just continue to do it the normal way until a solution is developed within terra. Cheers!
@hakimabdi, I've actually now got it to work using the `wrap` and `unwrap` functions. The performance benefit seems to increase with larger raster objects. I may try to build it in at some point, but for now you could adapt the code below. In this code, I've parallelized the texture calculations but not the quantization, since the quantization needs the data from the entire raster to determine quantiles or the overall max/min values in the dataset. That being said, the reclassify step of `quantize_raster` could be parallelized, and the max/min of each chunk could be calculated in parallel and then combined to get the overall max/min.
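For instance, the overall range can be assembled from per-chunk ranges. Here is a minimal sketch, using base R's built-in parallel package rather than future, and a plain matrix standing in for raster values, so it runs on its own:

```r
library(parallel)

set.seed(1)
vals <- matrix(rnorm(1e4), nrow = 100)                   # stand-in for raster values
chunks <- split(seq_along(vals), rep(1:4, each = 2500))  # four blocks of cells

cl <- makeCluster(2)
# Each worker returns only the min/max of its own chunk
chunk_ranges <- parLapply(cl, chunks, function(idx, v) range(v[idx]), v = vals)
stopCluster(cl)

# The range of the per-chunk ranges is the overall min/max
global_range <- range(unlist(chunk_ranges))
```

The same idea would apply to terra chunks, with each worker calling `terra::minmax` on an unwrapped chunk instead.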
With this method, you'll need to run the quantization separately in serial first and then calculate the textures on the quantized raster. Since the raster is already quantized, be sure to set `quantization` to "none" in `glcm_textures`. To speed up the `quantize_raster` step, you could use the "equal range" quantization method, or use the `maxcell` parameter that I just added, which specifies a number of samples used to estimate the quantiles for "equal prob" instead of reading in the entire dataset.
library(GLCMTextures)
#> Loading required package: terra
#> Warning: package 'terra' was built under R version 4.3.1
#> terra 1.7.47
library(future)
library(future.apply)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#>
#> intersect, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(microbenchmark)
#> Warning: package 'microbenchmark' was built under R version 4.3.2
# Set up
dims<- 3500
n_grey<- 16
set.seed(5)
rq<-rast(matrix(data = sample(0:(n_grey-1), size = dims^2, replace = TRUE), nrow = dims, ncol=dims))
ncores<- parallelly::availableCores(omit=1)
w<- c(3,3)
plot(rq, col=grey.colors(100))
# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(start = ceiling(seq(1, nrow(rq), length.out = ncores+1)))
breaks<- breaks %>% mutate(end=lead(start, n=1)+buffer)
breaks$end[breaks$end > nrow(rq)]<- nrow(rq)
breaks<- breaks[-nrow(breaks),]
# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- wrap(rq[breaks$start[i]:breaks$end[i], ,drop=FALSE])
}
glcmtextures_parallel<- function(x, ...){
  wrap(glcm_textures(unwrap(x),...))
}
# In serial
ts1<- Sys.time()
serial<- glcm_textures(rq, w=w, n_levels = n_grey, quantization = "none")
#> (terra progress bars omitted)
ts2<- Sys.time()
difftime(ts2, ts1, units = "secs")
#> Time difference of 1344.308 secs
#In Parallel
tp1<- Sys.time()
plan(strategy = "multisession", workers=ncores) #Set up parallel
res_list<- future_lapply(r_list, FUN = glcmtextures_parallel, w=w, n_levels = n_grey, quantization = "none")
#> (terra progress bars from the parallel workers omitted)
plan(strategy = "sequential") #Go back to serial processing
res_list<- lapply(res_list, unwrap)
parallel<- do.call(terra::merge, res_list) #Merge back into single raster
tp2<- Sys.time()
difftime(tp2, tp1, units = "secs")
#> Time difference of 427.9501 secs
Created on 2023-11-15 with reprex v2.0.2
Thank you very much, @ailich, this workaround is 5x faster on my data. The quantize_raster() function was always fast for me, but glcm_textures() was the one that took hours to complete. I'll keep using your approach going forward until something better comes around.
@hakimabdi, glad this helped speed up your processing! For now, you can use this script, but at some point I hope to build this into the function itself so it's as simple as specifying the number of cores, with everything else done automatically. You can also try using `plan("multicore")` if you are on Mac or Linux, which implements "fork" parallelization. I only have Windows so I haven't tried it; it's generally faster, but errors can occur due to cross-contamination of shared memory among the cores, so you'd need to do a bit of testing to make sure the results come out correctly.
@ailich thanks again. I am on a Windows 10 machine. I made a custom function based on your helpful code. I hope it is of use to others who are also looking to parallelize the calculation of GLCM textures.
library(future)
library(future.apply)
library(dplyr)
library(GLCMTextures)
library(terra)
glcmParallel = function(quantized, window.size, glevels, ncores){ # Begin function
  # Set up parameters
  rq <- quantized
  n_grey <- glevels
  w <- window.size # e.g. c(3,3)
  buffer <- (w[1]-1)/2 # Buffer for focal window
  # Create breaks for chunking raster
  breaks<- data.frame(start = ceiling(seq(1, nrow(rq), length.out = ncores+1)))
  breaks<- breaks %>% mutate(end=lead(start, n=1)+buffer)
  breaks$end[breaks$end > nrow(rq)]<- nrow(rq)
  breaks<- breaks[-nrow(breaks),]
  # Put raster chunks in list
  r_list<- vector(mode = "list", length = ncores)
  for (i in 1:ncores) {
    r_list[[i]]<- wrap(rq[breaks$start[i]:breaks$end[i], ,drop=FALSE])
  }
  glcmtextures_parallel<- function(x, ...){
    wrap(glcm_textures(unwrap(x),...))
  }
  plan(strategy = "multisession", workers=ncores) #Set up parallel
  res_list<- future_lapply(r_list, FUN = glcmtextures_parallel, w=w, n_levels = n_grey, quantization = "none")
  plan(strategy = "sequential") #Go back to serial processing
  res_list<- lapply(res_list, unwrap)
  parallel<- do.call(terra::merge, res_list) #Merge back into single raster
  names(parallel) = sub(pattern = "glcm_", x = names(parallel), replacement = paste0(w[1],"x",w[1]))
  #terra::writeRaster(parallel, paste0("texture_",glevels,"_",paste0(w[1],"x",w[1]),".tif"))
  gc() # Clear memory (must run before return, or it is unreachable)
  return(parallel)
} # End function
# system.time({
# ncores<- parallelly::availableCores(omit=2)
# testParallel = glcmParallel(quantized = quantized, window.size = c(3,3), glevels = 16, ncores = ncores)
# })
@hakimabdi, I noticed there's a bit more work to be done here. I'm seeing issues stitching the chunks together when the window size is not 3x3 or when `na.rm=TRUE`. There's something wrong with the calculated buffer, so I'll need to look into that.
@hakimabdi, I figured out the issue. The buffer needs to be applied to the starting point of later chunks so that their first few rows have access to the cells above them for the focal calculations, and the buffer area then needs to be removed before merging so that the correct values are kept in the areas of overlap between chunks.
library(GLCMTextures)
#> Loading required package: terra
#> terra 1.7.62
library(future)
library(future.apply)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#>
#> intersect, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
# Set up
dims<- 100
n_grey<- 16
set.seed(5)
rq<-rast(matrix(data = sample(0:(n_grey-1), size = dims^2, replace = TRUE), nrow = dims, ncol=dims))
ncores<- parallelly::availableCores(omit=1)
w<- c(7,7)
# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(start = ceiling(seq(1, nrow(rq), length.out = ncores+1)))
breaks<- breaks %>% mutate(end=lead(start, n=1)+buffer)
breaks$end[breaks$end > nrow(rq)]<- nrow(rq)
breaks<- breaks[-nrow(breaks),]
# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- wrap(rq[breaks$start[i]:breaks$end[i], ,drop=FALSE])
}
glcmtextures_parallel<- function(x, ...){
  wrap(glcm_textures(unwrap(x),...))
}
serial<- glcm_textures(rq, w=w, n_levels = n_grey, quantization = "none", na.rm=TRUE)
plan(strategy = "multisession", workers=ncores) #Set up parallel
res_list<- future_lapply(r_list, FUN = glcmtextures_parallel, w=w, n_levels = n_grey, quantization = "none", na.rm=TRUE)
plan(strategy = "sequential") #Go back to serial processing
res_list<- lapply(res_list, unwrap)
parallel<- do.call(terra::merge, res_list) #Merge back into single raster
plot(parallel==serial) #Errors at seams of chunks
Created on 2023-12-08 with reprex v2.0.2
library(GLCMTextures)
#> Loading required package: terra
#> terra 1.7.62
library(future)
library(future.apply)
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:terra':
#>
#> intersect, union
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
combine_raster_chunks<- function(x, buffer){
  for (i in 1:length(x)) {
    if(i!=1){x[[i]]<- x[[i]][(buffer+1):nrow(x[[i]]), , , drop=FALSE]} #Trim top of chunk
    if(i!=length(x)){x[[i]]<- x[[i]][1:(nrow(x[[i]])-buffer), , , drop=FALSE]} #Trim bottom of chunk
  }
  out<- do.call(merge, x)
  return(out)
}
# Set up
dims<- 100
n_grey<- 16
set.seed(5)
rq<-rast(matrix(data = sample(0:(n_grey-1), size = dims^2, replace = TRUE), nrow = dims, ncol=dims))
ncores<- parallelly::availableCores(omit=1)
w<- c(7,7)
# Create breaks for chunking raster
buffer<- (w[1]-1)/2 # Buffer for focal window
breaks<- data.frame(write_start = ceiling(seq(1, nrow(rq)+1, length.out = ncores+1)))
breaks<- breaks %>% mutate(write_end=lead(write_start, n=1)-1)
breaks<- breaks %>% mutate(chunk_start=write_start - buffer)
breaks<- breaks %>% mutate(chunk_end = write_end + buffer)
breaks<- breaks[-nrow(breaks),]
breaks$chunk_end[breaks$chunk_end > nrow(rq)]<- nrow(rq)
breaks$chunk_start[breaks$chunk_start < 1]<- 1
# Put raster chunks in list
r_list<- vector(mode = "list", length = ncores)
for (i in 1:ncores) {
  r_list[[i]]<- wrap(rq[breaks$chunk_start[i]:breaks$chunk_end[i], ,drop=FALSE])
}
glcmtextures_parallel<- function(x, ...){
  wrap(glcm_textures(unwrap(x),...))
}
serial<- glcm_textures(rq, w=w, n_levels = n_grey, quantization = "none", na.rm=TRUE)
plan(strategy = "multisession", workers=ncores) #Set up parallel
res_list<- future_lapply(r_list, FUN = glcmtextures_parallel, w=w, n_levels = n_grey, quantization = "none", na.rm=TRUE)
plan(strategy = "sequential") #Go back to serial processing
res_list<- lapply(res_list, unwrap)
parallel<- combine_raster_chunks(res_list, buffer) #Merge back into single raster
plot(parallel==serial) #Values match exactly
Created on 2023-12-08 with reprex v2.0.2
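The buffer-and-trim logic can also be checked without terra at all. A toy base-R sketch (with a hypothetical `focal_mean` helper applying a 3x3 window to a plain matrix) showing that padding each chunk toward the seam and trimming the buffer rows before stitching reproduces the serial result:

```r
# Simple 3x3 focal mean; border cells are left NA, like a focal window with no padding
focal_mean <- function(m) {
  out <- matrix(NA_real_, nrow(m), ncol(m))
  for (i in 2:(nrow(m) - 1)) {
    for (j in 2:(ncol(m) - 1)) {
      out[i, j] <- mean(m[(i - 1):(i + 1), (j - 1):(j + 1)])
    }
  }
  out
}

set.seed(1)
m <- matrix(runif(20 * 6), nrow = 20)
buffer <- 1  # (3 - 1) / 2

serial <- focal_mean(m)

# Chunk rows 1-10 and 11-20, each extended by `buffer` rows toward the seam
chunk1 <- focal_mean(m[1:(10 + buffer), ])
chunk2 <- focal_mean(m[(11 - buffer):20, ])

# Trim the buffer rows again before stitching, so the seam rows keep the
# values that were computed with a full neighborhood
combined <- rbind(chunk1[1:10, ], chunk2[(buffer + 1):nrow(chunk2), ])
```

Here `all.equal(combined, serial)` holds, which is exactly what `combine_raster_chunks` achieves for the SpatRaster chunks above.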
Hi, and thank you for this wonderful package. Do you know how to parallelize or otherwise speed up both the quantization and the GLCM creation? I am working with large raster tiles and it takes a long time for the GLCMs to be created per tile.