LandSciTech / pfocal

Fast parallel convolution in R.
https://landscitech.github.io/pfocal/
Other
2 stars 1 forks source link

comparing raster::focal pad options to pfocal #26

Open see24 opened 2 years ago

see24 commented 2 years ago

There is one option in raster::focal that I can't replicate in pfocal where pad = FALSE and na.rm = TRUE. This option gives NA for cells near the edge (ie there is no padding) but allows NAs in the raster to be ignored.

Here are some plots showing the implications of the various options

library(raster)
library(pfocal)
library(tmap) # better default NA palette

mat <- matrix(c(rep(1,5000), rep(2, 5000)), nrow = 100, ncol = 100)
exps <- raster(mat, xmn = 0, xmx = 100, ymn = 0, ymx = 100)

# add an NA in the middle
exps[25,25] <- NA 

# raster options for dealing with edge effects #=============
rast_padF_narmF <- focal(exps, gaussian_kernel_radius(3), 
                   pad = FALSE, na.rm = FALSE)
qtm(rast_padF_narmF)

# this is the one I can't replicate with pfocal ***
rast_padF_narmT <- focal(exps, gaussian_kernel_radius(3), 
                         pad = FALSE, na.rm = TRUE)
qtm(rast_padF_narmT)
# the area around the NA value gets a lower value because the default function
# is sum
qtm(rast_padF_narmT < 0.999)

# this is the same as rast_padF_narmF
rast_padT_narmF <- focal(exps, gaussian_kernel_radius(3), 
                   pad = TRUE, na.rm = F, padValue = NA )

qtm(rast_padT_narmF)

# this has a larger area that is not NA
rast_padT_narmT <- focal(exps, gaussian_kernel_radius(3), 
                         pad = TRUE, na.rm = T, padValue = NA )

qtm(rast_padT_narmT)

# pfocal versions #======================

pfoc_evNA_narmNA <- pfocal(exps, gaussian_kernel_radius(3),
                           edge_value = NA, na.rm = NA)
qtm(pfoc_evNA_narmNA)

# seems to be the same as na.rm = NA for this case
pfoc_evNA_narmF <- pfocal(exps, gaussian_kernel_radius(3),
                           edge_value = NA, na.rm = FALSE)
qtm(pfoc_evNA_narmF)

# same as rast_padT_narmT
pfoc_evNA_narmT <- pfocal(exps, gaussian_kernel_radius(3),
                           edge_value = NA, na.rm = TRUE)
qtm(pfoc_evNA_narmT)
VLucet commented 2 years ago

Yup, I see the issue. Thanks so much for the thorough reprex. I also agree with the fact that ideally pfocal should be able to cover the exact same cases than raster::focal. It looks like this would require to dabble into the C code. Quick question: in which applications would we like a behavior such as only the NAs inside the rasters would be ignored?

see24 commented 2 years ago

It is the default in caribouMetrics when we use focal, but in that case I believe all the inputs are forced to be not NA any ways so it shouldn't matter. I think there are some cases where for example you might have photo interpreted data and the is an NA pixel because of cloud or something you probably wouldn't want the whole window to end up as NA but you might not want to include values outside the raster data

VLucet commented 2 years ago

Coming back on this, it looks like we'd need a way for the C code to differentiate the edge values from the non-edge values. Unfortunately I do not have a good enough grasp of the C code to make that change.

see24 commented 2 years ago

I think the solution would be to set all cells that are NA inside the raster to 0 before passing the raster to the C code. you can do that using raster::reclassify(rast, cbind(NA, 0)) For the caribouMetrics case I believe I already do that outside pfocal so it is fine but if you want to match the raster::focal option that is how I would do it. But I suppose imputing NA values is it's own separate thing and it is probably best to make people sort that out explicitly on their own

VLucet commented 2 years ago

I dont see how setting the cells to 0 would succeed. It would be counted as a cell and would not be ignored, which is the behavior we are looking for. For functions like mean, it wont match the behavior of raster focal.

see24 commented 2 years ago

Hmm yes you are right. The challenge is with raster::focal it is way faster to use sum and then set the weights matrix to add to one. So in that context ignoring NAs doesn't take them out of the denominator. So that was what I was thinking of but you are right.

xlirate commented 2 years ago

@see24 Are you still interested in having this feature? I am taking a crack at making the C++ in pfocal easier to work with, and will have the chance to make adjustments while I am in here. I am the programmer that wrote it in the first place, but I am not really a researcher in the field, so if you have time to educate me on your use-case I can see about how to make it work.

see24 commented 2 years ago

@xlirate I think it would be somewhat helpful but only in fairly marginal cases so I would check with Josie when she is back to see if she thinks it is worth the effort. It depends how much we value being able to match the behaviour of raster::focal. I don't have a particular biological use case in mind only just that sometimes values that are NA within the raster should be treated differently than values that are outside the raster and raster::focal seems to make that possible.

If you do look into adding this I would also check out the behaviour of terra::focal which is an updated version of the raster package. It has an na.policy argument as well as the na.rm argument. I am working with a different researcher this week and next so I don't have time to look into it too much but if you do I think it could be helpful.

Also note this warning in the raster::focal documentation for when setting na.rm is recommended: na.rm: logical. If TRUE, NA will be removed from focal computations. The result will only be NA if all focal cells are NA. Except for some special cases (weights of 1, functions like min, max, mean), using na.rm=TRUE may not be a good idea in this function because it can unbalance the effect of the weights

xlirate commented 2 years ago

it can unbalance the effect of the weights

We have something to offset this issue implemented using mean_divider.

Using "KERNEL_SUM" will divide the value at every point by the sum of the kernel's weights, calculated globally.

Using "DYNAMIC_SUM" will divide the value at a point by the sum of the kernel values used at that point, calculated based only on the kernel values used at that point, specifically excluding any that are omitted due to NAs

All of the dynamic divisors will be slower because they recalculate at every point instead of calculating once at the start, but they happen in the C++ and in parallel so they are still worth considering.

see24 commented 2 years ago

Ok cool! I think that would be great then