Open see24 opened 2 years ago
Yup, I see the issue. Thanks so much for the thorough reprex. I also agree with the fact that ideally pfocal
should be able to cover the exact same cases than raster::focal
.
It looks like this would require to dabble into the C code. Quick question: in which applications would we like a behavior such as only the NAs inside the rasters would be ignored?
It is the default in caribouMetrics when we use focal, but in that case I believe all the inputs are forced to be not NA any ways so it shouldn't matter. I think there are some cases where for example you might have photo interpreted data and the is an NA pixel because of cloud or something you probably wouldn't want the whole window to end up as NA but you might not want to include values outside the raster data
Coming back on this, it looks like we'd need a way for the C code to differentiate the edge values from the non-edge values. Unfortunately I do not have a good enough grasp of the C code to make that change.
I think the solution would be to set all cells that are NA inside the raster to 0 before passing the raster to the C code. you can do that using raster::reclassify(rast, cbind(NA, 0))
For the caribouMetrics case I believe I already do that outside pfocal so it is fine but if you want to match the raster::focal option that is how I would do it.
But I suppose imputing NA values is it's own separate thing and it is probably best to make people sort that out explicitly on their own
I dont see how setting the cells to 0 would succeed. It would be counted as a cell and would not be ignored, which is the behavior we are looking for. For functions like mean, it wont match the behavior of raster focal.
Hmm yes you are right. The challenge is with raster::focal it is way faster to use sum and then set the weights matrix to add to one. So in that context ignoring NAs doesn't take them out of the denominator. So that was what I was thinking of but you are right.
@see24 Are you still interested in having this feature? I am taking a crack at making the C++ in pfocal easier to work with, and will have the chance to make adjustments while I am in here. I am the programmer that wrote it in the first place, but I am not really a researcher in the field, so if you have time to educate me on your use-case I can see about how to make it work.
@xlirate I think it would be somewhat helpful but only in fairly marginal cases so I would check with Josie when she is back to see if she thinks it is worth the effort. It depends how much we value being able to match the behaviour of raster::focal. I don't have a particular biological use case in mind only just that sometimes values that are NA within the raster should be treated differently than values that are outside the raster and raster::focal seems to make that possible.
If you do look into adding this I would also check out the behaviour of terra::focal which is an updated version of the raster package. It has an na.policy argument as well as the na.rm argument. I am working with a different researcher this week and next so I don't have time to look into it too much but if you do I think it could be helpful.
Also note this warning in the raster::focal documentation for when setting na.rm is recommended: na.rm: logical. If TRUE, NA will be removed from focal computations. The result will only be NA if all focal cells are NA. Except for some special cases (weights of 1, functions like min, max, mean), using na.rm=TRUE may not be a good idea in this function because it can unbalance the effect of the weights
it can unbalance the effect of the weights
We have something to offset this issue implemented using mean_divider
.
Using "KERNEL_SUM" will divide the value at every point by the sum of the kernel's weights, calculated globally.
Using "DYNAMIC_SUM" will divide the value at a point by the sum of the kernel values used at that point, calculated based only on the kernel values used at that point, specifically excluding any that are omitted due to NAs
All of the dynamic divisors will be slower because they recalculate at every point instead of calculating once at the start, but they happen in the C++ and in parallel so they are still worth considering.
Ok cool! I think that would be great then
There is one option in raster::focal that I can't replicate in pfocal where
pad = FALSE
andna.rm = TRUE
. This option gives NA for cells near the edge (ie there is no padding) but allows NAs in the raster to be ignored.Here are some plots showing the implications of the various options