Nowosad opened this issue 9 months ago
I actually kind of like it as its own function, run_ce, because supercells requires you to specify a compactness. meta_dists (or maybe include_distances) could still be useful, though, for providing additional quality metrics on your final result and an estimate of the heterogeneity in multivariate space within each superpixel.
Also, just a note on run_ce: we departed slightly from the original method I proposed, which strictly included only distances between each pixel and its closest center. That would have required creating Voronoi polygons, extracting the pixel values within those polygons, and comparing them to the center; instead, we compared centers to all pixels within the 2S x 2S search window around them, since that is what SLIC already does. I think that should be good, and it is likely more appropriate for the method anyway.
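For illustration, a minimal sketch of the 2S x 2S window idea (window_2S is a hypothetical helper, not the package internals): take all cells within S rows/columns of a center (r, c), clipped at the matrix edges.
window_2S <- function(m, r, c, S) {
  rows <- max(1, r - S):min(nrow(m), r + S)
  cols <- max(1, c - S):min(ncol(m), c + S)
  m[rows, cols]
}
m <- matrix(1:100, nrow = 10)
dim(window_2S(m, r = 5, c = 5, S = 2))
#> [1] 5 5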
There seems to be an issue with run_ce and NA values, though. Instead of being removed, as in the output from supercells, those clusters are assigned a distance of 0, probably because I had the distances initialized at 0 here. I thought that if all cells were NA the cluster would be skipped because of this code, but apparently that's not the case. The code needs to be able to differentiate between false zeros (all cells in the search window are NA) and true zeros (all values in the search window are the same); see the sketch after the reprex below.
library(supercells)
library(raster)
#> Warning: package 'raster' was built under R version 4.3.1
#> Loading required package: sp
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol_NA = vol
vol_NA[1:50,1:50]<- NA
plot(vol_NA)
s<- supercells(vol, k = 50, compactness = 1, clean=FALSE)
s_NA<- supercells(vol_NA, k = 50, compactness = 1, clean=FALSE)
plot(st_geometry(s_NA))
cluster_dist<- run_ce(vol_NA, k = 50)
cluster_dist
#> [1] 0 0 0 0 0 53 69 42 0 0 0 0 0 71 48 68 0 0 0 0 0 38 69 59 0
#> [26] 0 0 0 0 43 69 71 0 0 0 0 0 42 59 76 22 19 21 20 43 29 33 28
length(cluster_dist)
#> [1] 48
nrow(s_NA)
#> [1] 23
nrow(s)
#> [1] 48
Created on 2024-01-31 with reprex v2.0.2
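A minimal sketch of the fix described above (max_window_dist is a hypothetical helper, not the package code), assuming distances are initialized to NA_real_ instead of 0 so that all-NA windows stay distinguishable from genuinely constant ones:
max_window_dist <- function(center_value, window_values) {
  window_values <- window_values[!is.na(window_values)]
  if (length(window_values) == 0 || is.na(center_value)) {
    return(NA_real_) # all-NA window: no distance defined, not a false zero
  }
  max(abs(window_values - center_value)) # 0 here is a true zero (constant window)
}
max_window_dist(5, c(NA, NA, NA))
#> [1] NA
max_window_dist(5, c(5, 5, 5))
#> [1] 0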
What I like about run_ce is that you can do something like this:
library(supercells)
library(raster)
#> Warning: package 'raster' was built under R version 4.3.1
#> Loading required package: sp
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
cluster_dist<- run_ce(vol, k=50)
summary(cluster_dist)
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> 5.00 27.75 36.00 39.71 49.25 76.00
v1<- supercells(vol, k=50, compactness = median(cluster_dist)) #Approximately equal weight to spatial and feature distance
v2<- supercells(vol, k=50, compactness = median(cluster_dist)/10) #Feature space weighted approximately 10 times greater than spatial distance
plot(st_geometry(v1))
plot(st_geometry(v2))
Created on 2024-01-31 with reprex v2.0.2
Hi @ailich -- I hope the code in the branch is mostly fine for you (except for the NAs). I've been working on an alternative implementation of this today. I found one obstacle, and I will inform you when a new version is ready.
Thanks for the data examples, and for letting me know that having a new function is a better approach. Do you have any suggestions on how it should be named?
Thanks @Nowosad. Other than the NA issue, it's all good with me. I think of run_ce as "run compactness estimation," so the name is fine. The function does need some documentation, though. Perhaps something such as:
"This function returns the maximum distance in feature space between each initial cluster center and all pixels within its surrounding search window. The mean or median of these distances can be a good starting point for the testing compactness values that provide approximately equal weight to geographic and feature space distance in the supercells function for your data and chosen distance measure."
Additionally, the documentation for supercells says "A compactness value depends on the range of input cell values and selected distance measure." Perhaps at the end of that, add something such as "(see the run_ce function)" to point people in that direction.
Hi Alex -- see the attached code. In short:
- the new function is called get_initial_distances(), and it returns supercells after only 1 iteration -- this way you can visualize the initial results
- also, there should not be any problems with missing values
# remotes::install_github("nowosad/supercells@comp2")
library(supercells)
# set initial params
my_step = 15
my_compactness = 15
# read data and calc initial max distances
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol_dists = get_initial_distances(vol, step = my_step, compactness = my_compactness)
vol_dists
#> Simple feature collection with 24 features and 7 fields
#> Geometry type: MULTIPOLYGON
#> Dimension: XY
#> Bounding box: xmin: 2667400 ymin: 6478705 xmax: 2668010 ymax: 6479575
#> Projected CRS: NZGD49 / New Zealand Map Grid
#> First 10 features:
#> supercells x y elevation max_value_dist max_spatial_dist
#> 1 1 2667465 6479500 95.50213 27 19.79899
#> 2 2 2667475 6479350 102.84000 37 20.51828
#> 3 3 2667465 6479190 111.66284 47 20.51828
#> 4 4 2667475 6479070 128.45963 58 20.51828
#> 5 5 2667475 6478910 127.97794 60 20.51828
#> 6 6 2667465 6478790 109.64032 80 20.51828
#> 7 7 2667605 6479500 101.16667 41 20.51828
#> 8 8 2667645 6479360 125.74419 39 21.21320
#> 9 9 2667625 6479210 141.43529 41 21.21320
#> 10 10 2667605 6479020 164.58084 58 21.21320
#> max_total_dist geometry
#> 1 2.232089 MULTIPOLYGON (((2667400 647...
#> 2 2.720294 MULTIPOLYGON (((2667540 647...
#> 3 3.350622 MULTIPOLYGON (((2667530 647...
#> 4 4.085748 MULTIPOLYGON (((2667550 647...
#> 5 4.109609 MULTIPOLYGON (((2667400 647...
#> 6 5.505956 MULTIPOLYGON (((2667400 647...
#> 7 3.035347 MULTIPOLYGON (((2667540 647...
#> 8 2.915857 MULTIPOLYGON (((2667700 647...
#> 9 3.056505 MULTIPOLYGON (((2667700 647...
#> 10 4.117173 MULTIPOLYGON (((2667670 647...
plot(vol_dists)
# perform basic calculations
spatial_part = (mean(vol_dists$max_spatial_dist) / my_step)^2
value_part = (mean(vol_dists$max_value_dist) / my_compactness)^2
sqrt(spatial_part + value_part) #total
#> [1] 3.943791
# create supercells with equal spatial/value weights
equal_parts = sqrt(value_part / spatial_part)
my_compactness2 = my_compactness * equal_parts
spatial_part2 = (mean(vol_dists$max_spatial_dist) / my_step)^2
value_part2 = (mean(vol_dists$max_value_dist) / my_compactness2)^2
vol_sp = supercells(vol, step = my_step, compactness = my_compactness2, clean = FALSE)
plot(vol_sp["elevation"])
# create supercells with values 10 times more important
my_compactness3 = my_compactness2/sqrt(10)
spatial_part3 = (mean(vol_dists$max_spatial_dist) / my_step)^2
value_part3 = (mean(vol_dists$max_value_dist) / my_compactness3)^2
vol_sp2 = supercells(vol, step = my_step, compactness = my_compactness3, clean = FALSE)
plot(vol_sp2["elevation"])
@Nowosad, since this is meant to estimate the compactness factor, it shouldn't require a specification of compactness in the function itself. That being said, you mentioned that the weighting between geographic and feature space is more complex than in my example, and I am realizing that my initial logic likely needs to be tweaked at least slightly. The initial thought process was based on the fact that since step is used as a divisor to normalize the geographic distance to 0-1, a compactness factor that normalizes the feature-space distance to 0-1 would accomplish equal weighting. The difficulty is that this distance cannot be determined ahead of time and will differ from cluster to cluster. The original SLIC paper mentions this and says that compactness can instead be set to a constant and used to control the weighting between geographic and feature space. My idea was that setting this constant to the median (or mean) of the maximum distance between all pixels in the search window and the value at the centroid would find a compactness value that would, on average, normalize the feature-space distance to 1, therefore providing equal weighting. What I am realizing, though, is that this compares the feature space to the feature-space values at the centroid in geographic space. It would probably be more appropriate to calculate the centroid in feature space across the search window and then measure the distance of each pixel with respect to that.
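For reference, here is a minimal sketch of the SLIC-style combined distance this reasoning rests on (slic_distance is illustrative, not the package internals); with m set near the typical maximum feature distance, the two terms carry roughly equal weight:
slic_distance <- function(d_c, d_s, m, S) {
  # d_c: feature-space distance; d_s: geographic distance
  # m: compactness constant; S: step
  sqrt((d_c / m)^2 + (d_s / S)^2)
}
slic_distance(d_c = 20, d_s = 10, m = 20, S = 10) # both terms equal 1
#> [1] 1.414214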
Alternatively, like you suggested earlier, perhaps a meta_dists argument could be used to return some additional distances for assessing the result. For example, the maximum geographic distance between polygon nodes and the final centroids could be returned as a column, as well as the maximum feature-space distance of each pixel in the cluster from that cluster's centroid in feature space. Then geographic distance/step can be compared to feature-space distance/compactness to see what the weighting is in the map. Rather than estimating this before running the tool to get a starting point for compactness, this would assess the final result and show whether the weightings are reasonable, or whether you should change your compactness because one side is weighted too heavily.
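A minimal sketch of that post-hoc check (weighting_ratio and its inputs are hypothetical names for the distances described above):
weighting_ratio <- function(max_geo_dist, max_feat_dist, step, compactness) {
  # > 1: feature space dominates; < 1: geographic space dominates
  (max_feat_dist / compactness) / (max_geo_dist / step)
}
weighting_ratio(max_geo_dist = 9, max_feat_dist = 30, step = 10, compactness = 20)
#> [1] 1.666667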
Hi @ailich:
I think there are three topics to discuss.
since this is meant to estimate the compactness factor, it shouldn't require a specification for compactness in the function itself.
There are three options I can see:
Please let me know what the best option is for you.
It would probably be more appropriate to calculate the centroid in feature space across the search window and then measure the distance between each pixel with respect to that.
That's an interesting idea! But I cannot think of how to implement it without a major effort. I do not have the capacity for that, but please let me know if you want to try to write something like this...
Alternatively, like you suggested earlier, perhaps a meta_dists argument could be used and return some additional distances that could be used to assess the result.
I dropped this idea (at least for now). Why? It only works with clean = FALSE. If cleaning is on, the connections between the clusters are improved, but the distances are not recalculated; I would need to write another code layer to recalculate them. I do not think this is worth it.
Sorry for the late reply. I had a couple of other things I was working on and wanted to make sure I had the time to write a thoughtful reply.
Just some background on what the goal is. From the paper that proposed the method: "The maximum spatial distance expected within a given cluster should correspond to the sampling interval (step). Determining the maximum color distance Nc is not so straightforward, as color distances can vary significantly from cluster to cluster and image to image. This problem can be avoided by fixing Nc to a constant m (compactness)." The goal is therefore to have a function that tells you how much m (the compactness) weights feature space relative to geographic space.
Probably the most informative output would be an sf object with the corresponding distance columns, but as mentioned in a previous comment, it's probably more appropriate to compare to the centroid in feature space rather than to the feature values of the point located at the centroid in geographic space.
Here's some code to essentially accomplish that. The slight difference is that the distance is calculated in sf, so it uses geographic units rather than numbers of pixels and may differ if the x and y raster resolutions are unequal. Also, it's coded in R and may redo calculations already done by supercells, so it's not necessarily efficient.
The procedure here is to run supercells with iter = 0 to get the initial cluster centers. compactness needs to be specified for the function to run, but with iter = 0 it has no impact on the results.
#############
library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71
library(supercells)
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
library(tidyverse)
# Helper: create Voronoi polygons from POINT geometries
st_voronoi_point <- function(points){
## points must be POINT geometry
# check for point geometry and execute if true
if(!all(st_geometry_type(points) == "POINT")){
stop("Input not POINT geometries")
}
g = st_combine(st_geometry(points)) # make multipoint
v = st_voronoi(g)
v = st_collection_extract(v)
return(v[unlist(st_intersects(points, v))])
} # https://gis.stackexchange.com/questions/362134/i-want-to-create-a-voronoi-diagram-while-retaining-the-data-in-the-data-frame
# Load data
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol<- c(vol, terrain(vol, "slope")) #add second layer
bbox<- st_as_sf(vect(ext(vol))) #extent
distance_method<- "euclidean"
initial_centers<- supercells(vol, step=10, compactness = 1, iter=0) #compactness does not affect initial centers
voronoi_polygons <- st_voronoi_point(initial_centers) #create polygons around initial centers
voronoi_polygons<- st_crop(voronoi_polygons, bbox)
# Plot the Voronoi polygons
plot(st_geometry(voronoi_polygons), main = "Voronoi Polygons")
plot(initial_centers, add = TRUE, pch = 20, col = "red")
vals<- terra::extract(vol, vect(voronoi_polygons)) # Extract values within each polygon
feature_centroids<- vals %>% group_by(ID) %>% summarize(across(everything(), function(x) median(x, na.rm = TRUE))) #Centroids in feature space per polygon
max_feature_dist<- rep(NA_real_, length(voronoi_polygons)) # maximum distance in feature space within a given polygon
for (i in 1:nrow(vals)) {
ID<- vals$ID[i]
curr_vals<- vals[i,-1]
if(any(is.na(curr_vals))){next()}
curr_dist<- dist(rbind(curr_vals, feature_centroids[ID,-1]), method = distance_method) #distance from centroid in feature space
if(is.na(max_feature_dist[ID]) | curr_dist > max_feature_dist[ID]){
max_feature_dist[ID]<- curr_dist #Update distance
}
}
s1<- supercells(vol, step=10, compactness = median(max_feature_dist), iter=10)
plot(st_geometry(s1))
Created on 2024-02-28 with reprex v2.0.2
The code from topic 2 could be edited and used with clean = TRUE. I'd just need to add some lines to calculate the distance from each node of the polygon to the polygon's center. This could be used afterwards to evaluate how compactness affected the final result, by directly comparing the maximum geographic distance between pixels and their corresponding cluster center in geographic space with the maximum distance between pixels and their corresponding cluster center in feature space.
Hi @ailich -- thanks for the detailed message. I will try this approach in the next few days and give my feedback.
(One thing -- the code would probably need to use some custom dist function, as the package allows all the distances from the philentropy package (and a few more), plus custom ones.)
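A minimal sketch of such a wrapper (vals_dist is a hypothetical name; the philentropy::distance call mirrors the one used in the code above): dispatch named methods to philentropy and fall back to a user-supplied function.
vals_dist <- function(x, y, dist_fun) {
  if (is.function(dist_fun)) {
    dist_fun(x, y) # custom distance supplied by the user
  } else {
    philentropy::distance(rbind(x, y), method = dist_fun, mute.message = TRUE)
  }
}
vals_dist(1:3, 4:6, "euclidean") # sqrt(27) = 5.196152
vals_dist(1:3, 4:6, function(x, y) sum(abs(x - y))) # custom: 9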
Hi @ailich -- I checked the code and it makes sense (I think). Feel free to wrap it up as a function; it would be great if you could: a) avoid adding new dependencies (e.g., dplyr), b) use the distance function from philentropy instead of dist.
Best, Jakub
@Nowosad, I wrapped the previous code in a function called est_fsd, for estimating feature-space distance. I now use philentropy for the distance calculations and removed the need for dplyr. How can I call your distance functions so I can add those in?
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71
library(supercells)
library(philentropy)
#> Warning: package 'philentropy' was built under R version 4.3.3
#>
#> Attaching package: 'philentropy'
#> The following object is masked from 'package:terra':
#>
#> distance
# Helper: create Voronoi polygons from POINT geometries
st_voronoi_point <- function(points){
## points must be POINT geometry
# check for point geometry and execute if true
if(!all(st_geometry_type(points) == "POINT")){
stop("Input not POINT geometries")
}
g = st_combine(st_geometry(points)) # make multipoint
v = st_voronoi(g)
v = st_collection_extract(v)
return(v[unlist(st_intersects(points, v))])
} # https://gis.stackexchange.com/questions/362134/i-want-to-create-a-voronoi-diagram-while-retaining-the-data-in-the-data-frame
est_fsd<- function(x, k, dist_fun="euclidean", avg_fun="mean", step, sf=TRUE){
bbox<- st_as_sf(vect(ext(x))) #extent
initial_centers<- supercells(x, k=k,step=step, compactness = 1, iter=0) #compactness does not affect initial centers
voronoi_polygons <- st_voronoi_point(initial_centers) #create polygons around initial centers
voronoi_polygons<- st_crop(voronoi_polygons, bbox)
vals<- terra::extract(x, vect(voronoi_polygons)) # Extract values within each polygon
unique_IDs<- unique(vals$ID)
feature_centroids<- as.data.frame(matrix(data=NA_real_, nrow = length(unique_IDs), ncol = ncol(vals)))
names(feature_centroids)<- names(vals)
feature_centroids$ID<- unique_IDs
for (i in 1:length(unique_IDs)) {
curr_vals<- vals[vals$ID==unique_IDs[i],] #subset rows to polygon ID
feature_centroids[i,-1]<- sapply(curr_vals[,-1], avg_fun, na.rm=TRUE)
}
rm(curr_vals,i)
max_feature_dist<- rep(NA_real_, length(voronoi_polygons)) # maximum distance in feature space within a given polygon
for (i in 1:nrow(vals)) {
ID<- vals$ID[i]
curr_vals<- vals[i,-1]
if(any(is.na(curr_vals))){next()}
curr_dist<- philentropy::distance(rbind(curr_vals, feature_centroids[ID,-1]), method = dist_fun, mute.message = TRUE) #distance from centroid in feature space
if(is.na(max_feature_dist[ID]) | curr_dist > max_feature_dist[ID]){
max_feature_dist[ID]<- curr_dist #Update distance
}
}
rm(ID, curr_vals, curr_dist)
if(sf){
max_feature_dist<- data.frame(ID= unique_IDs, MaxDist= max_feature_dist)
st_geometry(max_feature_dist)<- st_geometry(voronoi_polygons)
}
return(max_feature_dist)
}
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol<- c(vol, terrain(vol, "slope")) #add second layer
y<- est_fsd(vol, dist_fun = "euclidean", avg_fun = "median", step = 10)
plot(y[,"MaxDist"])
Created on 2024-03-26 with reprex v2.0.2
@Nowosad, I'll get started on the function for evaluating the final result, which involves getting the distance between each node and the polygon center in geographic space. While starting on that, I noticed that the x and y columns of a supercells object don't match the result of st_centroid. Do you know why that is the case, even when the data have equal x/y pixel resolution?
library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
library(supercells)
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol<- c(vol, terrain(vol, "slope")) #add second layer
out<- supercells(vol, compactness = 1, step = 10)
centers1<- data.frame(X=out$x, Y=out$y)
centers2<- as.data.frame(st_coordinates(st_centroid(out)))
#> Warning: st_centroid assumes attributes are constant over geometries
print(centers1-centers2)
#> X Y
#> 1 -8.7058824 5.4117647
#> 2 -7.6428571 1.3571429
#> 3 -3.0681818 2.8409091
#> 4 -2.3170732 4.7560976
#> 5 -0.6896552 2.9885057
#> 6 -2.1052632 6.0526316
#> 7 -5.6962025 4.4303797
#> 8 -2.5454545 9.6363636
#> 9 -8.6440678 3.3898305
#> 10 -4.8809524 0.1190476
#> 11 -1.5151515 6.3636364
#> 12 -1.4772727 4.2045455
#> 13 -1.3793103 1.2643678
#> 14 -8.5714286 8.9285714
#> 15 -6.5116279 3.9534884
#> 16 -8.5714286 2.8571429
#> 17 -4.5588235 5.0000000
#> 18 -9.4736842 8.4210526
#> 19 -5.3731343 0.4477612
#> 20 -0.4819277 9.8795181
#> 21 -2.0661157 2.6446281
#> 22 -2.3437500 9.2187500
#> 23 -9.3181818 7.2727273
#> 24 -9.8148148 1.1111111
#> 25 -2.4657534 5.6164384
#> 26 -0.4065041 6.2601626
#> 27 -5.7272727 2.3636364
#> 28 -0.8571429 7.8571429
#> 29 -8.8461538 1.7307692
#> 30 -5.7000000 8.7000000
#> 31 -1.2195122 0.4878049
#> 32 -0.5714286 3.1428571
#> 33 -8.3561644 3.8356164
#> 34 -8.2758621 1.7241379
#> 35 -9.1764706 6.5882353
#> 36 -5.8653846 4.4230769
#> 37 -9.6103896 6.1038961
#> 38 -2.9411765 0.2941176
#> 39 -9.7674419 7.9069767
#> 40 -8.9024390 6.9512195
#> 41 -5.4285714 7.4285714
#> 42 -8.3333333 2.2222222
#> 43 -2.9702970 3.3663366
#> 44 -3.2038835 7.2815534
#> 45 -8.4285714 5.4285714
#> 46 -3.3913043 2.0869565
#> 47 -0.7920792 1.2871287
#> 48 -1.0169492 4.9152542
#> 49 -4.5535714 3.0357143
#> 50 -3.7391304 2.2608696
#> 51 -9.3150685 2.1917808
#> 52 -2.9090909 6.2727273
#> 53 -2.6984127 3.6507937
#> 54 -5.8536585 4.8780488
#> 55 -8.9908257 3.0275229
#> 56 -3.6619718 6.0563380
#> 57 -9.6153846 0.2564103
#> 58 -1.3385827 2.5196850
#> 59 -5.5483871 1.7419355
Created on 2024-03-26 with reprex v2.0.2
@ailich
How can I call your distance functions so I can add those in?
What do you mean by "your distance functions"?
Regarding the second issue -- I will try to investigate this...
@Nowosad, by "your distance functions" I meant the ones implemented outside of philentropy. The documentation specifically mentions "euclidean", "jsd", and "dtw" (dynamic time warping), and they seem to be implemented here. Can I call the get_vals_dist function from R?
@ailich thanks for letting me know about the centroids issue. See now:
library(terra)
#> terra 1.7.73
library(sf)
#> Linking to GEOS 3.12.1, GDAL 3.7.3, PROJ 9.2.1; sf_use_s2() is TRUE
devtools::load_all()
#> ℹ Loading supercells
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol<- c(vol, terrain(vol, "slope")) #add second layer
out<- supercells(vol, compactness = 1, step = 57, iter = 1, clean = FALSE)
centers1<- data.frame(X=out$x, Y=out$y)
centers2<- as.data.frame(st_coordinates(st_centroid(out)))
#> Warning: st_centroid assumes attributes are constant over geometries
centers1
#> X Y
#> 1 2667710 6479203
#> 2 2667691 6478974
centers2
#> X Y
#> 1 2667710 6479203
#> 2 2667691 6478974
In short, the cpp code was using integers to represent supercell centers, thus rounding the intermediate values. When I switched the code to use doubles, it became consistent with sf. See https://github.com/Nowosad/supercells/pull/37.
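A small R analogue of that rounding (illustrative only; the actual truncation happened in the C++ code):
centers <- c(10.5, 11.9, 13.9)
mean(centers) # true center as a double
#> [1] 12.1
as.integer(mean(centers)) # what an integer representation keeps
#> [1] 12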
Regarding your other question -- I think the solution would be to export some of the C++ code (as it is currently invisible to R). I am going to visit my family for a few days, so I will try to do it next week -- I hope that is fine with you.
Sounds good. Thanks for the quick fix on the centroids, and enjoy spending time with your family!
Hi @ailich, see the new branch -- https://github.com/Nowosad/supercells/pull/38:
devtools::load_all()
#> ℹ Loading supercells
x = 1.1:10.1
y = 2.1:11.1
dist_fun = function(){}
supercells:::get_vals_dist_cpp11(x, y, "euclidean", dist_fun)
#> [1] 3.162278
supercells:::get_vals_dist_cpp11(x, y, "jsd", dist_fun)
#> [1] 0.4140878
supercells:::get_vals_dist_cpp11(x, y, "dtw", dist_fun)
#> [1] 10
supercells:::get_vals_dist_cpp11(x, y, "dtw2d", dist_fun)
#> [1] 2.828427
supercells:::get_vals_dist_cpp11(x, y, "jensen-shannon", dist_fun) #philentropy
#> [1] 0.4140878
supercells:::get_vals_dist_cpp11(x, y, "", dist_fun = function(x, y){sum(c(x, y))})
#> [1] 122
@Nowosad, I still have to add supercells:::get_vals_dist_cpp11, but here's what I have so far.
library(terra)
#> Warning: package 'terra' was built under R version 4.3.2
#> terra 1.7.71
library(supercells)
library(sf)
#> Linking to GEOS 3.11.1, GDAL 3.6.2, PROJ 9.1.1; sf_use_s2() is TRUE
library(tidyverse)
library(philentropy)
#> Warning: package 'philentropy' was built under R version 4.3.3
#>
#> Attaching package: 'philentropy'
#> The following object is masked from 'package:terra':
#>
#> distance
# Helper: create Voronoi polygons from POINT geometries
st_voronoi_point <- function(points){
## points must be POINT geometry
# check for point geometry and execute if true
if(!all(st_geometry_type(points) == "POINT")){
stop("Input not POINT geometries")
}
g = st_combine(st_geometry(points)) # make multipoint
v = st_voronoi(g)
v = st_collection_extract(v)
return(v[unlist(st_intersects(points, v))])
} # https://gis.stackexchange.com/questions/362134/i-want-to-create-a-voronoi-diagram-while-retaining-the-data-in-the-data-frame
est_fsd<- function(x, k, dist_fun="euclidean", avg_fun="mean", step, sf=TRUE){
bbox<- st_as_sf(vect(ext(x))) #extent
initial_centers<- supercells(x, k=k,step=step, compactness = 1, iter=0) #compactness does not affect initial centers
voronoi_polygons <- st_voronoi_point(initial_centers) #create polygons around initial centers
voronoi_polygons<- st_crop(voronoi_polygons, bbox)
vals<- terra::extract(x, vect(voronoi_polygons)) # Extract values within each polygon
unique_IDs<- unique(vals$ID)
feature_centroids<- as.data.frame(matrix(data=NA_real_, nrow = length(unique_IDs), ncol = ncol(vals)))
names(feature_centroids)<- names(vals)
feature_centroids$ID<- unique_IDs
for (i in 1:length(unique_IDs)) {
curr_vals<- vals[vals$ID==unique_IDs[i],] #subset rows to polygon ID
feature_centroids[i,-1]<- sapply(curr_vals[,-1], avg_fun, na.rm=TRUE)
}
rm(curr_vals,i)
max_feature_dist<- rep(NA_real_, length(voronoi_polygons)) # maximum distance in feature space within a given polygon
for (i in 1:nrow(vals)) {
ID<- vals$ID[i]
curr_vals<- vals[i,-1]
if(any(is.na(curr_vals))){next()}
curr_dist<- philentropy::distance(rbind(curr_vals, feature_centroids[ID,-1]), method = dist_fun, mute.message = TRUE) #distance from centroid in feature space
if(is.na(max_feature_dist[ID]) | curr_dist > max_feature_dist[ID]){
max_feature_dist[ID]<- curr_dist #Update distance
}
}
rm(ID, curr_vals, curr_dist)
if(sf){
max_feature_dist<- data.frame(ID= unique_IDs, MaxDist= max_feature_dist)
st_geometry(max_feature_dist)<- st_geometry(voronoi_polygons)
}
return(max_feature_dist)
}
eval_compactness<- function(x, y, dist_fun, avg_fun){
vals<- terra::extract(x, vect(y)) # Extract values within each polygon
unique_IDs<- unique(vals$ID)
feature_centroids<- as.data.frame(matrix(data=NA_real_, nrow = length(unique_IDs), ncol = ncol(vals)))
names(feature_centroids)<- names(vals)
feature_centroids$ID<- unique_IDs
for (i in 1:length(unique_IDs)) {
curr_vals<- vals[vals$ID==unique_IDs[i],] #subset rows to polygon ID
feature_centroids[i,-1]<- sapply(curr_vals[,-1], avg_fun, na.rm=TRUE)
}
rm(curr_vals,i)
max_feature_dist<- rep(NA_real_, nrow(y)) # maximum distance in feature space within a given polygon
for (i in 1:nrow(vals)) {
ID<- vals$ID[i]
curr_vals<- vals[i,-1]
if(any(is.na(curr_vals))){next()}
curr_dist<- philentropy::distance(rbind(curr_vals, feature_centroids[ID,-1]), method = dist_fun, mute.message = TRUE) #distance from centroid in feature space
if(is.na(max_feature_dist[ID]) | curr_dist > max_feature_dist[ID]){
max_feature_dist[ID]<- curr_dist #Update distance
}
}
rm(ID, curr_vals, curr_dist)
max_poly_dist<- rep(NA_real_, nrow(y))
for (i in 1:nrow(y)) {
polygon_nodes<- st_coordinates(st_geometry(y[i,]))[,1:2]
polygon_nodes<- polygon_nodes[-nrow(polygon_nodes),]
for (j in 1:nrow(polygon_nodes)) {
curr_dist<- sqrt((polygon_nodes[j,1]-y$x[i])^2 + (polygon_nodes[j,2]-y$y[i])^2)
if(is.na(max_poly_dist[i]) | curr_dist > max_poly_dist[i]){
max_poly_dist[i]<- curr_dist #Update distance
}
}
}
y$MaxFeatDist<- max_feature_dist
y$MaxPolyDist<- max_poly_dist/mean(res(x))
return(y)
}
# Load data
vol = terra::rast(system.file("raster/volcano.tif", package = "supercells"))
vol<- c(vol, terrain(vol, "slope")) #add second layer
step<-10
Use the est_fsd function to estimate an initial compactness value that balances geographic and feature space for your specific data and distance measure. We will also try a lower and a higher value to see how this changes the results. We can then use eval_compactness on the final result. If there is a large difference between the chosen compactness and the average maximum feature distance across clusters, that may indicate that you should change the compactness value.
compactness_medium<- est_fsd(vol, step=step, dist_fun = "euclidean", avg_fun = "median", sf = FALSE) |> median()
compactness_medium
#> [1] 20.42507
compactness_high<- compactness_medium*100 #Increase importance of geographic space and decrease importance of feature space
compactness_high
#> [1] 2042.507
compactness_low<- compactness_medium/100 #Decrease importance of geographic space and increase importance of feature space
compactness_low
#> [1] 0.2042507
High compactness weights geographic space more heavily and feature space less.
supercells_high<- supercells(vol, step=step, compactness = compactness_high, iter=10,clean=FALSE)
plot(st_geometry(supercells_high))
high_eval<- eval_compactness(vol, supercells_high, dist_fun = "euclidean", avg_fun = "median")
median(high_eval$MaxFeatDist/compactness_high)
#> [1] 0.009430206
median(high_eval$MaxPolyDist/step)
#> [1] 0.7106335
Medium compactness balances weighting between geographic and feature space.
supercells_medium<- supercells(vol, step=step, compactness = compactness_medium, iter=10,clean=FALSE)
plot(st_geometry(supercells_medium))
medium_eval<- eval_compactness(vol, supercells_medium, dist_fun = "euclidean", avg_fun = "median")
median(medium_eval$MaxFeatDist/compactness_medium)
#> [1] 0.7645443
median(medium_eval$MaxPolyDist/step)
#> [1] 0.8860023
Low compactness weights geographic space less and feature space more heavily.
supercells_low<- supercells(vol, step=step, compactness = compactness_low, iter=10,clean=FALSE)
plot(st_geometry(supercells_low))
low_eval<- eval_compactness(vol, supercells_low, dist_fun = "euclidean", avg_fun = "median")
median(low_eval$MaxFeatDist/compactness_low)
#> [1] 55.70342
median(low_eval$MaxPolyDist/step)
#> [1] 1.146726
Created on 2024-04-05 with reprex v2.0.2
Hi @ailich, see https://github.com/Nowosad/supercells/tree/estimate_compactness.
I took your code, cleaned it up a bit, decided to return all the values (and not just summaries), and added an R function run_ce() -- try its examples. It seems to work fine.
That being said, I started thinking about the whole process, and have an idea:
- supercells() gets a new argument called meta_dists (better name needed!); if this argument is TRUE, then three new columns are added to the final supercells: value_distance, spatial_distance, and total_distance.
- The distances would be calculated only when meta_dists = TRUE, and only for the last iteration.
- This would match run_ce() when meta_dists = TRUE and iter = 1; however, if you keep the default iter = 10 and set meta_dists = TRUE, then your resulting supercells will have the three new columns.

@ailich, what do you think about this idea?